
I've been pondering how feasible it would be to:

* Compile a module in stages with the byte-code linker
* Keep hold of the Core source
* Interpret the Core AST within Haskell
* When encountering built-ins/primitives (or things from other libraries), compile that Core term, link it as an HValue, and then run it with the expected arguments. So () would be such an HValue, as would "show", which in interpretable unoptimized Core would take an extra argument for the Show instance. When passing values to such "foreign" functions, the interpreter would wrap them up in an interpretive way.

This is the hypothetical idea. It seems like it would yield a really trivial way to write a new and interesting interpreter for GHC Haskell without having to re-implement any prim ops, ready to work on regular Haskell code.

In my case, I would use this to write an interpreter which:

* is not tagless, so we preserve type info
* allows top-level names to be redefined
* when a function is applied, checks the types of its arguments

All of these are pretty much necessary for in-place update of a running program while developing (a la Emacs or Smalltalk), and type tags let us throw a regular Haskell exception on a type error, a la deferred type errors. It means that in your running program, if you make a mistake or forget to update one part, it doesn't bring the whole program down with an RTS error or a segfault; instead, a handler in a thread (like a server or a video game) throws an exception, and the developer just updates their code and tries again.

I'd love support for something like this, but I'd rather not have to re-create the world just to add this capability. Because it's conceptually just regular interpreted GHC Haskell plus type tags and updating, it seems like it should be a small diff.

Any input on this? How far away is GHC's current architecture from supporting such a concept?

Ciao!
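As a concrete illustration of the non-tagless idea, here is a minimal sketch of an interpreter value type that carries type tags and checks them at application time, throwing a regular Haskell exception on mismatch. All names here (TypeTag, Value, apply) are hypothetical and have nothing to do with GHC's actual HValue machinery; a real version would mirror Core types rather than two toy tags.

```haskell
import Control.Exception (Exception, throwIO)

-- Hypothetical runtime type tags; a real interpreter would mirror Core types.
data TypeTag = TagInt | TagBool
  deriving (Eq, Show)

-- Interpreted values carry tags; functions record the tag they expect.
data Value
  = VInt Int
  | VBool Bool
  | VFun TypeTag (Value -> IO Value)

tagOf :: Value -> Maybe TypeTag
tagOf (VInt _)   = Just TagInt
tagOf (VBool _)  = Just TagBool
tagOf (VFun _ _) = Nothing  -- function tags are elided in this sketch

-- A regular Haskell exception, a la deferred type errors: a mismatch
-- does not bring the program down with an RTS error or a segfault.
data InterpTypeError = InterpTypeError TypeTag (Maybe TypeTag)
  deriving Show

instance Exception InterpTypeError

-- Application checks the argument's tag before calling the function.
apply :: Value -> Value -> IO Value
apply (VFun expected f) arg
  | tagOf arg == Just expected = f arg
  | otherwise = throwIO (InterpTypeError expected (tagOf arg))
apply _ _ = error "apply: not a function"
```

A handler wrapped around a thread's main loop could then catch InterpTypeError, let the developer update the offending definition, and retry, which is the Emacs/Smalltalk-style workflow described above.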

I am not sure I entirely understand your proposal, but a good way of finding out whether it works is to give it a try.

Excerpts from Christopher Done's message of 2016-06-26 06:28:55 -0400:
> I've been pondering how feasible it would be to:
>
> * Compile a module in stages with the byte-code linker
> * Keep hold of the Core source
> * Interpret the Core AST within Haskell
> * When encountering built-ins/primitives (or things from other libraries), we compile that Core term and link it as an HValue and then run it with the expected arguments. So () would be such an HValue, as would "show", which in interpretable unoptimized Core would take an extra argument for the Show instance. When passing values to such "foreign" functions it would wrap them up in an interpretive way.
I don't understand what the bytecode format has to do here. Since your suggestion is to just store Core, you could just compile to object code. I prototyped "fat interface" files (https://ghc.haskell.org/trac/ghc/ticket/10871), which store Core in interface files so they can be compiled later. The patchset is here: https://github.com/ezyang/ghc/tree/ghc-fat-interface
> This is the hypothetical idea. It seems like it would yield a really trivial way to write a new and interesting interpreter for GHC Haskell without having to re-implement any prim ops, ready to work on regular Haskell code.
>
> In my case, I would use this to write an interpreter which:
>
> * is not tagless, so we preserve type info
> * allows top-level names to be redefined
> * when a function is applied, it checks the types of its arguments
>
> All of these are pretty much necessary for in-place update of a running program while developing (a la Emacs or Smalltalk), and type tags let us throw a regular Haskell exception on a type error, a la deferred type errors. It means that in your running program, if you make a mistake or forget to update one part, it doesn't bring the whole program down with an RTS error or a segfault; instead, a handler in a thread (like a server or a video game) throws an exception, and the developer just updates their code and tries again.
>
> I'd love support for something like this, but I'd rather not have to re-create the world just to add this capability. Because it's conceptually just regular interpreted GHC Haskell plus type tags and updating, it seems like it should be a small diff.
>
> Any input on this? How far away is GHC's current architecture from supporting such a concept?
Well, if you are going to support update, you need to make sure that the tag information is more elaborate than what GHC currently supports (a type would just be a Name, which is going to get reused when you recompile).

Edward

On 27 June 2016 at 04:11, Edward Z. Yang wrote:
> I don't understand what the bytecode format has to do here. Since your suggestion is to just store Core, you could just compile to object code.
True, I could compile to either as long as I can link it dynamically.
>> Any input into this? How far away is GHC's current architecture from supporting such a concept?
>
> Well, if you are going to support update, you need to make sure that the tag information is more elaborate than what GHC currently supports (a type would just be a Name, which is going to get reused when you recompile).
Indeed -- like in GHCi, when you redefine a named thing, I'd hope to implement an incrementing Name[n] versioning for names. But Core's AST is trivial, so it'd be easy to make this kind of transformation.
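A sketch of that versioning scheme under simplifying assumptions (textual names standing in for GHC's Name type, and a plain value standing in for a linked HValue): top-level bindings go through a mutable lookup table consulted at call time, and each redefinition bumps a version counter instead of reusing the old binding.

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map

-- A textual name stands in for GHC's Name in this sketch.
type TextualName = String

-- Each redefinition gets a fresh version, so stale references can be
-- detected rather than silently reusing a recompiled Name.
data Binding v = Binding { bindingVersion :: Int, bindingValue :: v }

type Env v = IORef (Map.Map TextualName (Binding v))

-- Interpreted code links to this table instead of binding a Name
-- directly, so a redefinition takes effect on the next lookup.
define :: Env v -> TextualName -> v -> IO ()
define env name v = modifyIORef' env $
  Map.insertWith (\new old -> new { bindingVersion = bindingVersion old + 1 })
                 name
                 (Binding 0 v)

lookupName :: Env v -> TextualName -> IO (Maybe (Binding v))
lookupName env name = Map.lookup name <$> readIORef env
```

The indirection is the same trick Simon mentions below for the byte-code interpreter: link to a runtime Name-lookup function rather than to the Name itself.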

On 26 June 2016 at 11:28, Christopher Done wrote:
> I've been pondering how feasible it would be to:
>
> * Compile a module in stages with the byte-code linker
> * Keep hold of the Core source
> * Interpret the Core AST within Haskell

Interestingly, the first implementation of GHCi was a Core interpreter, but it ran into a lot of problems. For starters it would have unsafeCoerce everywhere. Support for unboxed values is very, very difficult.

> * When encountering built-ins/primitives (or things from other libraries), we compile that Core term and link it as an HValue and then run it with the expected arguments. So () would be such an HValue, as would "show", which in interpretable unoptimized Core would take an extra argument for the Show instance. When passing values to such "foreign" functions it would wrap them up in an interpretive way.
>
> This is the hypothetical idea. It seems like it would yield a really trivial way to write a new and interesting interpreter for GHC Haskell without having to re-implement any prim ops, ready to work on regular Haskell code.
>
> In my case, I would use this to write an interpreter which:
>
> * is not tagless, so we preserve type info

Not sure what you mean here. Your interpreter would be running on top of the same RTS with the same data representation, so it would have to use the same tagging and representation conventions as the rest of GHC.

> * allows top-level names to be redefined

This you could do with the existing byte-code interpreter, by linking to some runtime Name-lookup function instead of linking Names directly. You would probably want to revert all CAFs when the code changes too; this is currently not implemented for byte code.

> * when a function is applied, it checks the types of its arguments

Aha, but what if the arguments come from compiled code? GHC doesn't carry type information around at runtime, except that it is possible to reconstruct types in a limited way (this is what the GHC debugger does).

Cheers
Simon
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

On 27 June 2016 at 10:01, Simon Marlow wrote:
> On 26 June 2016 at 11:28, Christopher Done wrote:
>> I've been pondering how feasible it would be to:
>>
>> * Compile a module in stages with the byte-code linker
>> * Keep hold of the Core source
>> * Interpret the Core AST within Haskell
>
> Interestingly, the first implementation of GHCi was a Core interpreter, but it ran into a lot of problems. For starters it would have unsafeCoerce everywhere. Support for unboxed values is very, very difficult.
What year is that implementation from? I wouldn't mind taking a look for it in the GHC repo history.
>> * is not tagless, so we preserve type info
>
> Not sure what you mean here. Your interpreter would be running on top of the same RTS with the same data representation, so it would have to use the same tagging and representation conventions as the rest of GHC.
That's true: if a value comes from a compiled RTS function with a polymorphic type, then I don't know what its real type is in order to marshal it properly. Drat.
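The problem can be seen with an existential stand-in for an untyped linked value (OpaqueValue, compiledId, and recoverInt are hypothetical names, a crude approximation and not GHC's actual HValue): once a value has passed through compiled polymorphic code, no tag survives to say what its type was, and getting it back out requires an unchecked unsafeCoerce.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Unsafe.Coerce (unsafeCoerce)

-- Crude stand-in for an untyped value linked from compiled code.
data OpaqueValue = forall a. OpaqueValue a

-- Calling compiled `id` at an unknown type: the result is opaque, and
-- nothing at runtime says whether it was an Int, a Foo, or anything else.
compiledId :: OpaqueValue -> OpaqueValue
compiledId (OpaqueValue x) = OpaqueValue x

-- Recovering the value forces an unchecked coercion: if the assumed
-- type is wrong, this is undefined behaviour, not a catchable error.
recoverInt :: OpaqueValue -> Int
recoverInt (OpaqueValue x) = unsafeCoerce x
```

This is why the tag-checking interpreter cannot defend itself at the boundary with compiled code: the check has nothing to inspect.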
>> * allows top-level names to be redefined
>
> This you could do with the existing byte-code interpreter, by linking to some runtime Name-lookup function instead of linking Names directly. You would probably want to revert all CAFs when the code changes too; this is currently not implemented for byte code.
Right, I considered this but without the type information it's going to blow up if I change the arity of a function or a data type or whatever.
>> * when a function is applied, it checks the types of its arguments
>
> Aha, but what if the arguments come from compiled code? GHC doesn't carry type information around at runtime, except that it is possible to reconstruct types in a limited way (this is what the GHC debugger does).
Indeed: from compiled code, e.g. id, a call like id (undefined :: Foo) would come back as something not identifiable as being of type Foo. That's the flaw in my plan.

Looks like the current interpreter would have to be extended to support this, or a whole new one written that re-implements all the primitives, like in GHCJS.

Thanks!

On 27 June 2016 at 13:31, Christopher Done wrote:
> On 27 June 2016 at 10:01, Simon Marlow wrote:
>> On 26 June 2016 at 11:28, Christopher Done wrote:
>>> I've been pondering how feasible it would be to:
>>>
>>> * Compile a module in stages with the byte-code linker
>>> * Keep hold of the Core source
>>> * Interpret the Core AST within Haskell
>>
>> Interestingly, the first implementation of GHCi was a Core interpreter, but it ran into a lot of problems. For starters it would have unsafeCoerce everywhere. Support for unboxed values is very, very difficult.
>
> What year is that implementation from? I wouldn't mind taking a look for it in the GHC repo history.
I think around here is a good place to start looking: https://phabricator.haskell.org/rGHCbca9dd54c2b39638cb4638aaccf6015a104a1df5...

Cheers
Simon

Thanks! It's strange to think there was once no GHCi. This is an interesting piece of Haskell implementation history! =)
On 27 June 2016 at 15:27, Simon Marlow wrote:
> [...]
participants (4)

- Christopher Done
- Edward Z. Yang
- Evan Laforge
- Simon Marlow