[GHC] #7933: JavaScript Cmm backend

#7933: JavaScript Cmm backend
-----------------------------+----------------------------------------------
 Reporter:  bosu             |         Owner:
     Type:  feature request  |        Status:  new
 Priority:  normal           |     Component:  Compiler
  Version:  7.6.3            |      Keywords:
       Os:  Unknown/Multiple |  Architecture:  Unknown/Multiple
  Failure:  None/Unknown     |     Blockedby:
 Blocking:                   |       Related:
-----------------------------+----------------------------------------------
I'd like to request comments on the attached patch implementing a JavaScript Cmm backend for GHC.
It adds a -fjavascript compilation option. Calling ghc -fjavascript produces JS in the output file. Otherwise the ghc binary should be fully functional as a native compiler. Thus -fjavascript is similar to -fllvm in spirit.
The patch adds an HscJavaScript constructor to HscTarget. It is used to dispatch code output to the new JsCodeGen module.
Generated JavaScript code relies on the built-in JS garbage collection. The JsTransforms module disables GHC Hp and Sp overflow checks.
As JavaScript has no pointers, we emulate them using JS closures containing arrays and indices. To distinguish between pointers and scalars we run Hoopl heuristics in the PointerMarker module.
As in the native world, the generated JS object files have to be linked. To do this, there is another project, tentatively called Josh[1]. Josh is a regular cabalized Haskell binary which links JS object files using function maps provided by GHC.
The JavaScript RTS has rts/*.cmm compiled to JavaScript almost as is. In addition there is a handful of handwritten JS code residing in the Josh distribution[2].
ghc-prim[3], integer-gmp[4] and base[5] are compiled with small changes. Those changes seem to be orthogonal to the GHC patch.
For the bootstrap process please see the Josh README on GitHub. Josh also includes several tests which work on 32-bit Debian Wheezy.
Generated code is on the order of 2 MB uncompressed and un-minified. Most of it is in the RTS. It compresses very well though (to approx. 150Kb). Plenty of low-hanging-fruit optimizations are possible.
There are plenty of caveats to the current patch. It works on 32 bits only. Math is fishy and Integer support is nonexistent. Lots of tests should be imported from GHC, GHCJS, Fay. No Handle-based IO works at the moment. No performance tests were done.
Despite its shortcomings, the patch is fairly non-invasive, IMHO. It should be noted that the same approach could work for other GC-based platforms (e.g. Java, C#).
I'd like to continue working towards merging the patch into GHC, if possible. Could GHC committers provide any guidance on what should be done in order to merge it?
[1] https://github.com/bosu/josh
[2] https://github.com/bosu/josh/blob/master/etc/ptr.js
[3] https://github.com/bosu/ghc-prim
[4] https://github.com/bosu/integer-gmp
[5] https://github.com/bosu/base
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7933
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#7933: JavaScript Cmm backend
-----------------------------+----------------------------------------------
 Reporter:  bosu             |         Owner:
     Type:  feature request  |        Status:  patch
 Priority:  normal           |     Component:  Compiler
  Version:  7.6.3            |      Keywords:
       Os:  Unknown/Multiple |  Architecture:  Unknown/Multiple
  Failure:  None/Unknown     |     Blockedby:
 Blocking:                   |       Related:
-----------------------------+----------------------------------------------
Changes (by bosu):
* status: new => patch
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7933#comment:1

#7933: JavaScript Cmm backend
---------------------------------+------------------------------------------
     Reporter:  bosu             |          Owner:
         Type:  feature request  |         Status:  patch
     Priority:  normal           |      Milestone:
    Component:  Compiler         |        Version:  7.6.3
     Keywords:                   |             Os:  Unknown/Multiple
 Architecture:  Unknown/Multiple |        Failure:  None/Unknown
   Difficulty:  Unknown          |       Testcase:
    Blockedby:                   |       Blocking:
      Related:                   |
---------------------------------+------------------------------------------
Changes (by simonpj):
* difficulty: => Unknown
Old description:
I'd like to RFC on the attached patch implementing JavaScript Cmm backend for GHC.
It adds -fjavascript compilation option. Calling ghc -fjavascript produces JS in the output file. Otherwise the ghc binary should be fully functional as a native compiler. Thus -fjavascript is similar to -fllvm in spirit.
The patch adds HscJavaScript constructor to HscTarget. It is used to dispatch code output to the new JsCodeGen module.
Generated JavaScript code relies on the built-in JS garbage collection. JsTransforms module disables GHC Hp and Sp overflow checks.
As JavaScript has no pointers, we emulate them using JS closures containing arrays and indices. To distinguish between pointers and scalars we run Hoopl heuristics in PointerMarker module.
As in native world, the generated JS object files are to be linked. In order to do this, there is another project, tentatively called Josh[1]. Josh is regular cabalized Haskell binary which links JS object files using function maps provided by GHC.
The JavaScript RTS has rts/*.cmm compiled to JavaScript almost as is. In addition there is a handful of handwritten JS code residing in Josh distribution[2].
ghc-prim[3], integer-gmp[4] and base[5] are compiled with small changes. Those changes seem to be orthogonal to the GHC patch.
For bootstrap process please see Josh README at github. Josh also includes several tests which work on 32-bit Debian Wheezy.
Generated code is on the order of 2 MB uncompressed and un-minified. Most of it is in RTS. It compresses very well though (to 150Kb approx). Plenty of low-hanging fruit optimizations are possible.
There are plenty of caveats to the current patch. It works on 32 bits only. Math is fishy and Integer support is nonexistent. Lots of tests should be imported from GHC, GHCJS, Fay. No Handle based IO works at the moment. No performance tests were done.
Despite its shortcomings, the patch is fairly non-invasive, IMHO. It should be noted that the same approach could work for other GC based platforms (e.g. Java, C#).
I'd like to continue working towards merging the patch into GHC, if possible. Could GHC committers provide any guidance of what should be done in order to merge it?
[1] https://github.com/bosu/josh [2] https://github.com/bosu/josh/blob/master/etc/ptr.js [3] https://github.com/bosu/ghc-prim [4] https://github.com/bosu/integer-gmp [5] https://github.com/bosu/base
New description:
I'd like to request comments on the attached patch implementing a JavaScript Cmm backend for GHC.
It adds a `-fjavascript` compilation option. Calling `ghc -fjavascript` produces JS in the output file. Otherwise the ghc binary should be fully functional as a native compiler. Thus `-fjavascript` is similar to `-fllvm` in spirit.
The patch adds an `HscJavaScript` constructor to `HscTarget`. It is used to dispatch code output to the new `JsCodeGen` module.
Generated JavaScript code relies on the built-in JS garbage collection. The `JsTransforms` module disables GHC Hp and Sp overflow checks.
As JavaScript has no pointers, we emulate them using JS closures containing arrays and indices. To distinguish between pointers and scalars we run Hoopl heuristics in the `PointerMarker` module.
As in the native world, the generated JS object files have to be linked. To do this, there is another project, tentatively called Josh[1]. Josh is a regular cabalized Haskell binary which links JS object files using function maps provided by GHC.
The JavaScript RTS has `rts/*.cmm` compiled to JavaScript almost as is. In addition there is a handful of handwritten JS code residing in the Josh distribution[2].
ghc-prim[3], integer-gmp[4] and base[5] are compiled with small changes. Those changes seem to be orthogonal to the GHC patch.
For the bootstrap process please see the Josh README on GitHub. Josh also includes several tests which work on 32-bit Debian Wheezy.
Generated code is on the order of 2 MB uncompressed and un-minified. Most of it is in the RTS. It compresses very well though (to approx. 150Kb). Plenty of low-hanging-fruit optimizations are possible.
There are plenty of caveats to the current patch.
* It works on 32 bits only. Math is fishy and Integer support is nonexistent.
* Lots of tests should be imported from GHC, GHCJS, Fay.
* No Handle-based IO works at the moment.
* No performance tests were done.
Despite its shortcomings, the patch is fairly non-invasive, IMHO. It should be noted that the same approach could work for other GC-based platforms (e.g. Java, C#).
I'd like to continue working towards merging the patch into GHC, if possible. Could GHC committers provide any guidance on what should be done in order to merge it?
* [1] https://github.com/bosu/josh
* [2] https://github.com/bosu/josh/blob/master/etc/ptr.js
* [3] https://github.com/bosu/ghc-prim
* [4] https://github.com/bosu/integer-gmp
* [5] https://github.com/bosu/base
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7933#comment:2

#7933: JavaScript Cmm backend
---------------------------------+------------------------------------------
     Reporter:  bosu             |          Owner:
         Type:  feature request  |         Status:  patch
     Priority:  normal           |      Milestone:
    Component:  Compiler         |        Version:  7.6.3
     Keywords:                   |             Os:  Unknown/Multiple
 Architecture:  Unknown/Multiple |        Failure:  None/Unknown
   Difficulty:  Unknown          |       Testcase:
    Blockedby:                   |       Blocking:
      Related:                   |
---------------------------------+------------------------------------------
Comment(by simonpj):
Interesting, thank you. Some thoughts:
* I'd be happy to have a JS back end for GHC. As you say, it's pretty non-invasive, which is good.
* You aren't the first to attack this problem; see [http://www.haskell.org/haskellwiki/The_JavaScript_Problem the Haskell wiki JS page]. How does your solution differ? I'd love to see comments from Fay's author, GHCJS's author etc. Maybe you can make common cause with some of them to get JS into GHC?
* More generally, before adopting it for GHC, I'd ideally like to see a group of enthusiasts saying "yes, this is the way to go". I'm not well equipped to make a critical assessment from a JS point of view.
* You could start a GHC Trac Wiki page describing the implementation in overview, so someone could figure out how it works (eg including some of the description above).
* Also the code needs comments! At the moment the code has various places saying `if hscTarget dflags == HscJavaScript`, but no comment explaining why that special case is important at that point. I recommend the `Note [blah]` format; see [http://hackage.haskell.org/trac/ghc/wiki/Commentary/CodingStyle coding style]. I can see why you have not invested in comments so far; fair enough, but in the end they'll be necessary. Similarly, at a larger scale, I have no clue how `PointerMarker` works or what it is doing.
Finally, who are you in real life, bosu? Are you going to do this and move on, or would you plan to actively support/develop this JS back end? (We had a Java back end whose author moved on, and it was a pain. Eventually we deleted it again.)
Thanks
Simon
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7933#comment:3

#7933: JavaScript Cmm backend
---------------------------------+------------------------------------------
     Reporter:  bosu             |          Owner:  bosu
         Type:  feature request  |         Status:  patch
     Priority:  normal           |      Milestone:
    Component:  Compiler         |        Version:  7.6.3
     Keywords:                   |             Os:  Unknown/Multiple
 Architecture:  Unknown/Multiple |        Failure:  None/Unknown
   Difficulty:  Unknown          |       Testcase:
    Blockedby:                   |       Blocking:
      Related:                   |
---------------------------------+------------------------------------------
Changes (by bosu):
* owner: => bosu
Comment:
Replying to [comment:3 simonpj]:
> You aren't the first to attack this problem; see [http://www.haskell.org/haskellwiki/The_JavaScript_Problem the Haskell wiki JS page].
Yes, I am aware of this page. I am going to update it.
> How does your solution differ?
Here is a short recap, AFAIK:
* Fay does not use GHC code generation, therefore there is no relation at all.
* Both Haste and GHCJS skip the Cmm step, generating JS from STG.
* Haste and GHCJS are standalone compilers using GHC APIs.
> I'd love to see comments from Fay's author, GHCJS's author etc. Maybe you can make common cause with some of them to get JS into GHC?
I definitely will, once more tests are passing :)
> More generally, before adopting it for GHC, I'd ideally like to see a group of enthusiasts saying "yes, this is the way to go". I'm not well equipped to make a critical assessment from a JS point of view.
I understand and will do.
> You could start a GHC Trac Wiki page describing the implementation in overview, so someone could figure out how it works (eg including some of the description above).
Yes, I'll do this.
> Also the code needs comments! At the moment the code has various places saying `if hscTarget dflags == HscJavaScript`, but no comment explaining why that special case is important at that point. I recommend the `Note [blah]` format; see [http://hackage.haskell.org/trac/ghc/wiki/Commentary/CodingStyle coding style].
> I can see why you have not invested in comments so far; fair enough, but in the end they'll be necessary.
Sure, I'll add comments and Notes.
> Similarly, at a larger scale, I have no clue how `PointerMarker` works or what it is doing.
It tries to distinguish between pointers and int values. Suppose that p is a GC pointer. The native GHC backend can regard (p + 4) as a regular int value. On the JS backend, however, p is allocated as a JS object (a closure), so p + 4 is meaningless there. Therefore I translate p + 4 into p(OP_ADD, 4).
`PointerMarker` tries to deduce whether a register is a pointer by finding loads and stores that use it. Unfortunately, there are some corner cases where these heuristics do not work. In those cases we have to fall back to runtime checks. Eventually, I'd like to solve this problem by having a separate `CmmType` category for pointers.
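The `p(OP_ADD, 4)` encoding can be illustrated with a minimal sketch. It is not the code from Josh's ptr.js: the opcode values, the helper names `mkPtr` and `add`, the extra `OP_LOAD`/`OP_STORE` operations, and the 4-byte word size are all assumptions made for illustration. Only `OP_ADD` and the closure-over-array-and-index idea come from the thread.

```javascript
// Hypothetical opcodes; only the name OP_ADD appears in the thread.
const OP_ADD = 0;
const OP_LOAD = 1;
const OP_STORE = 2;

// A "pointer" is a closure over a backing array and an index into it.
function mkPtr(arr, idx) {
  return function ptr(op, arg) {
    switch (op) {
      case OP_ADD:
        // Pointer arithmetic returns a new closure. Assumes word-aligned
        // byte offsets and 4-byte words (32-bit target).
        return mkPtr(arr, idx + arg / 4);
      case OP_LOAD:
        return arr[idx];
      case OP_STORE:
        arr[idx] = arg;
        return undefined;
      default:
        throw new Error("unknown pointer op: " + op);
    }
  };
}

// Hypothetical runtime fallback for values the static analysis could not
// classify: closures are treated as pointers, everything else as 32-bit ints.
function add(x, n) {
  return typeof x === "function" ? x(OP_ADD, n) : (x + n) | 0;
}

const heap = [10, 20, 30];
const p = mkPtr(heap, 0);
const q = p(OP_ADD, 4); // the JS counterpart of the native p + 4
q(OP_STORE, 99);        // writes heap[1]
```

When a register is statically known to hold a pointer, the generated code can call `p(OP_ADD, 4)` directly; `add` stands in for the runtime check needed in the corner cases where the heuristics give no answer.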
> Finally, who are you in real life, bosu? Are you going to do this and move on, or would you plan to actively support/develop this JS back end? (We had a Java back end whose author moved on, and it was a pain. Eventually we deleted it again.)
I'm just a programmer doing this as a hobby. GHC is wonderful, I'd like to stick around :).
Thanks,
Boris.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7933#comment:4

#7933: JavaScript Cmm backend
---------------------------------+------------------------------------------
     Reporter:  bosu             |          Owner:  bosu
         Type:  feature request  |         Status:  patch
     Priority:  normal           |      Milestone:
    Component:  Compiler         |        Version:  7.6.3
     Keywords:                   |             Os:  Unknown/Multiple
 Architecture:  Unknown/Multiple |        Failure:  None/Unknown
   Difficulty:  Unknown          |       Testcase:
    Blockedby:                   |       Blocking:
      Related:                   |
---------------------------------+------------------------------------------
Comment(by luite):
Hi,
I'm the author of most of the new version of GHCJS that we plan to release when GHC 7.8 is released, around ICFP. Most of our RTS works, and we have things like preemptive lightweight threading, STM, Fay/UHC-like FFI import splices and cross-platform building (running GHCJS on a 64-bit host compiler works and gives you efficient JavaScript).
We have a GHC patch and a Cabal patch to support doing this as a GHC API client, so that we aren't bound to the GHC release schedule in the short term and can prove things work as planned before (if ever) proposing a full merge. Perhaps we can extend this patch to help Josh as well (more details about this patch in the next message).
Boris' approach is different from both Fay and GHCJS in that it translates much lower-level code to JavaScript. It's different though from the current fashion of doing low-level JavaScript with asm.js/emscripten, and quite original, I must say! (At least I haven't seen this approach to pointers before.)
Last summer, when deciding which direction to go with the new GHCJS code generator, I briefly explored a similar approach, compiling Cmm to JS. It looks appealing: Cmm looks relatively close to imperative JS, and you get things like stack frame layout optimization and Hoopl dataflow analysis for free (we're doing both of these now in GHCJS, but probably not as well as the GHC versions). Ultimately we abandoned the approach due to concerns about interoperability with existing JavaScript libraries (user-friendliness and easy JS interop are major selling points of Fay) and doubts about the feasibility of getting memory management working with it. Boris' approach with pointers is different, so it could well be worth exploring.
So the three approaches are quite different:
* Fay: Directly from Haskell AST to JavaScript, no rewrite rule optimization, GHC only used as typechecker, direct function calls (no CPS transformation), tail calls to the same function are converted to loops. No threads. Very flexible FFI. Generated code is easy to read and you get JS stack traces for free.
* GHCJS: STG to JavaScript: Haskell heap objects represented as JavaScript objects (except in the cases where JavaScript's own type tagging allows us to use primitive types), stacks are JS arrays, CPS-transformed code, tail calls through a trampoline (to be replaced with native JS tail calls when (if) they arrive in ECMAScript 6). Threading with eager blackholing, async exceptions and STM.
* Josh: Cmm to JavaScript, interesting approach with closures as pointers, performance unknown (it seems that lots of function calls are needed for simple operations). Compiling part of the RTS could save a great deal of work. Threading is probably possible. It looks like the current implementation leaks memory.
I have some questions about memory management in Josh:
* How do you deal with things that, in native code, are done by the garbage collector? For example, when a shared thunk is being computed, it will be overwritten by a black hole before the GC is entered, so pointers in the thunk aren't followed. It seems that in Josh the references are retained, since they're never overwritten? Also, for indirection closures, the GC will weed them out by following them; how will you do that?
* In functions allocating multiple heap objects at once, you start with only one fresh Hp, so one object will keep the other alive even if it doesn't reference it. Can you reliably fix that?
Luite
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7933#comment:5
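The last question in this message, about several objects allocated from one fresh Hp, can be made concrete with a small sketch. It assumes, hypothetically, that an allocating function gets one fresh backing array and that each object it allocates is an (array, index) pair into that array. Plain records stand in for the pointer closures for brevity, and the names `allocPair`, `allocPairSeparately`, `base` and `index` are illustrative, not taken from the patch.

```javascript
// Assumed scheme: one fresh backing array ("Hp") per allocating function.
function allocPair() {
  const hp = new Array(4);               // storage for both objects
  const objA = { base: hp, index: 0 };   // first object: words 0..1
  const objB = { base: hp, index: 2 };   // second object: words 2..3
  return [objA, objB];
}

// Contrast case only: each object gets its own backing array.
function allocPairSeparately() {
  const objA = { base: new Array(2), index: 0 };
  const objB = { base: new Array(2), index: 0 };
  return [objA, objB];
}

const [a, b] = allocPair();
// Both pointers reach the same array, so keeping only `a` reachable still
// keeps every word of `b` reachable from the JS garbage collector's view.
const sharedStorage = a.base === b.base;

const [c, d] = allocPairSeparately();
const separateStorage = c.base !== d.base;
```

The contrast function only shows what unshared storage would look like. Whether a Cmm-level code generator can split allocations this way is exactly the open question put to Boris here.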

#7933: JavaScript Cmm backend
---------------------------------+------------------------------------------
     Reporter:  bosu             |          Owner:  bosu
         Type:  feature request  |         Status:  patch
     Priority:  normal           |      Milestone:
    Component:  Compiler         |        Version:  7.6.3
     Keywords:                   |             Os:  Unknown/Multiple
 Architecture:  Unknown/Multiple |        Failure:  None/Unknown
   Difficulty:  Unknown          |       Testcase:
    Blockedby:                   |       Blocking:
      Related:                   |
---------------------------------+------------------------------------------
Comment(by luite):
(sorry for messing up the formatting in the previous message)
Our GHC patch so far does the following:
* Add a JavaScript platform
* Add a Custom Way, allowing you to generate multiple versions of the code side by side (in GHCJS we use this to build one version for the JavaScript platform and one for the native platform, for Template Haskell and native executables)
* Add an override for GHC.Prim (so we can build for the 32-bit JavaScript platform on 64-bit hosts)
* Add a 'foreign import javascript' FFI calling convention, and a JavaScriptFFI extension that enables it; this gives us the UHC/Fay-like import patterns [https://github.com/ghcjs/ghcjs-jquery/blob/master/JavaScript/JQuery/Internal.hs example]
* Add a special JSRef type (from the ghcjs-prim package) that can be passed to the FFI
For us, this is enough to make a working standalone compiler, and since we can have an easily updatable/cabal-installable package that works on both 32- and 64-bit systems this way, we prefer this approach for at least the near future.
Would you be interested in helping extend the patch for Josh, so it can be developed as a standalone compiler using the GHC API, and perhaps merge it when it has proven itself (for example by passing the relevant part of the GHC testsuite and demonstrating good performance/memory behaviour)?
Feel free to contact me, or hop into #ghcjs on irc.freenode.net to discuss.
[https://github.com/ghcjs/ghcjs.github.com/blob/master/patches/ghc-ghcjs.patch]
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7933#comment:6

#7933: JavaScript Cmm backend
-------------------------------+--------------------------------------------
   Reporter:  bosu             |         Owner:
       Type:  feature request  |        Status:  new
   Priority:  normal           |     Milestone:
  Component:  Compiler         |       Version:  7.6.3
 Resolution:                   |      Keywords:
         Os:  Unknown/Multiple |  Architecture:  Unknown/Multiple
    Failure:  None/Unknown     |    Difficulty:  Unknown
   Testcase:                   |     Blockedby:
   Blocking:                   |       Related:
-------------------------------+--------------------------------------------
Changes (by igloo):
* owner: bosu =>
* status: patch => new
Comment:
The native codegen, LLVM backend, unreg backend, and this JS codegen, all have an interface that is roughly of the form
{{{
someCodeGen :: DynFlags -> FilePath -> Stream IO RawCmmGroup () -> IO ()
}}}
which might suggest that we could make them separate packages, and have some sort of plugin system. However, I'm not sure if this would really buy us anything: A new backend is likely to need to alter `DynFlags` too, and may also need other changes around the compiler. I think most of the backends export some other functions too, although I don't know what for OTTOMH.
My gut feeling is that it's too early to merge this into GHC: Let's wait a while and see which of the 3 JS approaches turns out to be the most fruitful.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7933#comment:13

#7933: JavaScript Cmm backend
-------------------------------+--------------------------------------------
   Reporter:  bosu             |         Owner:  bosu
       Type:  feature request  |        Status:  new
   Priority:  normal           |     Milestone:
  Component:  Compiler         |       Version:  7.6.3
 Resolution:                   |      Keywords:
         Os:  Unknown/Multiple |  Architecture:  Unknown/Multiple
    Failure:  None/Unknown     |    Difficulty:  Unknown
   Testcase:                   |     Blockedby:
   Blocking:                   |       Related:
-------------------------------+--------------------------------------------
Changes (by igloo):
* owner: => bosu
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7933#comment:14

#7933: JavaScript Cmm backend
-------------------------------+--------------------------------------------
   Reporter:  bosu             |         Owner:  bosu
       Type:  feature request  |        Status:  closed
   Priority:  normal           |     Milestone:
  Component:  Compiler         |       Version:  7.6.3
 Resolution:  wontfix          |      Keywords:
         Os:  Unknown/Multiple |  Architecture:  Unknown/Multiple
    Failure:  None/Unknown     |    Difficulty:  Unknown
   Testcase:                   |     Blockedby:
   Blocking:                   |       Related:
-------------------------------+--------------------------------------------
Changes (by bosu):
* status: new => closed
* resolution: => wontfix
Comment:
Thank you all for the comments!
I fully agree that the patch would be better done as a GHC plugin. ATM I am trying to target Java with the same approach. The JS compilation space is too crowded right now :).
Therefore, I am marking this ticket as wontfix.
Thanks a lot,
Boris.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7933#comment:15