On Sat, May 7, 2016 at 9:02 PM, Thomas Jakway <tjakway@nyu.edu> wrote:

This is a strange coincidence. I'm definitely no expert GHC hacker, but I started (highly preliminary) work on a JVM backend for GHC a few weeks ago. It's here: https://github.com/tjakway/ghcjvm/tree/jvm/compiler/jvmGen/Jvm
(The memory runtime is here: https://github.com/tjakway/lljvm)

Wow, that is a coincidence!

I'm very new to this, so pardon my ignorance, but I don't understand the benefit of intercepting STG code and translating it to bytecode vs. translating Cmm to bytecode (or to Jasmin assembly, as I'd prefer). Cmm seems to be designed for backends and the obvious choice. Or have I got this really mixed up?

In the GHC RTS, the heap is composed of basic units called closures, which consist of an info table (the closure's metadata), entry code, and a payload; these are used to implement the STG machine. The GHC backend translates STG code into closures and, using Cmm as its language, describes how they should be laid out in memory (with abstract labels rather than exact addresses). That format is well suited to register-based machines, which is what most processors are these days. The JVM, however, is stack-based, not register-based.
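Here is one rough way those three parts of a closure might be carried over to the JVM without an explicit memory layout. A class plays the role of the info table, a virtual method plays the entry code, and instance fields hold the payload. This is a minimal Java sketch under that assumption. `Closure`, `IntBox` and `Thunk` are hypothetical names, not GHCVM's or the RTS's actual classes. The sketch ignores concurrency (GHC itself uses blackholing, for instance). It does show a thunk updating itself with its value after the first evaluation, which is how lazy evaluation avoids recomputation.

```java
import java.util.function.Supplier;

// Hypothetical mapping of an RTS closure onto a JVM object:
//   class           ~ info table (what kind of closure this is)
//   enter()         ~ entry code
//   instance fields ~ payload
abstract class Closure {
    // Entry code: evaluate this closure and return the resulting value closure.
    abstract Closure enter();
}

// An evaluated boxed Int; entering a value simply returns it.
final class IntBox extends Closure {
    final int value; // payload

    IntBox(int value) { this.value = value; }

    @Override
    Closure enter() { return this; }
}

// A thunk: a suspended computation that is updated with its value once evaluated.
final class Thunk extends Closure {
    private Supplier<Closure> code; // payload: the suspended computation
    private Closure result;         // set by the first evaluation (the "update")
    int evaluations = 0;            // how many times the suspended code actually ran

    Thunk(Supplier<Closure> code) { this.code = code; }

    @Override
    Closure enter() {
        if (result == null) {
            evaluations++;
            result = code.get().enter();
            code = null; // release the computation so what it captured can be collected
        }
        return result;
    }
}
```

Because the JVM owns the object layout here, nothing in the sketch mentions addresses or sizes. That is the kind of detail Cmm would otherwise pin down.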

The point is that Cmm is too low-level for compiling to the JVM, where you don't have control over memory layouts (although new projects like Valhalla, as Ian mentioned above, seem to make that possible). Another problem is that going the Cmm route means writing your own garbage collector. Cmm code contains stack and heap checks that call into the RTS GC functions, and you would have to implement those functions to make it work.
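To make the GC point concrete: Cmm code allocates by advancing a heap pointer (conceptually `Hp`) against a limit (`HpLim`). When that heap check fails, it calls into the RTS collector. The Java sketch below emulates that contract over a plain byte array. `EmulatedHeap` and its `gc()` stub are hypothetical. The stub marks exactly the piece you would have to write yourself on the Cmm route, whereas on the JVM proper, `new` and the JVM's own collector cover it.

```java
// Hypothetical emulation of the allocation contract Cmm code relies on:
// bump a heap pointer, and call the collector when the heap check fails.
final class EmulatedHeap {
    private final byte[] memory; // backing store closures would be written into
    private final int hpLim;     // heap limit (conceptually HpLim)
    private int hp = 0;          // heap pointer: next free byte (conceptually Hp)
    int gcCalls = 0;

    EmulatedHeap(int size) {
        memory = new byte[size];
        hpLim = size;
    }

    // Returns the offset of `bytes` freshly allocated bytes.
    int allocate(int bytes) {
        if (hp + bytes > hpLim) { // the heap check
            gc();
            if (hp + bytes > hpLim) {
                throw new OutOfMemoryError("emulated heap exhausted");
            }
        }
        int addr = hp;
        hp += bytes;
        return addr;
    }

    int capacity() { return memory.length; }

    // Stand-in for the RTS collector. A real port would have to find live
    // closures, move them and fix up pointers; this stub only counts calls.
    private void gc() { gcCalls++; }
}
```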

The main benefit of STG is that it's the lowest level of Haskell code you can get without specifying low-level details like memory layouts, placement of data on the stack vs. in registers, and heap/stack checks.

In GHCVM, I am trying to avoid reinventing the wheel and to reuse existing infrastructure as much as possible. This lets the final product come out sooner, and performance can be optimized at a later stage. That is also why I use as many of the JVM's features as possible (especially GC and automatic heap/stack management): the JIT will not be able to optimize DirectByteBuffers (which I think is what lljvm uses).

In my implementation, I'll probably end up using a modified version of Cmm that is better suited to the JVM (without references to registers or memory locations), but for now I'm going for direct bytecode generation.

I hope this isn't out of line considering my overall lack of experience, but I think I can give some advice:
Read the JVM 7 spec cover to cover.

I strongly suggest outputting Jasmin assembly instead of raw bytecode. The class file format is complicated, and if you don't reuse Jasmin you will essentially have to rewrite it in Haskell. Jasmin is also the de facto standard assembler and much more thoroughly tested than any homegrown solution we might make.

Read the LLVM code generator. This project is more like the LLVM backend than the native code generator.

Don't go for speed. The approach I've started with is to emulate a C stack and memory system that the RTS can run on top of (https://github.com/tjakway/lljvm/blob/master/src/main/java/lljvm/runtime/Memory.java). This will get something working much faster, and it also solves the problem of how to deal with memcpy/memset/memmove on the JVM. It will of course be very slow (I think) and is not a permanent solution, but you can't do everything at once. Any other approach will probably require rewriting the entire RTS from scratch.
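A minimal sketch of what "emulating C memory" on the JVM can look like follows. Addresses become integer offsets into one byte array, and the C memory routines map onto standard Java calls. This is an illustration under that assumption, not a copy of lljvm's `Memory.java`. The class name `FlatMemory` and the little-endian byte order are both hypothetical choices. The key detail is that `System.arraycopy` behaves as if it copies through a temporary buffer when source and destination are the same array, so it already gives correct `memmove` semantics for overlapping regions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical flat, C-style memory: an "address" is a byte offset into one array.
final class FlatMemory {
    private final byte[] mem;
    private final ByteBuffer view; // typed loads/stores over the same bytes

    FlatMemory(int size) {
        mem = new byte[size];
        view = ByteBuffer.wrap(mem).order(ByteOrder.LITTLE_ENDIAN);
    }

    int loadInt(int addr) { return view.getInt(addr); }

    void storeInt(int addr, int value) { view.putInt(addr, value); }

    byte loadByte(int addr) { return mem[addr]; }

    void memset(int dest, byte value, int n) { Arrays.fill(mem, dest, dest + n, value); }

    // Same-array System.arraycopy copies as if via a temporary buffer,
    // so overlapping source and destination ranges are handled correctly.
    void memmove(int dest, int src, int n) { System.arraycopy(mem, src, mem, dest, n); }

    // memcpy forbids overlap, so delegating to memmove is always safe.
    void memcpy(int dest, int src, int n) { memmove(dest, src, n); }
}
```

Every access goes through bounds-checked array indexing and `ByteBuffer` views rather than real objects. That is one reason this route is expected to be slow.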

I don't think Frege is especially useful to this project, though I'd love to be proven wrong. Frege's compilation model is completely different from GHC's: it compiles Haskell to Java source and then passes that to javac. Porting GHC to the JVM is really more like writing a Cmm-to-JVM compiler.

Information on Jasmin:
http://web.mit.edu/javadev/packages/jasmin/doc/
http://web.mit.edu/javadev/packages/jasmin/doc/instructions.html
http://web.mit.edu/javadev/packages/jasmin/doc/about.html
Once you've tried manually dealing with constant pools, you'll appreciate Jonathan Meyer's work!
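To show what "outputting Jasmin assembly" means in practice, here is a minimal, hypothetical Java emitter. It produces Jasmin source text and leaves the constant pool and binary class file layout to the assembler. `JasminEmitter` and its methods are illustrative assumptions. The directives it writes (`.class`, `.super`, `.method`, `.limit stack`, `.limit locals`, `.end method`) are standard Jasmin syntax. The `println` method is an example of a "helper instruction" that expands into a fixed bytecode pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical emitter that writes Jasmin assembly text; the Jasmin assembler,
// not this code, then builds the constant pool and the binary class file.
final class JasminEmitter {
    private final List<String> lines = new ArrayList<>();

    JasminEmitter(String className) {
        lines.add(".class public " + className);
        lines.add(".super java/lang/Object");
    }

    JasminEmitter beginStaticMethod(String nameAndDescriptor, int maxStack, int maxLocals) {
        lines.add(".method public static " + nameAndDescriptor);
        lines.add("    .limit stack " + maxStack);
        lines.add("    .limit locals " + maxLocals);
        return this;
    }

    JasminEmitter insn(String instruction) {
        lines.add("    " + instruction);
        return this;
    }

    // A helper "instruction" that expands to a fixed three-instruction pattern.
    // Assumes the literal needs no escaping (no quotes or backslashes).
    JasminEmitter println(String literal) {
        return insn("getstatic java/lang/System/out Ljava/io/PrintStream;")
                .insn("ldc \"" + literal + "\"")
                .insn("invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V");
    }

    JasminEmitter endMethod() {
        lines.add(".end method");
        return this;
    }

    String render() { return String.join("\n", lines) + "\n"; }
}
```

A call chain such as `new JasminEmitter("Hello").beginStaticMethod("main([Ljava/lang/String;)V", 2, 1).println("hi").insn("return").endMethod().render()` yields the text of a complete class. That text would then be saved to a `.j` file and handed to Jasmin (e.g. `java -jar jasmin.jar Hello.j`).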

I've done simple bytecode experiments with Jasmin while exploring different methods of doing tail calls on the JVM. One of the methods can't be expressed in Java, so I had to resort to hand-writing the assembly. Jasmin was pretty cool, but I wanted the composability a monad would provide, along with my own helper instructions that generate GHCVM-specific bytecode patterns.
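The message doesn't say which tail-call methods were compared. One technique commonly used on the JVM, and one that can be written in plain Java, is the trampoline. Instead of making a tail call directly, a function returns a description of the call, and a driver loop performs it, so the Java stack does not grow. The `Bounce` classes below are a hypothetical sketch of that technique, not GHCVM's implementation.

```java
import java.util.function.Supplier;

// One step of a trampolined computation: a final result or another call to make.
abstract class Bounce<T> {
    static <T> Bounce<T> done(T value) { return new Done<T>(value); }

    static <T> Bounce<T> more(Supplier<Bounce<T>> next) { return new More<T>(next); }

    // The driver loop: performs the pending calls one at a time, so the Java
    // stack stays the same depth however many tail calls are made.
    static <T> T run(Bounce<T> step) {
        while (step instanceof More) {
            step = ((More<T>) step).next.get();
        }
        return ((Done<T>) step).value;
    }
}

final class Done<T> extends Bounce<T> {
    final T value;
    Done(T value) { this.value = value; }
}

final class More<T> extends Bounce<T> {
    final Supplier<Bounce<T>> next;
    More(Supplier<Bounce<T>> next) { this.next = next; }
}

// Mutually tail-recursive functions: each "tail call" is returned, not made.
final class EvenOdd {
    static Bounce<Boolean> isEven(long n) {
        if (n == 0) return Bounce.done(true);
        return Bounce.more(() -> isOdd(n - 1));
    }

    static Bounce<Boolean> isOdd(long n) {
        if (n == 0) return Bounce.done(false);
        return Bounce.more(() -> isEven(n - 1));
    }
}
```

`Bounce.run(EvenOdd.isEven(1_000_000))` completes without a `StackOverflowError`, whereas the directly recursive version would very likely overflow. The price is one allocation per call, which is why people look at other methods too.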

Moreover, MateVM, a JVM implementation written in Haskell, used the hs-java library for bytecode generation, so I figured it would be reliable. Why should I spend time writing a Jasmin AST/pretty-printer in Haskell when bytecode generation is already available in hs-java? Yes, a couple of features were missing, such as Haskell ADT representations of the different class file attributes, but adding them took hardly a day's work.

I forked davidar's extended version of Jasmin. The differences versus the original Jasmin are detailed here. Some nice additions:
supports invokedynamic
supports .annotation, .inner, .attribute, .deprecated directives
better handling of the ldc_w instruction
multi-line fields
.debug directives
signatures for local variables
.bytecode directive to specify bytecode version
(most importantly, I think): support for the StackMap attribute. If we eventually want to use new JVM instructions like invokedynamic, we need stack map frames or the JVM will reject our bytecode. JVM 7 has options to bypass this, but they're a hack and deprecated, and I believe stack maps won't be optional going forward. Alternatively, we can stick with older bytecode versions indefinitely and not use the new features.

I'm aware of the need to generate StackMapTables for newer JVMs, but for now I just want compatibility with the older JVMs (5 and 6), so it's not a big issue in the first phase. I have the option of using the ASM bytecode engineering library if I get a chance to make GHCVM self-hosted, or I can add an extra step to the pipeline after code generation: a standalone jar whose job is to generate StackMapTables.
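Which verification rules apply is decided by the class file's major version: 49 is Java 5, 50 is Java 6, and 51 is Java 7. Under the JVM specification, a version 50 class file may fall back to the older type-inferencing verifier if stack-map-based verification fails. From version 51 onward, verification by type checking is required, and it relies on the StackMapTable attribute. The sketch below reads the version from a class file header. `ClassFileVersion` is a hypothetical helper, but the header layout it parses (a u4 magic number `0xCAFEBABE`, then u2 `minor_version` and u2 `major_version`, all big-endian) is the standard class file format.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for checking which bytecode version a generated class targets.
final class ClassFileVersion {
    // Parses the first eight bytes of a class file and returns its major version.
    static int majorVersion(byte[] classFile) {
        if (classFile.length < 8) {
            throw new IllegalArgumentException("truncated class file header");
        }
        ByteBuffer header = ByteBuffer.wrap(classFile); // big-endian by default, as class files are
        if (header.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        header.getShort();                 // minor_version, ignored here
        return header.getShort() & 0xFFFF; // major_version as an unsigned u2
    }

    // Version 51 (Java 7) and later must be verified by type checking,
    // which needs StackMapTable attributes.
    static boolean requiresStackMaps(int major) {
        return major >= 51;
    }
}
```

Pointing this at the output of the code generator (for example, the bytes from `Files.readAllBytes`) is a cheap way to confirm the pipeline stays on version 50 or lower until stack map generation is in place.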

I think the biggest risk is taking on too much at once. Any one of these subtasks (writing a bytecode assembler, porting the RTS, etc.) could consume the whole summer if you're not careful.

Yeah, good point. Right now my goal is just to compile very simple Haskell programs, so I'm only porting the bare minimum required to reach that target. If you actually go through the STG->Cmm code generator and strip out all the noise of memory layouts, heap/stack checks, and LDV, CCS and Ticky profiling, you get a nice, simple core. Once the basic concepts of the GHC RTS (closures, stack frames, etc.) have been translated over, that core can be transformed to bytecode in a straightforward manner.

I'd love to help out with this project!

Great! Please post any additional ideas you have as separate issues on the GitHub repo, so that your suggestions don't get lost in this mailing list.