This is a strange coincidence. I'm definitely no expert GHC hacker but I started (highly preliminary) work on a JVM backend for GHC a few weeks ago. It's here: https://github.com/tjakway/ghcjvm/tree/jvm/compiler/jvmGen/Jvm
(The memory runtime is here: https://github.com/tjakway/lljvm)
I'm very new to this so pardon my ignorance, but I don't understand what the benefit is of intercepting STG code and translating that to bytecode vs. translating Cmm to bytecode (or Jasmin assembly, as I'd prefer)? It seems like Cmm is designed for backends and the obvious choice. Or have I got this really mixed up?
I hope this isn't out of line considering my overall lack of experience but I think I can give some advice:
- read the JVM 7 spec cover-to-cover.
- I highly suggest outputting Jasmin assembly instead of raw bytecode. The classfile format is complicated and you will have to essentially rewrite Jasmin in Haskell if you don't want to reuse it. Jasmin is also the de facto standard assembler and much more thoroughly tested than any homegrown solution we might make.
- read the LLVM code generator. This project is more like the LLVM backend than the native code generator.
- Don't go for speed. The approach that I've begun is to emulate a C stack and memory system the RTS can run on top of (https://github.com/tjakway/lljvm/blob/master/src/main/java/lljvm/runtime/Memory.java). This will make getting something working much faster and also solves the problem of how to deal with memcpy/memset/memmove on the JVM. This will of course be very slow (I think) and is not a permanent solution. Can't do everything at once. Any other approach will probably require rewriting the entire RTS from the beginning.
- I don't think Frege is especially useful to this project, though I'd love to be proven wrong. Frege's compilation model is completely different from GHC's: they compile Haskell to Java and then send that to javac. Porting GHC to the JVM is really more like writing a Cmm to JVM compiler.
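To make the emulated-memory idea a bit more concrete, here's a rough sketch of a flat, byte-addressed memory with memset/memcpy/memmove on top of it. This is not the actual lljvm Memory class; `FlatMemory` and its method names are made up for illustration, and I'm assuming 32-bit addresses and little-endian layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical flat memory: C "pointers" are just int offsets into one byte array. */
final class FlatMemory {
    private final ByteBuffer mem;

    FlatMemory(int sizeBytes) {
        // Heap buffer, so mem.array() is available with offset 0.
        mem = ByteBuffer.allocate(sizeBytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    int loadI32(int addr)              { return mem.getInt(addr); }
    void storeI32(int addr, int value) { mem.putInt(addr, value); }
    byte loadI8(int addr)              { return mem.get(addr); }
    void storeI8(int addr, byte value) { mem.put(addr, value); }

    /** memset: fill len bytes starting at dst with the low byte of value. */
    int memset(int dst, int value, int len) {
        Arrays.fill(mem.array(), dst, dst + len, (byte) value);
        return dst;
    }

    /** memmove: System.arraycopy is specified to handle overlapping ranges. */
    int memmove(int dst, int src, int len) {
        System.arraycopy(mem.array(), src, mem.array(), dst, len);
        return dst;
    }

    /** memcpy: same as memmove here, since overlap safety costs nothing extra. */
    int memcpy(int dst, int src, int len) {
        return memmove(dst, src, len);
    }
}
```

Generated bytecode would then turn every Cmm load/store into a call like `loadI32`/`storeI32`, which is exactly why I expect this approach to be slow.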
Information on Jasmin:
http://web.mit.edu/javadev/packages/jasmin/doc/
http://web.mit.edu/javadev/packages/jasmin/doc/instructions.html
http://web.mit.edu/javadev/packages/jasmin/doc/about.html
Once you've tried manually dealing with constant pools you'll appreciate Jonathan Meyer's work!
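For a taste of what "manually dealing with constant pools" means, here's a small sketch that writes only the start of a classfile by hand: the header plus the two constant pool entries needed just to name one class. `ClassFilePrefix` and `headerFor` are names I made up, and the output is deliberately incomplete (no access flags, fields, methods or attributes), so it is not a loadable class.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper: emits only the classfile header and a two-entry constant pool. */
final class ClassFilePrefix {
    static byte[] headerFor(String className) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            out.writeInt(0xCAFEBABE);  // magic
            out.writeShort(0);         // minor_version
            out.writeShort(51);        // major_version (51 = Java 7)
            out.writeShort(3);         // constant_pool_count = number of entries + 1
            out.writeByte(1);          // entry #1: tag CONSTANT_Utf8
            out.writeUTF(className);   // u2 length followed by modified UTF-8 bytes
            out.writeByte(7);          // entry #2: tag CONSTANT_Class
            out.writeShort(1);         // name_index, pointing back at entry #1
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Every method name, descriptor, field reference and string literal needs the same index bookkeeping. In Jasmin the equivalent is a single `.class public Hello` line and the pool is built for you.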
I forked davidar's extended version of Jasmin. The differences versus the original Jasmin are detailed here. Some nice additions:
- supports invokedynamic
- supports .annotation, .inner, .attribute, .deprecated directives
- better handling of the ldc_w instruction
- multi-line fields
- .debug directives
- signatures for local variables
- .bytecode directive to specify bytecode version
- (most importantly, I think): support for the StackMap attribute. If we eventually want to use new JVM instructions like invokedynamic, we need stack map frames or the JVM will reject our bytecode. JVM 7 has options to bypass this check, but that's a hack: they're deprecated and I believe they won't be available going forward. Alternatively we can stick with older bytecode versions indefinitely and not use the new features.
I think the biggest risk is taking on too much at once. Any one of these subtasks (writing a bytecode assembler, porting the RTS, etc.) could consume the whole summer if you're not careful.
I'd love to help out with this project!