
On Tue, Dec 20, 2005 at 10:36:36AM -0600, Creighton Hogg wrote:
I was wondering where I should get started in learing about how to implement a haskell compiler?
Well, the way I learned was by just making the decision that one way or another, I was going to write a haskell compiler and then reading and reading and coding and tossing code and recoding until I succeeded. :) it is sort of how I learn, choose something I have no idea how to do, and do it anyway learning what I need to along the way out of neccesity. in any case, assuming you know something about basic compilers, lexing, parsing, and code generation and are interested in how you get parsed haskell source into a form that you can generate code from, I'd highly recommend the following (in order) (of course, this is just my experience) * The haskell report itself: this will give the rules on how to desugar 'complicated' haskell into simple haskell. things like getting rid of list comprehensions and do notation and turning complicated pattern matching into simple pattern matching (the most complicated of the desugaring transformations). I'd recommend desugaring and expanding type synonyms as soon as possible for your first attempt even though it gives worse error messages because it _greatly_ simplifies the later steps and you will likely go through several iterations of the typechecker before you get one you are happy with. you can always extend your typechecker later and postpone certain desugarings until you reach a balance of good error messages you are comfortable with. * typing haskell in haskell: http://www.cse.ogi.edu/~mpj/thih/ this will tell you how to go from haskell source to explicitly typed haskell source, which is now in a form suitable for transforming into an intermediate language. the explicitly typed, desugared haskell can be converted directly to core. It may take several readings to fully understand and looking at the source to hatchet could help considerably (it did for me) * then you have a couple choices, basically, You want to learn about ghc core. there are a lot of papers on ghc optimizations that are great because they give examples of how core is used which really helps get an intuitive understanding of why core is a good intermediate language: recommend are: Secrets of the GHC inliner. http://www.research.microsoft.com/~simonpj/Papers/inlining/index.htm this is by far the most important, the simplifier is the key to any other optimizations and this paper goes into a lot of the details other things leave out. http://haskell.org/ghc/docs/papers/unboxed-values.ps.gz very good paper. my main regret about jhc development is that I didn't incorperate unboxed values right from the start thinking of them as an optimization rather than an essential part of expressing program properties. I just accumulated hack-after-hack which would have gone away with unboxed values in the core. http://research.microsoft.com/copyright/accept.asp?path=/users/simonpj/papers/float.ps.gz&pub=ACM let floating, not only an important optimization, but will get you thinking about what it means to preserve lazyness and some of the trickiness of not breaking performance by accident. http://research.microsoft.com/copyright/accept.asp?path=/users/simonpj/papers/comp-by-trans.ps.gz&pub=18 great overview, either read this, or all of the above. (or ideally everything) starting out with this one is probably the best idea because it will give you an idea what you are in for. * next you have your simplified optimized core and you want to turn it into machine code, there are various choices you can make, the most well understood is G-machine based implementations like nhc or the more advanced STG machine used in ghc. the lester book mentioned in another email is a great introduction and highly recommended even if the source there is a lot of practical info out there on how to implement these and you have pretty much a straight shot from core to G-machine code either to be compiled directly or interpreted. for jhc I used boquist's GRIN: http://www.cs.chalmers.se/~boquist/phd/ which is not the simplest way of getting things working right away, but could be very interesting. there are various other abstract machines out there, the Lazy Virtual Machine used by Helium described in Daan Leijen's Phd thesis is quite interesting, and might make a better first target than G-machine code. so, I'd recommend, at least _understand_ the G-machine, enough to write an interpreter for it. (if you already have core, translating and interpreting G-machine code shouldn't be more than a few hundreds of lines of code). then read about the LVM, STG, and GRIN, and any other ones anyone recommends? and decide where you want to go from there. whichever one inspires you. Hopefully this helps you, this is just the route I would recommend based on my experience with jhc, I read a whole lot of papers and books, and there are a ton more great ones out there but I think this would be the easiest path to a simple haskell compiler that is based on concepts extensible to a full-scale non-toy implementation. I left out details that should be covered in any traditional compiler book and just mentioned the issues specific to a haskell compiler. I am sure someone can fill in the holes if needed. I wish you good luck, I started jhc a month after hearing about haskell and boy was it a great crash course in the language having to learn how to program in it and its details at the same time. The great news is that Haskell is a _superlative_ language to write compilers for any language in and is quite fun to boot. writing a compiler you won't find anywhere that haskell itself is lacking so can concentrate on the task at hand. John -- John Meacham - ⑆repetae.net⑆john⑈