Re: [Haskell-cafe] I want to write a compiler

8 Mar 2009

      OK, thank you all for your responses. I'm impressed how fast you were.

2009/3/8 Rick R :
...
http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours
I have it. However, Scheme is strict, which is the reason I didn't
complete my reading. Maybe I am wrong.
...
Or go for the Real Thing:
http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-35.html#%_sec_5.5
I just opened the page. I'll see what I can take. Thank you.

2009/3/8 Jeremy Shaw :
...
I can say from experience that the spineless-tagless-gmachine paper
you referenced does contain all the information needed to implement an
STG->C compiler. But, you should expect to spend a lot of time
alternating between trying to implement the thing and rereading the
paper.
So I just didn't tried hard enough. Time for a re-read, then!
...
Also, you may find that it is easier to do STG->Assembly at first (via
harpy or something). With assembly it is easier to express exactly
what you want to happen. With C you have to try to figure out how to
trick the C compiler into doing your bidding.
Scary. I know only C.
...
Also, maybe it is more satisfying to output assembly, because then you
really feel like you are writing a compiler ;)
Well, maybe later.
...
I am sure others will say that LLVM is a better option than C or
assembly. I know almost nothing about LLVM, so they may be right.
Neither do I. But I definitely try this. One day.
...
-- jeremy
2009/3/8 Austin Seipp :
...
Excerpts from Loup Vaillant's message of Sat Mar 07 17:32:48 -0600 2009:
...
Ideally, I would very much like to compile to C.
[...]
-> have some kind of FFI with C (I suppose it imply some support for
unboxed values).
Unfortunately, after working on LHC, I can verifiably say (like all
the researchers who wrote the papers which I *didn't* read
beforehand,) that for a lot of purposes, C is an unsuitable fit for a
compilers' target language. It works pretty well if you can make the
source language execution model fit well enough, but if you can't,
you've a price to pay (both in sanity and/or performance.)
Yep.
...
However, since your goal is *not* efficiency, you will be happy to
know that the issues with compiling to C (like garbage collection and
exception handling) can be worked around viably.
I hope so.
...
For garbage collection, please see.
"Accurate Garbage Collection in an Uncooperative Environment" -
http://citeseer.ist.psu.edu/538613.html
[...]
http://code.haskell.org/ddc/ddc-head/runtime/
Thank you, I'll look into that.
...
You can find a lot of info on C-- here; I recommend the paper at the
bottom to start off:
http://research.microsoft.com/en-us/um/people/simonpj/papers/c--/
I'll look at that. But I am not sure it is the simplest path for me: I know C,
and I don't know C-- —yet.
...
http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/
I have it. Maybe I should study it a bit more.
...
As for the FFI, I don't really have any papers on 'how to implement an
FFI', but Matthias Blume's paper might be of interest:
"No-Longer-Foreign: Teaching an ML compiler to speak c "natively""
http://ttic.uchicago.edu/~blume/papers/nlffi-entcs.pdf
libffi may also be worth investigating:
http://sourceware.org/libffi/
I didn't know them, thank you.
...
...
I have chosen the STG machine because I thought i would be easier to
get an FFI and case exprs out of it. Maybe I am wrong, and template
instantiation is really easier. There is also the BRISK machine, and
GRIN. But the paper I read were less readable to me than the three
mentioned.
How far into GRIN have you looked?
Not far enough, it seems.
...
It is one of the intermediate
languages for lhc/jhc, and for a low-level intermediate representation
it works quite well I think; it is a strict, monadic intermediate
language - being strict, we must still be able to model closures
('suspendeded applications' or thunks,) so we model closures/partial
applications in a sort of 'defunctionalized' way (making grin a sort
of 'defunctionalized intermediate language' really,) by turning them
into something along the lines of an algebraic data type in GRIN. A
good benefit of this is that you actually /don't/ have to write
eval/apply yourself, because the compiler is supposed to generate it
*for* you.
Here is my biggest problem with GRIN: it is not STG by far. I formatted
my brain one way, so I find difficult to reformat it another way
...
In fact, this is a core part of GRIN - the generation of eval by the
compiler is critical for many important optimizations to take
place. It also makes the runtime a bit simpler.
If I find GRIN is overall simpler, I am happy.
...
This comes at the price that GRIN is by design a whole-program
intermediate form; in order to pull off any of these sophisticated
transformations everything must be in memory at once. But as we have
seen with LHC/JHC, it can make a *huge* difference (apps that are 10x
smaller *and* 10x faster than ones created by GHC.)
I intend my final language to be quite simple. That imply a relatively small
standard library. So, maybe having all the program in memory will not be
such a big deal.
...
Having dealt with GRIN as I work on LHC (when time now permits...) I
can say that it's a reasonable strategy and in practice it turns out
pretty well, but if you're not into optimizing the code, then STG
might be a better fit.
OK, maybe I will choose STG for now, then.
...
I can't really give you that much advice on this area so much as
you'll just have to choose. But GRIN does have at least two excellent
references:
Code Optimisation Techniques for Lazy Functional Languages -
 U. Boquist, http://www.cs.chalmers.se/~boquist/phd/index.html
The GRIN Project: A Highly Optimising Back End For Lazy Functional
 Languages, U. Boquist, http://www.cs.chalmers.se/~boquist/ifl96-abstract.html
Oh, I forgot about the second paper, thank you. I find it a bit clearer than the
thesis. Having the GRIN BNF helps a lot.
...
Then again you said you're not interested in optimization, so perhaps
I'm just preaching to the totally wrong choir... :)
No, you're not. A long (looong) term goal of mine is having crazy optimizations.
I will definitely explore that path.
...
Austin
2009/3/8 wren ng thornton :
...
Loup Vaillant wrote:
...
-> support algebraic data types and case expressions (unless I can get
away with encoding them as functions),
Which you always can,
[snip]
The only trick is that you need to have closures (in order to build the
Foos) and you need to have first-class functions (in order to pass the
destructors in). If you're familiar with the STG machine, this is what they
do under the covers anyways. At the core level, all you need is some
primitive to force evaluation.
Ah, that is what I was missing. I thought I had to implement seq on top
of case, not the other way around. Maybe that can make Template
instantiation viable
...
~wren
Anyway, thank you all very much. Time for some reading.

Cheers,
Loup