
On Tue, May 5, 2009 at 11:07 PM, Rouan van Dalen
Hi everyone.
I am designing my own programming language.
I would like to know what is the best way to go about writing my compiler in haskell. What are the tools available in haskell that can help with compiler construction?
I think GHC itself uses Alex and Happy for tokenizing and parsing respectively. Parsec is a nice parser combinator library. Another tool you could employ is Higher Order Abstract Syntax (more of an idiom), and there are papers about it if you google for them. It's a way of representing the program you want to compile and doing transformations on it.
I know about Happy. Is that a good tool to use?
My understanding is that GHC uses it. I think they are happy with happy, but maybe a GHC dev could comment.
The compiler is intended for serious use and I would like it to be very efficient, maybe competing with compilers written in C. It should also be very easy to extend as the languoge grows.
I think one of the biggest slow downs for compilers is the complexity of parsing and the complexity of the compilation. For example, If you pick a grammar that parses efficiently and do less sophisticated analyses involving a single pass and so on it will be fast to compile. The draw back is that the compiled programs may execute more slowly than if you did lots of analysis or optimization. As an example of what not to do, C++ has a notoriously complex to parse grammar along with some other issues that make it a slow compiling language :)
Are there any good books that you can recommend on compiler construction in general and specific to haskell?
I like "Modern Compiler Design" myself: http://www.amazon.com/Modern-Compiler-Design-D-Grune/dp/0471976970 I would also recommend reading the haskell report to get an idea of what a friendly language specification looks like: http://www.haskell.org/onlinereport/ One point of advice I would give is defining your language as an embedded language in Haskell first. You can start with the data type that holds your intermediate representation or abstract syntax. Then you can work on either code generation or evaluation. As you start to settle on your data structures and the language primitives you can begin working on a parser. In a sense, the parser is the last part you need to finish. I've found this approach to be quite accommodating and to also save me a lot of time. If you google for it, there is a fair bit of literature about domain specific and embedded domain specific languages in Haskell. It's quite a good language for prototyping this way. Once you write some code you'll discover that some things are taking too long. In that case, you'll want to learn how to make proper use of the GHC profiler, assuming you use GHC as your haskell compiler. It's quite powerful and by using it correctly you can zoom in on the slow spots and dramatically improve the performance of your compiler. I found this paper to be a nice way to get started: http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf One of the brilliant bits in that paper is that it teaches you to use an existing compiler (gcc) to generate code and then learn how to implement things by example! I highly recommend reading it. It's written about a scheme compiler, but it applies equally well to other languages. Good luck! Jason