Re: [Haskell-cafe] Tiger compiler: variable escaping analysis phase

24 Jun 2010

      2010/6/22 José Romildo Malaquias 
...
Hello.
I have been teaching an introductory course on compiler construction to
our undergraduates students using Appel's "Modern Compiler
Implementation in Java". There are also versions of the book in ML and
C. The books explain how to write a compiler for the Tiger programming
language.
Now I want to implement a Tiger compiler in Haskell.
The lexer and parser (built with the help of alex and happy) and the
type checker are already working.
Next step is implementing a variable escape analysis phase, needed for
the intermediate code generator. The objective of this phase is to find
out if each variable escapes or not. In general an escaping variable
should be allocated in the stack frame (in memory). Non escaping
variables may be allocated in the stack frame or in a register.
Generally, a variable escapes if it is passed by reference, its address
is taken (using C's & operator, for instance), or it is accessed from a
nested function. Only the last is possible with Tiger.
The approach adopted by Appel in his books is easy: a muttable field in
the abstract syntax of variable declarations, of for expressions, and of
function formal parameters, which introduce new variables, is used to
collect the escaping information of the variable. In the ML version of
the book this is a reference to boolean (bool ref). Initially, in a
conservative approach, the reference is initialized to true.
In the variable escaping analysis phase of the compiler, a function
findEscape looks for escaping variables and record this information in
the escape fields of the abstract syntax. To do this the entire abstract
syntax tree is traversed looking for escaping uses of every variable,
and, when found, the field is set to true.
findEscape uses a symbol table to accomplish its work, binding variable
names to a reference to boolean (the same reference used in the abstract
syntax tree). When processing a variable declaraction, for instance, it
inserts the variable name into the symbol table, binding it to its
escaping field. When processing an expression which is a simple
variable, if the variable occurs in a nested function, its binding in
the symbol table is set to true. This reflects directly in the abstract
syntax tree of the variable declaration, as the escape field in the
variable declaration and the binding in the symbol table are the same
reference to bool.
I am look for good ways to implement the variable escaping analysis
phase in Haskell, which is a pure language. I see two alternatives:
1) adopt the same approach as Appel used in his ML version of the
  compiler: use a mutable reference in the IO monad (Data.IORef) to
  hold the variable escaping information, and write everything inside
  the IO monad
2) build a new abstract syntax tree with updated information regarding
  variable escaping
The second option is more elegant in my point of view, but would be much
less efficient than the first one.
In Haskell you will get a lot of sharing between the original AST and the
updated AST.  This greatly reduces the overhead of #2.  Have you read Chris
Okasaki's Purely Functional Datastructures book?  My other bit of advice
would be to code it in the  purely functional way and then do some
benchmarking / profiling to see that it does have sufficiently good
performance characteristics.  If you find that it's getting awkward to
manage the purely functional code then refactor that code into a monad that
hides the tedious bits.

I had some other comments to make but others here have already covered it :)
 (Such as using Writer or State or ST instead of IO, etc).

Jason

Re: [Haskell-cafe] Tiger compiler: variable escaping analysis phase

Jason Dagit