Tiger compiler: variable escaping analysis phase

Hello.

I have been teaching an introductory course on compiler construction to our undergraduate students using Appel's "Modern Compiler Implementation in Java". There are also versions of the book in ML and C. The books explain how to write a compiler for the Tiger programming language. Now I want to implement a Tiger compiler in Haskell. The lexer and parser (built with the help of alex and happy) and the type checker are already working.

The next step is implementing a variable escape analysis phase, needed for the intermediate code generator. The objective of this phase is to find out whether each variable escapes or not. In general an escaping variable should be allocated in the stack frame (in memory); non-escaping variables may be allocated either in the stack frame or in a register. Generally, a variable escapes if it is passed by reference, its address is taken (using C's & operator, for instance), or it is accessed from a nested function. Only the last is possible in Tiger.

The approach adopted by Appel in his books is simple: a mutable field in the abstract syntax of variable declarations, of for expressions, and of function formal parameters (the constructs which introduce new variables) is used to collect the escaping information of the variable. In the ML version of the book this is a reference to a boolean (bool ref). Initially, in a conservative approach, the reference is initialized to true.

In the variable escaping analysis phase of the compiler, a function findEscape looks for escaping variables and records this information in the escape fields of the abstract syntax. To do this, the entire abstract syntax tree is traversed looking for escaping uses of every variable, and, when one is found, the field is set to true.

findEscape uses a symbol table to accomplish its work, binding variable names to a reference to a boolean (the same reference used in the abstract syntax tree). When processing a variable declaration, for instance, it inserts the variable name into the symbol table, binding it to its escape field. When processing an expression which is a simple variable, if the variable occurs in a nested function, its binding in the symbol table is set to true. This is reflected directly in the abstract syntax tree of the variable declaration, as the escape field in the variable declaration and the binding in the symbol table are the same reference to a bool.

I am looking for good ways to implement the variable escaping analysis phase in Haskell, which is a pure language. I see two alternatives:

1) adopt the same approach as Appel used in the ML version of his compiler: use a mutable reference in the IO monad (Data.IORef) to hold the variable escaping information, and write everything inside the IO monad;

2) build a new abstract syntax tree with updated information regarding variable escaping.

The second option is more elegant from my point of view, but would be much less efficient than the first one. So I would like to know what advice Haskell programmers have for implementing this phase of the Tiger compiler in Haskell.

Regards,

Romildo

--
Computer Science Department
Universidade Federal de Ouro Preto, Brazil
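P.S. For concreteness, the two alternatives would give me declaration nodes roughly like the following (the names here are invented; my real abstract syntax has more constructors and fields):

    import Data.IORef (IORef)

    type Symbol = String
    type Pos    = (Int, Int)            -- (line, column)

    -- Alternative 1: a mutable escape field, mirroring the "bool ref"
    -- of the ML book; findEscape would update it in place, in IO.
    data VarDec1 = VarDec1 Symbol (IORef Bool) Pos

    -- Alternative 2: a plain field; findEscape would rebuild the tree,
    -- returning a copy with the escape flags filled in.
    data VarDec2 = VarDec2 Symbol Bool Pos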

2010/6/22 José Romildo Malaquias
Hi,

I think there is a third way to do what you describe (if I understood everything). You can use a Writer monad (basically a state monad where the state is never read, only written to).

Essentially you walk the tree and record the information you want (a mapping from variable name to a boolean 'does-escape'). That information is threaded through the tree-walking functions. The information you record is the underlying monoid of the Writer monad.

Anyway, the second option sounds better than the first one, as you don't have to rely on IO (for which, IMO, there is no real reason in the problem you describe).

Cheers,
Thu
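P.S. Very roughly, and inventing a toy AST since I don't know yours, the shape I have in mind is something like this (shadowing and source positions are ignored in the sketch):

    import Control.Monad.Writer (Writer, execWriter, tell)
    import qualified Data.Map as M
    import qualified Data.Set as S

    type Symbol = String
    type Depth  = Int

    -- A toy AST, just to show the traversal.
    data Exp
      = VarExp Symbol                   -- use of a variable
      | IntExp Int
      | LetExp Symbol Exp Exp           -- let x := e1 in e2
      | FunExp Symbol Symbol Exp Exp    -- let function f(p) = body in e

    -- Environment: the nesting depth at which each visible variable was declared.
    type Env = M.Map Symbol Depth

    findEscape :: Exp -> S.Set Symbol
    findEscape = execWriter . go 0 M.empty
      where
        go :: Depth -> Env -> Exp -> Writer (S.Set Symbol) ()
        go d env (VarExp x) =
          case M.lookup x env of
            Just d' | d > d' -> tell (S.singleton x)   -- used from a deeper function
            _                -> return ()
        go _ _   (IntExp _)       = return ()
        go d env (LetExp x e1 e2) = go d env e1 >> go d (M.insert x d env) e2
        go d env (FunExp _ p body e) =
          go (d + 1) (M.insert p (d + 1) env) body >> go d env e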

On Tue, Jun 22, 2010 at 02:54:08PM +0200, Vo Minh Thu wrote:
Hi,
I think there is a third way to do what you describe (if I understood everything). You can use a Writer monad (basically a state monad where the state is never read, only written to).
Essentially you walk the tree and record the information you want (a mapping from variable name to a boolean 'does-escape'). That information is threaded through the tree-walking functions.
The information you record is the underlying monoid of the Writer monad.
The 'does-escape' information should be available for each variable at the point the variable is introduced (a variable declaration or a formal parameter in a function declaration, for instance). If this information is collected in another data structure that is not the abstract syntax tree, it may be difficult to access the 'does-escape' information when it is needed.

Along this line, I thought about using a mapping from a pair consisting of the variable name and its position in the source code to the 'does-escape' boolean. The findEscape function would construct this mapping while traversing the abstract syntax tree.

The function that generates the intermediate representation of the program would then be given the abstract syntax tree and the escaping mapping as arguments. When traversing the abstract syntax tree to generate code, it would consult the escaping mapping to determine whether a variable escapes or not when allocating a variable. The variable name and its position are available from the abstract syntax tree.

Are you suggesting that the Writer monad would be used to construct a data structure like the escaping mapping I mentioned above?
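In types, the mapping I am imagining is something like this (Symbol and Pos stand for whatever my lexer and parser already produce; Exp is my abstract syntax):

    import qualified Data.Map as M

    type Symbol = String
    type Pos    = (Int, Int)             -- (line, column), from the lexer

    -- Escaping information, keyed by the variable's name together with the
    -- position of its declaration, so that reuses of the same name in
    -- different scopes get different entries.
    type EscapeMap = M.Map (Symbol, Pos) Bool

    -- the traversal I have in mind:
    --   findEscape :: Exp -> EscapeMap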
Romildo

2010/6/22 José Romildo Malaquias
The 'does-escape' information should be available for each variable at the point the variable is introduced (a variable declaration or a formal parameter in a function declaration, for instance). If this information is collected in another data structure that is not the abstract syntax tree, it may be difficult to access the 'does-escape' information when needed.
I was thinking 'accessing when needed' was just a lookup into that other data structure (i.e. a map). As someone said, and related to your second solution, this analysis and the resulting mapping can be put back into the tree. I.e. instead of 'data Tree = ...' you have 'data Tree a = ...', and the 'a' can be whatever you want, including the 'does-escape' boolean value.
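For instance, with a made-up two-constructor tree:

    -- One tree type, parameterised by the annotation carried at each node:
    data Tree a = Var a String
                | Let a String (Tree a) (Tree a)

    type PlainTree  = Tree ()      -- before the analysis
    type EscapeTree = Tree Bool    -- after it: the 'does-escape' flag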
In this line I thought about using a mapping from a pair consisting of the variable name and its position in the source code, to the 'does-escape' boolean. The findEscape function would construct this mapping while traversing the abstract syntax tree.
Yes, that was what I had in mind. But don't use (variable name, position) to index the map. Instead, have a renaming pass that ensures that every name is unique.
The function that generates the intermediate representation of the program would be given the abstract syntax tree and the escaping mapping as arguments. When traversing the abstract syntax tree to generate code, it would consult the escaping mapping to determine if a variable escapes or not, when allocating a variable. The variable name and its position are available from the abstract syntax tree.
Yes. So it appears you don't need the 'data Tree a' if you don't plan to manipulate it beyond what you state here, just the pair (Tree,Mapping) will do it.
Are you suggesting that the Writer monad would be used to construct a data structure like the escaping mapping I mentioned above?
Yes. The data structure is just a mapping name -> bool (or you can do it with just a set: if a name belongs to the set, it escapes). Constructing it seems possible with a Writer monad. You would use the 'tell' method of the Writer monad every time you want to record a new name.

Alternatively, you can explicitly thread the data structure if that seems fine.

Cheers,
Thu
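P.S. One small detail if you go for the Map Symbol Bool version: the default Monoid on Data.Map is a left-biased union, so duplicate entries are not combined with (||) automatically; with a Set of escaping names the default union already does what you want. Roughly (types guessed):

    import Control.Monad.Writer (Writer, tell)
    import qualified Data.Map as M
    import qualified Data.Set as S

    type Symbol = String

    -- With a Set, the Writer's mappend is plain union, which is exactly
    -- "mark this name as escaping":
    recordEscape :: Symbol -> Writer (S.Set Symbol) ()
    recordEscape x = tell (S.singleton x)

    -- With a Map Symbol Bool you would combine duplicate entries yourself,
    -- e.g. with unionWith, instead of relying on the default Monoid:
    combineEscapes :: M.Map Symbol Bool -> M.Map Symbol Bool -> M.Map Symbol Bool
    combineEscapes = M.unionWith (||)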

On Tue, Jun 22, 2010 at 04:44:09PM +0200, Vo Minh Thu wrote:
Yes, that was what I had in mind. But don't use (variable name, position) to index the map. Instead, have a renaming pass that ensures that every name is unique.
The Tiger compiler would not need a variable renaming pass otherwise. This renaming pass (renameVars :: Exp -> Exp) would build a new AST. For each variable declaration and for each parameter in every function declaration, a new (unique) name would be invented, and every use of the old name would be replaced by the new name. A state monad threading an integer counter would be helpful here: the counter would be incremented each time a new variable name is needed, and it would be used to form the name.

But would all this extra work be worth it just for using the unique names to index the 'does-escape' map (or set)? Is the overhead of using the pair (variable_name, position) that bad?
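To be concrete, the fresh-name machinery I have in mind is roughly this (Exp stands for my abstract syntax; only the counter part is sketched):

    import Control.Monad.State (State, get, put)

    type Symbol = String

    -- An integer counter threaded through the renaming pass.
    type Rename a = State Int a

    -- Invent a new, unique name based on the old one.
    freshName :: Symbol -> Rename Symbol
    freshName old = do
      n <- get
      put (n + 1)
      return (old ++ "_" ++ show n)

    -- The pass itself would then be
    --   renameVars :: Exp -> Exp
    --   renameVars e = evalState (renameExp e) 0
    -- where renameExp renames each binding occurrence with freshName and
    -- substitutes the new name at every use in its scope.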
The function that generates the intermediate representation of the program would be given the abstract syntax tree and the escaping mapping as arguments. When traversing the abstract syntax tree to generate code, it would consult the escaping mapping to determine if a variable escapes or not, when allocating a variable. The variable name and its position are available from the abstract syntax tree.
Yes. So it appears you don't need the 'data Tree a' if you don't plan to manipulate it beyond what you state here, just the pair (Tree,Mapping) will do it.
Exactly.
Are you suggesting that the Writer monad would be used to construct a data structure like the escaping mapping I mentioned above?
Yes. The data structure is just a mapping name -> bool (or you can do it with just a set, if a name belongs to the set, it escapes). Constructing it seems possible with a Writer monad. You would use the 'tell' method of the Writer monad every time you want to record a new name.
Right.
Alternatively, you can explicitly thread the data structure if that seems fine.
How would that be?

Romildo

2010/6/24 José Romildo Malaquias
Yes, that was what I had in mind. But don't use (variable name, position) to index the map. Instead, have a renaming pass that ensures that every name is unique.
The Tiger compiler would not need a variable renaming pass otherwise.
Ok. I thought you might be interested in having such unique identifiers for other tasks. And I'm not sure (name, position) would be enough to resolve every possible reuse of a name. Anyway, at this point I can't help much, as I don't know Tiger, but it seems you have plenty of information to make your decision.
Alternatively, you can explicitly thread the data structure if that seems fine.
How would that be?
Using a monad has two great advantages: the syntactic one, where you don't have to explicitly thread the state, and the more important one, where it makes clear what you want to do (in this case, accumulate/write some information that can be abstracted as a monoid). If your code presentation/taste/understanding (as I understand you want to use your implementation for didactic purposes) dictate otherwise, and if it is not too cumbersome to explicitly thread the state, you might want to do so.

Cheers,
Thu

P.S. Don't forget (if you can, of course) to put your code on Hackage.
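P.P.S. To answer "how would that be": the escape set simply becomes an extra argument and an extra result of the tree-walking functions, with no monad involved. With the same kind of toy AST as in my earlier sketch:

    import qualified Data.Map as M
    import qualified Data.Set as S

    type Symbol = String
    type Depth  = Int
    type Env    = M.Map Symbol Depth

    data Exp = VarExp Symbol
             | IntExp Int
             | LetExp Symbol Exp Exp
             | FunExp Symbol Symbol Exp Exp

    -- The set of escaping names is passed in, updated, and returned explicitly.
    findEscape :: Exp -> S.Set Symbol
    findEscape exp0 = go 0 M.empty exp0 S.empty
      where
        go :: Depth -> Env -> Exp -> S.Set Symbol -> S.Set Symbol
        go d env (VarExp x) acc =
          case M.lookup x env of
            Just d' | d > d' -> S.insert x acc
            _                -> acc
        go _ _   (IntExp _)       acc = acc
        go d env (LetExp x e1 e2) acc = go d (M.insert x d env) e2 (go d env e1 acc)
        go d env (FunExp _ p body e) acc =
          go d env e (go (d + 1) (M.insert p (d + 1) env) body acc)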

On Tue, Jun 22, 2010 at 09:33:22AM -0300, José Romildo Malaquias wrote:
Idea of pure algorithm:

1) Update the symbol table itself, that is: instead of using

    (a) Map Symbol (IORef Bool)

use

    (b) Map Symbol Bool.

This doesn't change the complexity of the algorithm as searching and updating have the same complexity for many data structures (maps and hashtables, for example).

In reality, you don't need to use a Map at all. Just use

    (c) Set Symbol

and symbols not in the set do not escape. Using (a) gives you O(n log k) for this step, where n is the size of the AST and k is the number of symbols. On the other hand, (c) gives you O(n log e), where e is the number of escaping symbols.

2) After doing the whole analysis, use a function 'addEscapeInfo' to process the whole pure AST again adding information about escaped symbols. This is O(n log e) as well.
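In types, that is something like the following (guessing at your Symbol type; Exp stands for your abstract syntax):

    import qualified Data.Set as S

    type Symbol = String

    -- (c): the result of the analysis is just the set of escaping symbols;
    -- symbols not in the set do not escape.
    type Escaping = S.Set Symbol

    escapes :: Escaping -> Symbol -> Bool
    escapes = flip S.member

    -- the analysis itself and the second pass over the pure AST:
    --   findEscape    :: Exp -> Escaping
    --   addEscapeInfo :: Escaping -> Exp -> Exp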
The second option is more elegant from my point of view, but would be much less efficient than the first one.
While O(n log e) is better than O(n log k), probably the constants in the pure algorithm are higher than in their impure counterparts. I guess you could also try to write:

1) Take an 'AST e' into 'AST (STRef s Bool, e)' in O(n).

2) Use the impure algorithm inside the ST monad in O(n log k).

3) Take 'AST (STRef s Bool, e)' into 'AST (Bool, e)' in O(n).

4) 'runST' on the above steps to get a pure function from 'AST e' into 'AST (Bool, e)'.

The ST monad has the same runtime overhead as IO. Steps 1) and 3) are used to hide the ST monad from the rest of the compiler.

Cheers,

--
Felipe.
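P.S. The plumbing for steps 1) and 3) would be roughly as below. The AST here is a made-up stand-in for yours, and the imperative step 2) (findEscapeST) is left as a comment:

    import Control.Monad.ST (ST, runST)
    import Data.STRef (STRef, newSTRef, readSTRef)

    -- A made-up annotated AST: every binder carries an annotation 'e'.
    data AST e = Var String
               | Let String e (AST e) (AST e)

    -- Step 1: attach a fresh STRef (initialised to True, conservatively)
    -- alongside whatever annotation was already there.
    annotate :: AST e -> ST s (AST (STRef s Bool, e))
    annotate (Var x)         = return (Var x)
    annotate (Let x e b1 b2) = do
      r   <- newSTRef True
      b1' <- annotate b1
      b2' <- annotate b2
      return (Let x (r, e) b1' b2')

    -- Step 3: freeze the STRefs back into plain Bools.
    freeze :: AST (STRef s Bool, e) -> ST s (AST (Bool, e))
    freeze (Var x)              = return (Var x)
    freeze (Let x (r, e) b1 b2) = do
      b   <- readSTRef r
      b1' <- freeze b1
      b2' <- freeze b2
      return (Let x (b, e) b1' b2')

    -- Step 4: the whole thing is a pure function.
    findEscapePure :: AST e -> AST (Bool, e)
    findEscapePure ast =
      runST (annotate ast >>= \ast' -> {- findEscapeST ast' >> -} freeze ast')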

On Tue, Jun 22, 2010 at 10:01:37AM -0300, Felipe Lessa wrote:
Idea of pure algorithm:
1) Update the symbol table itself, that is: instead of using
(a) Map Symbol (IORef Bool)
use
(b) Map Symbol Bool.
This doesn't change the complexity of the algorithm as searching and updating have the same complexity for many data structures (maps and hashtables, for example).
In reality, you don't need to use a Map at all. Just use
(c) Set Symbol
and symbols not in the set do not escape. Using (a) gives you O(n log k) for this step, where n is the size of the AST and k is the number of symbols. On the other hand, (c) gives you O(n log e), where e is the number of escaping symbols.
These solutions would not work because they do not deal with the scopes of variables. In a Tiger program (as in most programming languages) a variable name can be reused many times to refer to different variables, and each of these variables can escape. Maybe a set of pairs containing the variable name and its position in the source code (available from the abstract syntax tree) would be a good idea. Then the code generator would traverse the abstract syntax tree and, when needed, use this set to find out whether a variable escapes.
2) After doing the whole analysis, use a function 'addEscapeInfo' to process the whole pure AST again adding information about escaped symbols. This is O(n log e) as well.
If the variable escaping analysis is done and its result is made available in another data structure, there would be no need to add the escaping information back to the abstract syntax tree.
Romildo

Hello,

Doaitse Swierstra has a Tiger compiler written in Haskell + UUAG as a demonstration of the UUAG attribute grammar system. The package on Hackage only contains the derived source, i.e. not the original attribute grammar code but the generated Haskell source after running UUAG on the *.ag files:

http://hackage.haskell.org/package/tiger

You could try contacting Doaitse Swierstra for the original UUAG source.

Best wishes,

Stephen

On Tue, Jun 22, 2010 at 02:30:04PM +0100, Stephen Tetley wrote:
I have found the sources at http://www.cs.uu.nl/wiki/HUT/WebHome. What is provided is an implementation of a compiler front-end and type checker for Andrew Appel's Tiger language. The variable escaping phase (needed only to decide where variables would be allocated in the back-end) is not implemented; at least I could not find it in a quick look.

Romildo

2010/6/22 José Romildo Malaquias
In Haskell you will get a lot of sharing between the original AST and the updated AST, which greatly reduces the overhead of #2. Have you read Chris Okasaki's "Purely Functional Data Structures"?

My other bit of advice would be to code it in the purely functional way and then do some benchmarking/profiling to see whether it has sufficiently good performance characteristics. If you find that it is getting awkward to manage the purely functional code, then refactor that code into a monad that hides the tedious bits.

I had some other comments to make, but others here have already covered them :) (such as using Writer or State or ST instead of IO, etc.).

Jason
participants (5)

- Felipe Lessa
- Jason Dagit
- José Romildo Malaquias
- Stephen Tetley
- Vo Minh Thu