
On Tue, Jun 22, 2010 at 09:33:22AM -0300, José Romildo Malaquias wrote:
In the variable escaping analysis phase of the compiler, a function findEscape looks for escaping variables and record this information in the escape fields of the abstract syntax. To do this the entire abstract syntax tree is traversed looking for escaping uses of every variable, and, when found, the field is set to true.
findEscape uses a symbol table to accomplish its work, binding variable names to a reference to boolean (the same reference used in the abstract syntax tree). When processing a variable declaraction, for instance, it inserts the variable name into the symbol table, binding it to its escaping field. When processing an expression which is a simple variable, if the variable occurs in a nested function, its binding in the symbol table is set to true. This reflects directly in the abstract syntax tree of the variable declaration, as the escape field in the variable declaration and the binding in the symbol table are the same reference to bool.
Idea of pure algorithm: 1) Update the symbol table itself, that is: instead of using (a) Map Symbol (IORef Bool) use (b) Map Symbol Bool. This doesn't change the complexity of the algorithm as searching and updating have the same complexity for many data structures (maps and hashtables, for example). In reality, you don't need to use a Map at all. Just use (c) Set Symbol and symbols not in the set do not escape. Using (a) gives you O(n log k) for this step, where n is the size of the AST and k is the number of symbols. On the other hand, (c) gives you O(n log e), where e is the number of escaping symbols. 2) After doing the whole analysis, use a function 'addEscapeInfo' to process the whole pure AST again adding information about escaped symbols. This is O(n log e) as well.
The second option is more elegant in my point of view, but would be much less efficient than the first one.
While O(n log e) is better than O(n log k), probably the constants in the pure algorithm are higher than their impure counterparts. I guess you could also try to write: 1) Take an 'AST e' into 'AST (STRef s Bool, e)' in O(n). 2) Use the impure algorithm inside ST monad in O(n log k). 3) Take 'AST (STRef s Bool, e)' into 'AST (Bool, e)' in O(n). 4) 'runST' on the above steps to get a pure function from 'AST e' into 'AST (Bool, e)'. The ST monad has the same runtime overhead as IO. Steps 1) and 3) are used to hide the ST monad from the rest of the compiler. Cheers, -- Felipe.