Re: Tiger compiler in Haskell: annotating abstract syntax tree

Jose' Romildo Malaquias wrote:
I am writing here to ask suggestions on how to annotate an ast with types (or any other information that would be relevant in a compiler phase) in Haskell.
There is also a general way of annotating AST post factum, described in http://okmij.org/ftp/Algorithms.html#tree-annot The method lets one attach arbitrarily many annotations to an already built AST, *without* the need to change the definition of the datatype. One does not even have to anticipate annotations! The method would work with your AST
data Exp = IntExp Integer | VarExp Symbol | AssignExp Symbol Exp
as _it is_, without any modifications -- neither to the data type definition, nor to the tree. The method was demonstrated when writing a compiler of sorts: annotating an AST with an inferred type for each node. If the type inference fails, we can still print out the inferred types for the good subexpressions.

Hi Oleg,
From what I understand, you are using a list of integers to encode a path to a subterm. This is a practical and lightweight implementation, but less type-safe: it is easy to encode annotations that do not correspond to any value. Also, it cannot guarantee, with types alone, that every subterm is annotated.
Our solution, on the other way, requires more intrusive changes to the
user's datatype.
Cheers,
Pedro
On Tue, Jul 20, 2010 at 11:06,
Jose' Romildo Malaquias wrote:
I am writing here to ask suggestions on how to annotate an ast with types (or any other information that would be relevant in a compiler phase) in Haskell.
There is also a general way of annotating AST post factum, described in http://okmij.org/ftp/Algorithms.html#tree-annot
The method lets one attach arbitrarily many annotations to an already built AST, *without* the need to change the definition of the datatype. One does not even have to anticipate annotations! The method would work with your AST
data Exp = IntExp Integer | VarExp Symbol | AssignExp Symbol Exp
as _it is_, without any modifications -- neither to the data type definition, nor to the tree.
The method was demonstrated when writing a compiler of sorts: annotating an AST with an inferred type for each node. If the type inference fails, we can still print out the inferred types for the good subexpressions.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Jose Pedro Magalhaes wrote:
From what I understand, you are using a list of integers to encode a path to a subterm. This is a practical and lightweight implementation, but less type-safe: it is easy to encode annotations that do not correspond to any value. Also, it cannot guarantee, with types alone, that every subterm is annotated.
Let us consider the type safety closely. Let us assume a very simple data type: lists of integers of length 10. Suppose we want to add string annotations to the elements. In my solution, I create an IntMap whose key is an integer 0..9 (the index in the original list) and the value is the string annotation. It is true that the type system does not prevent me from adding an annotation for key 11 (which corresponds to no element in the original list) or from forgetting to add annotation for key 7. As I understand it, your approach is to transform the original list of integers into the list of (Int,String) pairs. Now we are sure that each element is paired with an annotation and there are no orphan annotations. The annotation process creates this new list of pairs and returns it. However, how do you get _the type checker_ to ensure that the annotated list does correspond to the original list? That all old elements are present in the annotated list, and in the same order? It all boils down to enforcing the invariant proj al == l where l is the original list, al is the annotation list, and proj is the operation that removes annotations. It is true that my lightweight method generally cannot, in type system, enforce that invariant. (I can still enforce it if I design an appropriate security kernel and prove, _offline_, its correctness). But how could you enforce that invariant *in the type system*?

Hi Oleg,
On Wed, Jul 21, 2010 at 09:36,
Jose Pedro Magalhaes wrote:
From what I understand, you are using a list of integers to encode a path to a subterm. This is a practical and lightweight implementation, but less type-safe: it is easy to encode annotations that do not correspond to any value. Also, it cannot guarantee, with types alone, that every subterm is annotated.
Let us consider the type safety closely. Let us assume a very simple data type: lists of integers of length 10. Suppose we want to add string annotations to the elements. In my solution, I create an IntMap whose key is an integer 0..9 (the index in the original list) and the value is the string annotation. It is true that the type system does not prevent me from adding an annotation for key 11 (which corresponds to no element in the original list) or from forgetting to add annotation for key 7.
As I understand it, your approach is to transform the original list of integers into the list of (Int,String) pairs. Now we are sure that each element is paired with an annotation and there are no orphan annotations. The annotation process creates this new list of pairs and returns it. However, how do you get _the type checker_ to ensure that the annotated list does correspond to the original list? That all old elements are present in the annotated list, and in the same order?
It all boils down to enforcing the invariant proj al == l where l is the original list, al is the annotation list, and proj is the operation that removes annotations. It is true that my lightweight method generally cannot, in type system, enforce that invariant. (I can still enforce it if I design an appropriate security kernel and prove, _offline_, its correctness). But how could you enforce that invariant *in the type system*?
Considering our library approach (using multirec), there is no type-level guarantee that the "generic" values correspond to their "normal" counterparts. From the type we can see that they have the same shape, but not that they have the same content. This would be an offline proof: that the embedding-projection pair really is an isomorphism (between "normal" and "generic" values). However, this proof is external to the annotations system: it belongs in the generic library (since the library generates the embedding-projection pair automatically), and only needs to be done once. Once the embedding-projection pair has been proven correct, the annotations are "shape-correct" by construction, and "value-correct" by the correctness of the generic library. As I understand it, in your system you would need a proof of correctness for every annotated type. Cheers, Pedro
participants (2)
-
José Pedro Magalhães
-
oleg@okmij.org