There are a number of ways to fix this of various complexity, depending on how many kinds of statements you have in your language and how adverse you are to code duplication.

One option is to remove the recursion from your statement type and to make a 'base functor' like you've proposed with your link based ADT.

data Stmt a = Slf Test [a] [a] | ...

and then make that recursive by using a newtype to make the folding explicit.

newtype Mu f = In { out :: f (Mu f) }

Now you can work on Mu Stmt which is a fixed point of your statement data type and get your original meaning, or work with Stmt Int and have a statement type that indexes statements by # and can look them up in a control flow graph or some such.

You can even attach annotations at every level by using a different ADT to wrap yourself up. (For the category-theory inclined, this gives rise to something known as the cofree comonad of your base functor)

newtype Ann f e = Ann e (f (Ann f e))

So now you can have something like:

Ann 2 (SIf ... (Ann 1 (Var "X")) (Ann 1 (Var "Y"))

if you wanted to count the number of variable references in a subtree for instance.

On the other hand, rarely does a programming language consist solely of statements. You often have expressions and other types floating around and tying with an explicit Mu can sometimes get in the way of that.

Forgetting those definitions for a moment, we can try to fix the one statement type problem.

You can use some interesting GADT based solutions to fix that, but another approach that I've been using recently is to use explicit recursion in a slightly different place.

type (v :> f) = f (v f)
data Var (f :: * -> *) = V String
data Exp f
    = App (Exp :> f) (Exp :> f)
    | Lam (Var :> f) (Exp :> f)
    | Var (Var :> f)
data Stmt f
    = If (Exp :> f) [Stmt :> f] [Stmt :> f]
    | ...

Now we can have a lot of different kinds of expressions based on what we substitute in for f.

data Ann a e = Ann a e
newtype Mu e = Mu e
data Free a e = Return a | Free e
newtype Base a e = Base a

now:

Stmt (Base Int) -- is a statement wrapped around integers
Stmt (Ann Int) -- is a statement wrapped around subtrees of various types annotated with integers
Stmt Mu -- is your old statement type with newtype Mu wrappers on its children.
Stmt (Free Int) is your old statement data type, with occasional integer place holders for unexpanded portions of the tree, they can act as standins for Exps, Vars, etc.

You can then borrow a trick from a recent post of mine:

http://comonad.com/reader/2009/incremental-folds/

with some minor modifications to extract data incrementally or return results as you grow the syntax tree.

The design space is large and there are a lot of options to explore around here, so don't take any of this as the one and only way to implement a syntax ADT. =)

-Edward Kmett

On Sun, Aug 2, 2009 at 1:25 AM, Michal D. <michal.dobrogost@gmail.com> wrote:

I'm in the process of writing a toy compiler but I'm having some
trouble trying to make my datatypes general. For example, using parsec
I parse statements as:

data Stmt = SIf Test [Stmt] [Stmt] | ...

However, when it's time to create a control flow graph it would be
nice to represent statements as (the Int's signify the node id's for
either case of the if statement):

data Stmt = SIf Test Int Int | ...

So, in a eureka moment I decided that this should be allowable with
the following declaration:

data Stmt link = SIf Test link link | ...

Ofcourse, the problem is trying to declare the resulting type for
parsing: "parse -> Stmt [Stmt [Stmt ....]]". Any hints on whether
there is a way to accomplish what I'm trying to do or do I have to
bite the bullet and declare two seperate datatypes? I tried being
clever and declaring a 'helper' type as "type StmtRec = Stmt [StmtRec]"
but to no avail... GHC won't let it slide: "Cycle in type synonym declarations"!

Cheers,

Michal
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe