Cyclic data declarations

I'm in the process of writing a toy compiler but I'm having some trouble trying to make my datatypes general. For example, using parsec I parse statements as: data Stmt = SIf Test [Stmt] [Stmt] | ... However, when it's time to create a control flow graph it would be nice to represent statements as (the Int's signify the node id's for either case of the if statement): data Stmt = SIf Test Int Int | ... So, in a eureka moment I decided that this should be allowable with the following declaration: data Stmt link = SIf Test link link | ... Ofcourse, the problem is trying to declare the resulting type for parsing: "parse -> Stmt [Stmt [Stmt ....]]". Any hints on whether there is a way to accomplish what I'm trying to do or do I have to bite the bullet and declare two seperate datatypes? I tried being clever and declaring a 'helper' type as "type StmtRec = Stmt [StmtRec]" but to no avail... GHC won't let it slide: "Cycle in type synonym declarations"! Cheers, Michal

Hello Michal, Sunday, August 2, 2009, 9:25:40 AM, you wrote:
data Stmt = SIf Test [Stmt] [Stmt] | ...
data Stmt = SIf Test Int Int | ...
data Stmt a = SIf Test [Stmt a] [Stmt a] | ... where a will represent type of your statement. this may be read as "IF statement with a left and right parts consisting of sequence of statements returning a, have type a" btw, you may need to use GADTs to implement more complex statement types machinery, this is rather popular example in various GADT papers, f.e. http://www.iai.uni-bonn.de/~ralf/publications/With.pdf -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Sat, Aug 1, 2009 at 10:25 PM, Michal D.
I'm in the process of writing a toy compiler but I'm having some trouble trying to make my datatypes general. For example, using parsec I parse statements as:
data Stmt = SIf Test [Stmt] [Stmt] | ...
One of your types could be: newtype Block = Block [Stmt] Then you could have: data Stmt = SIf Test Block Block | ...
However, when it's time to create a control flow graph it would be nice to represent statements as (the Int's signify the node id's for either case of the if statement):
data Stmt = SIf Test Int Int | ...
Depending on the amount of code duplication, I think I'd actually make two separate data types. data GStmt = GIf Test Int Int | ... And have a function: toFlowGraph :: Stmt -> GStmt
So, in a eureka moment I decided that this should be allowable with the following declaration:
data Stmt link = SIf Test link link | ...
Clever, but I don't think I would do it this way.
Ofcourse, the problem is trying to declare the resulting type for parsing: "parse -> Stmt [Stmt [Stmt ....]]". Any hints on whether there is a way to accomplish what I'm trying to do or do I have to bite the bullet and declare two seperate datatypes? I tried being clever and declaring a 'helper' type as "type StmtRec = Stmt [StmtRec]" but to no avail... GHC won't let it slide: "Cycle in type synonym declarations"!
I'd have to see your parser, but it shouldn't be too hard to work around this depending on how you want to solve it. By the way, for your StmtRec you probably want a newtype instead of a type. Jason

Michal D. wrote:
I'm in the process of writing a toy compiler but I'm having some trouble trying to make my datatypes general. For example, using parsec I parse statements as:
data Stmt = SIf Test [Stmt] [Stmt] | ...
However, when it's time to create a control flow graph it would be nice to represent statements as (the Int's signify the node id's for either case of the if statement):
data Stmt = SIf Test Int Int | ...
(I recommend to replace Int with something more descriptive, like type Id = Int )
So, in a eureka moment I decided that this should be allowable with the following declaration:
data Stmt link = SIf Test link link | ...
Ofcourse, the problem is trying to declare the resulting type for parsing: "parse -> Stmt [Stmt [Stmt ....]]". Any hints on whether there is a way to accomplish what I'm trying to do or do I have to bite the bullet and declare two seperate datatypes? I tried being clever and declaring a 'helper' type as "type StmtRec = Stmt [StmtRec]" but to no avail... GHC won't let it slide: "Cycle in type synonym declarations"!
newtype StmtRec = StmtRec (Stmt [StmtRec]) will do. More generally, you can use newtype Fix f = In { out :: f (Fix f) } and define type StmtRec = Fix ([] `O` Stmt) where O denotes composition of functors newtype O f g a = O (f (g a)) Introducing a parameter in Stmt like you did and tying the recursion afterwards is a known technique, but I can't seem to find a wiki page that concisely explains it right now. Regards, apfelmus -- http://apfelmus.nfshost.com

newtype StmtRec = StmtRec (Stmt [StmtRec])
That's pretty much were I threw in the towel last night. Except I had a bunch of places where I had to add the extra constructor statements. I wish there was a solution that didn't require these... they really butcher pattern matching clarity.
will do. More generally, you can use
newtype Fix f = In { out :: f (Fix f) }
and define
type StmtRec = Fix ([] `O` Stmt)
where O denotes composition of functors
newtype O f g a = O (f (g a))
Thanks for that! This provoked some thought on my part about what exactly is going on. I think I could solve this if I added some way to identify that a type parameter is actually referring to the whole type. Say we had a reserved word "fixpoint" for this. Then we'd have something like: data Stmt x = SIf x x then when we actually go to use it, it would be referred to as the type: "Stmt [fixpoint]" Which would get treated exactly like the data declaration: data Stmt = SIf [Stmt] [Stmt] I'll need to add the newtype declaration for the code but I'd be interested if anyone had further thoughts on this topic. I have an implementation of both approaches on a toy parser, but I doubt anyone's interested in seeing that. Michal

I ran into exactly the same problem while working on my own toy language :)
I used a fixed point datatype to solve it as well. This is really the best
way, as it lets your expression (or statment) type be a member of
functor/foldable/traversable, which is super handy. I made a graph module
that lets me convert any fixpointed functor into a graph, which made the
rest of that whole process much nicer. If you're interested in the graph
module, let me know :)
I think that in an ideal world haskell would have some way of allowing
infinite types if you asked for them explicitly (say in the type signature
somehow) and then just automatically wrap/unwrap everything with newtypes
behind the scenes (well maybe in an ideal world it wouldn't have to do this
either). This wouldn't change the underlying semantics, but would get rid of
alot of messyness.
Infinite types are possible, My toy language infers infinite types just fine
:) and I think Ocaml has an option for them, but niether of those have type
classes so I'm not sure how compatable the idea is with haskell in general.
- Job
On Sun, Aug 2, 2009 at 9:06 PM, Michal D.
newtype StmtRec = StmtRec (Stmt [StmtRec])
That's pretty much were I threw in the towel last night. Except I had a bunch of places where I had to add the extra constructor statements. I wish there was a solution that didn't require these... they really butcher pattern matching clarity.
will do. More generally, you can use
newtype Fix f = In { out :: f (Fix f) }
and define
type StmtRec = Fix ([] `O` Stmt)
where O denotes composition of functors
newtype O f g a = O (f (g a))
Thanks for that! This provoked some thought on my part about what exactly is going on. I think I could solve this if I added some way to identify that a type parameter is actually referring to the whole type. Say we had a reserved word "fixpoint" for this. Then we'd have something like:
data Stmt x = SIf x x
then when we actually go to use it, it would be referred to as the type:
"Stmt [fixpoint]"
Which would get treated exactly like the data declaration:
data Stmt = SIf [Stmt] [Stmt]
I'll need to add the newtype declaration for the code but I'd be interested if anyone had further thoughts on this topic. I have an implementation of both approaches on a toy parser, but I doubt anyone's interested in seeing that.
Michal _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Job Vranish wrote:
I think that in an ideal world haskell would have some way of allowing infinite types if you asked for them explicitly (say in the type signature somehow) and then just automatically wrap/unwrap everything with newtypes behind the scenes (well maybe in an ideal world it wouldn't have to do this either). This wouldn't change the underlying semantics, but would get rid of alot of messyness.
Infinite types are possible, My toy language infers infinite types just fine :) and I think Ocaml has an option for them, but niether of those have type classes so I'm not sure how compatable the idea is with haskell in general.
There was a thread with a compelling reason against vanilla infinite types some time ago: http://thread.gmane.org/gmane.comp.lang.haskell.cafe/17103 Of course, you can have all the recursion you want by using newtype , it's just that you need to annotate them with the extraneous constructor. In fact, that's exactly the purpose of the constructor; think of it as an aid for the type checker. Regards, apfelmus -- http://apfelmus.nfshost.com

In a lot of cases though annotating all the recursive aspects with newtypes
is a _royal_ pain, and is even worse if you want the datatypes to be
instances of common type classes like Functor, Applicative, etc... (try it
sometime)
I don't advocate allowing infinite types wholesale, just in specific cases
with a special annotation (like a type signature specifying the allowed
infinite type). I think this would be the best of both worlds.
- Job
On Tue, Aug 4, 2009 at 4:23 AM, Heinrich Apfelmus wrote: Job Vranish wrote: I think that in an ideal world haskell would have some way of allowing
infinite types if you asked for them explicitly (say in the type somehow) and then just automatically wrap/unwrap everything with newtypes
behind the scenes (well maybe in an ideal world it wouldn't have to do signature
this either). This wouldn't change the underlying semantics, but would get rid
of
alot of messyness. Infinite types are possible, My toy language infers infinite types just
fine
:) and I think Ocaml has an option for them, but niether of those have
type
classes so I'm not sure how compatable the idea is with haskell in
general. There was a thread with a compelling reason against vanilla infinite
types some time ago: http://thread.gmane.org/gmane.comp.lang.haskell.cafe/17103 Of course, you can have all the recursion you want by using newtype ,
it's just that you need to annotate them with the extraneous
constructor. In fact, that's exactly the purpose of the constructor;
think of it as an aid for the type checker. Regards,
apfelmus --
http://apfelmus.nfshost.com _______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

There are a number of ways to fix this of various complexity, depending on
how many kinds of statements you have in your language and how adverse you
are to code duplication.
One option is to remove the recursion from your statement type and to make a
'base functor' like you've proposed with your link based ADT.
data Stmt a = Slf Test [a] [a] | ...
and then make that recursive by using a newtype to make the folding
explicit.
newtype Mu f = In { out :: f (Mu f) }
Now you can work on Mu Stmt which is a fixed point of your statement data
type and get your original meaning, or work with Stmt Int and have a
statement type that indexes statements by # and can look them up in a
control flow graph or some such.
You can even attach annotations at every level by using a different ADT to
wrap yourself up. (For the category-theory inclined, this gives rise to
something known as the cofree comonad of your base functor)
newtype Ann f e = Ann e (f (Ann f e))
So now you can have something like:
Ann 2 (SIf ... (Ann 1 (Var "X")) (Ann 1 (Var "Y"))
if you wanted to count the number of variable references in a subtree for
instance.
On the other hand, rarely does a programming language consist solely of
statements. You often have expressions and other types floating around and
tying with an explicit Mu can sometimes get in the way of that.
Forgetting those definitions for a moment, we can try to fix the one
statement type problem.
You can use some interesting GADT based solutions to fix that, but another
approach that I've been using recently is to use explicit recursion in a
slightly different place.
type (v :> f) = f (v f)
data Var (f :: * -> *) = V String
data Exp f
= App (Exp :> f) (Exp :> f)
| Lam (Var :> f) (Exp :> f)
| Var (Var :> f)
data Stmt f
= If (Exp :> f) [Stmt :> f] [Stmt :> f]
| ...
Now we can have a lot of different kinds of expressions based on what we
substitute in for f.
data Ann a e = Ann a e
newtype Mu e = Mu e
data Free a e = Return a | Free e
newtype Base a e = Base a
now:
Stmt (Base Int) -- is a statement wrapped around integers
Stmt (Ann Int) -- is a statement wrapped around subtrees of various types
annotated with integers
Stmt Mu -- is your old statement type with newtype Mu wrappers on its
children.
Stmt (Free Int) is your old statement data type, with occasional integer
place holders for unexpanded portions of the tree, they can act as standins
for Exps, Vars, etc.
You can then borrow a trick from a recent post of mine:
http://comonad.com/reader/2009/incremental-folds/
with some minor modifications to extract data incrementally or return
results as you grow the syntax tree.
The design space is large and there are a lot of options to explore around
here, so don't take any of this as the one and only way to implement a
syntax ADT. =)
-Edward Kmett
On Sun, Aug 2, 2009 at 1:25 AM, Michal D.
I'm in the process of writing a toy compiler but I'm having some trouble trying to make my datatypes general. For example, using parsec I parse statements as:
data Stmt = SIf Test [Stmt] [Stmt] | ...
However, when it's time to create a control flow graph it would be nice to represent statements as (the Int's signify the node id's for either case of the if statement):
data Stmt = SIf Test Int Int | ...
So, in a eureka moment I decided that this should be allowable with the following declaration:
data Stmt link = SIf Test link link | ...
Ofcourse, the problem is trying to declare the resulting type for parsing: "parse -> Stmt [Stmt [Stmt ....]]". Any hints on whether there is a way to accomplish what I'm trying to do or do I have to bite the bullet and declare two seperate datatypes? I tried being clever and declaring a 'helper' type as "type StmtRec = Stmt [StmtRec]" but to no avail... GHC won't let it slide: "Cycle in type synonym declarations"!
Cheers,
Michal _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (6)
-
Bulat Ziganshin
-
Edward Kmett
-
Heinrich Apfelmus
-
Jason Dagit
-
Job Vranish
-
Michal D.