
Hello everyone, I am trying to create an AST for Python. My approach is to create a data type for each syntactic construct. But I am stuck trying to statically enforce some constraints over my statements. A very short example to illustrate my problem: newtype Ident = Id String data BinOp = Add | Sub data Exp = IntLit Integer | BinOpExp BinOp Exp Exp data NormalCtx data LoopCtx data Statement ctx where Compound :: Compound -> Statement ctx Pass :: Statement ctx Break :: Statement LoopCtx newtype Global = Global [Ident] data Suite ctx = Suite [Global] [Statement ctx] type Else = Suite NormalCtx data Compound = If [(Exp, Suite NormalCtx)] (Maybe Else) | While Exp (Suite LoopCtx) (Maybe Else) newtype Program = Program [Statement NormalCtx] The "global" statement makes an identifier visible in the local scope. It holds for the entire current code block. So it also works backwards, which is why I didn't make it a statement but part of a suite (= block of statements). Some statements may occur in any context, such as the "pass" statement. But others are only allowed in certain situations, such as the "break" statement. This is why I defined the Statement as a GADT. I just supply the context in which the statement may be used and the typechecker magically does the rest. Feeling very content with this solution I tried a slightly more complex program and discovered that my AST can not represent this Python program: for i in range(10): if i == 6: break The compound if statement is perfectly valid nested in the loop because the Compound constructor of Statement allows any context. But the suites inside the clauses of the if statement only allow normal contexts. Since Break has a LoopCtx the typechecker complains. Is there some other way to statically enforce that break statements can only occur _nested_ inside a loop? There is a similar problem with return statements that may only occur in functions. These nested statements should somehow 'inherit' a context, if that makes any sense :-) I know I can simply create separate data types statements that can occur inside loops and function bodies. But that would make the AST a lot more complex, something I try to avoid. Python's syntax is already complex enough! Most of these constraints are not in the EBNF grammar which can be found in the language reference, but they are specified in the accompanying text. The cpython interpreter will generate SyntaxError's when you violate these constraints. See also Python's language reference: http://docs.python.org/ref/ref.html (see sections 6 and 7)

Hi Something like this would do? if_ = Compound $ If [(IntLit 6, Suite [] [Break])] Nothing while_ = Compound $ While (IntLit 6) (Suite [] [if_]) Nothing f = Program [while_] -- this one fails -- f2 = Program [if_] newtype Ident = Id String data BinOp = Add | Sub data Exp = IntLit Integer | BinOpExp BinOp Exp Exp data NormalCtx data LoopCtx data Statement ctx where Compound :: Compound ctx -> Statement ctx Pass :: Statement ctx Break :: Statement LoopCtx newtype Global = Global [Ident] data Suite ctx = Suite [Global] [Statement ctx] type Else ctx = Suite ctx data Compound ctx where If :: [(Exp, Suite ctx)] -> Maybe (Else ctx) -> Compound ctx While :: Exp -> (Suite LoopCtx) -> Maybe (Else LoopCtx) -> Compound ctx newtype Program = Program [Statement NormalCtx] Daniel On Feb 20, 2008, at 5:12 PM, Roel van Dijk wrote:
Hello everyone,
I am trying to create an AST for Python. My approach is to create a data type for each syntactic construct. But I am stuck trying to statically enforce some constraints over my statements. A very short example to illustrate my problem:
newtype Ident = Id String
data BinOp = Add | Sub
data Exp = IntLit Integer | BinOpExp BinOp Exp Exp
data NormalCtx data LoopCtx
data Statement ctx where Compound :: Compound -> Statement ctx Pass :: Statement ctx Break :: Statement LoopCtx
newtype Global = Global [Ident]
data Suite ctx = Suite [Global] [Statement ctx]
type Else = Suite NormalCtx
data Compound = If [(Exp, Suite NormalCtx)] (Maybe Else) | While Exp (Suite LoopCtx) (Maybe Else)
newtype Program = Program [Statement NormalCtx]
The "global" statement makes an identifier visible in the local scope. It holds for the entire current code block. So it also works backwards, which is why I didn't make it a statement but part of a suite (= block of statements).
Some statements may occur in any context, such as the "pass" statement. But others are only allowed in certain situations, such as the "break" statement. This is why I defined the Statement as a GADT. I just supply the context in which the statement may be used and the typechecker magically does the rest.
Feeling very content with this solution I tried a slightly more complex program and discovered that my AST can not represent this Python program:
for i in range(10): if i == 6: break
The compound if statement is perfectly valid nested in the loop because the Compound constructor of Statement allows any context. But the suites inside the clauses of the if statement only allow normal contexts. Since Break has a LoopCtx the typechecker complains.
Is there some other way to statically enforce that break statements can only occur _nested_ inside a loop? There is a similar problem with return statements that may only occur in functions. These nested statements should somehow 'inherit' a context, if that makes any sense :-)
I know I can simply create separate data types statements that can occur inside loops and function bodies. But that would make the AST a lot more complex, something I try to avoid. Python's syntax is already complex enough!
Most of these constraints are not in the EBNF grammar which can be found in the language reference, but they are specified in the accompanying text. The cpython interpreter will generate SyntaxError's when you violate these constraints.
See also Python's language reference: http://docs.python.org/ref/ref.html (see sections 6 and 7) _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Your solutions allows a bit more but fails with the equivalent of def foo(): for i in range(10): if i == 6: return None The loop context 'overwrites' the function context which makes the return statement illegal. I think I need a type level list.
participants (2)
-
Daniel Gorín
-
Roel van Dijk