
Robert Atkey wrote:
A deep embedding of a parsing DSL (really a context-sensitive grammar DSL) would look something like the following. I think I saw something like this in the Agda2 code somewhere, but I stumbled across it when I was trying to work out what "free" applicative functors were.
[snip code & explanation]
This is extremely cool. I tried to understand in my head how this all works but it just didn't click. It all seemed like magic. Then I went ahead and tried to write a printer for your example grammar and now everything is much clearer. Although I had to fight the type checker quite a bit. This is the generic part:
class Print f where pr :: f a -> String
instance Print nt => Print (Production nt) where pr = printProduction
printProduction :: Print nt => Production nt a -> String printProduction (Stop _) = "" printProduction (Terminal t (Stop _)) = show t printProduction (Terminal t p) = show t ++ " " ++ printProduction p printProduction (NonTerminal nt (Stop _)) = pr nt printProduction (NonTerminal nt p) = pr nt ++ " " ++ printProduction p
instance Print nt => Print (Rule nt) where pr (Rule ps) = printPs ps where printPs [] = "" printPs [p] = printProduction p printPs (p:ps) = printProduction p ++ " | " ++ printPs ps
data Any f = forall a. Any (f a)
class Enumerable f where enumeration :: [Any f]
printRule :: Print nt => (nt a -> Rule nt a) -> nt a -> String printRule g nt = pr nt ++ " ::= " ++ pr (g nt)
printGrammar :: (Print nt, Enumerable nt) => Grammar nt -> String printGrammar g = foldr (++) "" (intersperse "\n" rules) where rules = map printAnyRule enumeration printAnyRule (Any nt) = printRule g nt
We must also provide instances for the concrete types:
instance Enumerable NT where enumeration = [Any Sum, Any Product, Any Value]
instance Print NT where pr Value = "Value" pr Product = "Product" pr Sum = "Sum"
So far so good. This even works... almost ;-) *Main> putStrLn $ printGrammar myGrm Sum ::= Product '+' Sum | Product Product ::= Value '*' Product | Value Value ::= Interrupted. -- had to hit Ctrl-C here When I replace 'posInt' with 'digit' in the rule for Value
myGrm Value = ENum <$> digit <|> id <$ char '(' <*> nt Sum <* char ')'
then the printer terminates just fine: *Main> putStrLn $ printGrammar myGrm Sum ::= Product '+' Sum | Product Product ::= Value '*' Product | Value Value ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '(' Sum ')' I found that the problem is the use of function 'some' from Control.Applicative in
posInt :: Rule nt Int posInt = fix 1 . reverse <$> some digit where fix n [] = 0 fix n (d:ds) = d*n + fix (n*10) ds
Since 'some' is defined recursively, this creates an infinite production for numbers that you can neither print nor otherwise analyse in finite time. I can see at least two solutions: One is to parameterize everything over the type of terminals, too. A type suitable for the example would be
data T = TNum Int | TPlus | TMult | TOParen | TCParen
and leave token recognition to a separate scanner. The second solution (which I followed) is to break the recursion by adding another nonterminal to the NT type:
data NT a where Sum :: NT Expr Product :: NT Expr Value :: NT Expr Number :: NT [Int] Digit :: NT Int
instance Enumerable NT where enumeration = [Any Sum, Any Product, Any Value, Any Number, Any Digit]
instance Print NT where pr Sum = "Sum" pr Product = "Product" pr Value = "Value" pr Number = "Number" pr Digit = "Digit"
(Adding Digit /and/ Number is not strictly necessary, but it makes for a nicer presentation.)
myGrm :: Grammar NT myGrm Sum = ESum <$> nt Product <* char '+' <*> nt Sum <|> id <$> nt Product
myGrm Product = EProduct <$> nt Value <* char '*' <*> nt Product <|> id <$> nt Value
myGrm Value = (ENum . toNat) <$> nt Number <|> id <$ char '(' <*> nt Sum <* char ')'
myGrm Number = extend <$> nt Digit <*> optional (nt Number)
myGrm Digit = digit
extend d Nothing = [d] extend d (Just ds) = d:ds
toNat :: [Int] -> Int toNat = fix 1 . reverse where fix n [] = 0 fix n (d:ds) = d*n + fix (n*10) ds
With this I get *Main> putStrLn $ printGrammar myGrm Sum ::= Product '+' Sum | Product Product ::= Value '*' Product | Value Value ::= Number | '(' Sum ')' Number ::= Digit Number | Digit Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' Morale: Be careful with recursive functions when constructing a data representation (e.g. for a deep DSL). You might get an infinite representation which isn't what you want in this case. Oh, and another point: there should be a distinguished "start" nonterminal, otherwise this is not really a grammar. This suggests something like
type Grammar nt a = (nt a, forall b. nt b -> Rule nt b)
Next thing I'll try is to transform such a grammar into an actual parser... Cheers Ben