
Ben Franksen wrote:
Ben Franksen wrote:
Next thing I'll try is to transform such a grammar into an actual parser...
Which I also managed to get working.
First, before all this talking to myself here is boring you to death, please shout and I'll go away. Anyway, at least one person has privately expressed interest, so I'll post my code for the translation.(*)
{-# LANGUAGE ExistentialQuantification, GADTs, Rank2Types #-} {-# LANGUAGE TypeSynonymInstances, MultiParamTypeClasses, ImpredicativeTypes #-} import qualified Text.ParserCombinators.Parsec as P
Note that I have parameterized everything on the token (terminal) type. Here are the data types, adapting the rest of the code is completely mechanical.
data Production nt t a = Stop a | Terminal t (Production nt t a) | forall b. NonTerminal (nt b) (Production nt t (b -> a))
newtype Rule nt t a = Rule [Production nt t a]
type RuleSet nt t = forall a. nt a -> Rule nt t a
type Grammar nt t b = (nt b, RuleSet nt t)
I should probably turn this into a proper data type, which would BTW also make the ImpredicativeTypes extension unnecessary. Translation to Parsec --------------------- We restrict ourselves to Char as terminals for simplicity. The generalization to arbitrary token types would need another three arguments: showTok :: (tok -> String), nextPos :: (SourcePos -> tok -> [tok] -> SourcePos), and testTok :: (tok -> Maybe a), which are needed by P.tokenPrim.
parseGrammar :: Print nt => Grammar nt Char a -> P.Parser a parseGrammar (start,rules) = parseNonTerminal rules start
parseNonTerminal :: Print nt => RuleSet nt Char -> nt a -> P.Parser a parseNonTerminal rs nt = parseRule rs (rs nt) P.> pr nt
parseRule :: Print nt => RuleSet nt Char -> Rule nt Char a -> P.Parser a parseRule rs (Rule ps) = P.choice (map ({- P.try . -} parseProduction rs) ps)
parseProduction :: Print nt => RuleSet nt Char -> Production nt Char a -> P.Parser a parseProduction _ (Stop x) = return x parseProduction rs (Terminal c p) = P.char c >> parseProduction rs p parseProduction rs (NonTerminal nt p) = do vnt <- parseNonTerminal rs nt vp <- parseProduction rs p return (vp vnt)
This is really not difficult, once you understand how the list-like Production type works. The trick is that a NonTerminal forces the "rest" of the production to return a function type, so you can apply its result to the result of parsing the nonterminal. Whereas the result of parsing terminals gets ignored by the "rest" of the production. You might wonder how the code manages to return the correct integer values inside an ENum. Well, I did, at least. I don't yet understand it completely but I think the answer is in in the Functor and Applicative instances: all the code that interprets syntactic elements (up to the abstract syntax) inside the myGrm function gets pushed down through the elements of a production until it ends up at a Stop, where we can finally pull it out (see the first clause of parseProduction). Note also the (commented-out) use of P.try in function parseRule. Let's try it: *Main> putStrLn (printGrammar myGrm) *Start ::= Sum Sum ::= Product '+' Sum | Product Product ::= Value '*' Product | Value Value ::= Number | '(' Sum ')' Number ::= Digit Number | Digit Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' *Main> P.parseTest (parseGrammar myGrm) "2*(2+52)" parse error at (line 1, column 2): unexpected "*" expecting Number After re-inserting the P.try call, I can actually parse expressions (yay!): *Main> :r [1 of 1] Compiling Main ( Grammar.lhs, interpreted ) Ok, modules loaded: Main. *Main> P.parseTest (parseGrammar myGrm) "2*(2+52)" EProduct (ENum 2) (ESum (ENum 2) (ENum 52)) BTW, does anyone know a source (books, papers, blogs, whatever) about algorithms for automatic left-factoring? I searched with google and found some interesting papers on eliminating left recursion but nothing so far on left-factoring. Have these problems all been solved before the internet age? Cheers Ben (*) One of these days I really should get my hands dirty and set up a weblog; suggestions for how to proceed are appreciated. I would especially like something where I can just upload a literate Haskell file and it gets formatted automagically. Bonus points for beautifying operator symbols a la lhs2tex ;-)