Simple parser question

Hello all, I just hit a sticking point when trying to parse something like data Exp = Lit Int -- literal integer | Plus Exp Exp where something like "1+2" should be parsed to "Plus (Lit 1) (Lit 2)". When I try to parse "1+2" my parser enters an infinite loop. I can understand why: it thinks "hmm, this expression could be a plus, but then it must start with an expression, lets check". and it tries to parse expression again and again considers Plus. When I change the rules, so it first checks for "Lit", it does parse the "1" just fine, but then gives up, because the remainder is not an expression anymore, but just a "+2". My parser is written in the style shown in Graham Hutton's book: Parser a :: String -> (a, String). I believe I am missing something obvious, but I can't see it. -- Martin

Left-recursion is always a problem for recursive-descend parsers. The solution is to rewrite the parser as: * first always parse an expression without a Plus * followed by zero or more "+ exp" parts. How exactly you write this depends on the combinators that the book defines for writing parsers. In Parsec you would write something like: parseExp = do lit <- parseLit pluses <- many (parsePlusToken *> parseLit) return (combinePlusesWithLit lit pluses) combinePlusesWithLit = foldr Plus -- or foldl I hope you get the idea. Note that the parsec library has functions chainl and chainr that do something like this under the hood, so you would never actually write the above code. Twan On 14/02/13 13:59, Martin Drautzburg wrote:
Hello all,
I just hit a sticking point when trying to parse something like
data Exp = Lit Int -- literal integer | Plus Exp Exp
where something like "1+2" should be parsed to "Plus (Lit 1) (Lit 2)".
When I try to parse "1+2" my parser enters an infinite loop. I can understand why: it thinks
"hmm, this expression could be a plus, but then it must start with an expression, lets check".
and it tries to parse expression again and again considers Plus.
When I change the rules, so it first checks for "Lit", it does parse the "1" just fine, but then gives up, because the remainder is not an expression anymore, but just a "+2".
My parser is written in the style shown in Graham Hutton's book:
Parser a :: String -> (a, String).
I believe I am missing something obvious, but I can't see it.

Thanks, that kind of worked. As long as I am only dealing with literal integers at the beginning of pluses is works fine. *Main> parse "(1+2+3)" Plus (Lit 1) (Plus (Lit 2) (Lit 3)) But this fails already: *Main> parse "(1+2)+3" Error "parse error" There is no problem with expressions in parenthes, but my "pluses" can only start with an integer (even one in parentheses), but not with an expression, since I had to get rid of the expression at the left side, to avoid left- recursion. I am also confused that saying "a plus expression is an integer followed by many "plus somethings" is not what the language says. So this requires a lot of "paying attention" to get right. I'd much rather say "a plus expression is two expressions with a '+' in between". Any other ideas? On Thursday, 14. February 2013 15:23:03 Twan van Laarhoven wrote:
Left-recursion is always a problem for recursive-descend parsers. The solution is to rewrite the parser as: * first always parse an expression without a Plus * followed by zero or more "+ exp" parts.
How exactly you write this depends on the combinators that the book defines for writing parsers. In Parsec you would write something like:
parseExp = do lit <- parseLit pluses <- many (parsePlusToken *> parseLit) return (combinePlusesWithLit lit pluses)
combinePlusesWithLit = foldr Plus -- or foldl
I hope you get the idea.
Note that the parsec library has functions chainl and chainr that do something like this under the hood, so you would never actually write the above code.
Twan
On 14/02/13 13:59, Martin Drautzburg wrote:
Hello all,
I just hit a sticking point when trying to parse something like
data Exp = Lit Int -- literal integer
| Plus Exp Exp
where something like "1+2" should be parsed to "Plus (Lit 1) (Lit 2)".
When I try to parse "1+2" my parser enters an infinite loop. I can understand why: it thinks
"hmm, this expression could be a plus, but then it must start with an expression, lets check".
and it tries to parse expression again and again considers Plus.
When I change the rules, so it first checks for "Lit", it does parse the "1" just fine, but then gives up, because the remainder is not an expression anymore, but just a "+2".
My parser is written in the style shown in Graham Hutton's book:
Parser a :: String -> (a, String).
I believe I am missing something obvious, but I can't see it.
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners
-- Martin
participants (2)
-
Martin Drautzburg
-
Twan van Laarhoven