
Hello all, this was previously asked on haskell-beginners, but only partially answered. As an exercise I am writing a parser roughly following the expamples in Graham Hutton's book. The language contains things like: data Exp = Lit Int -- literal integer | Plus Exp Exp My naive parser enters an infinite recursion, when I try to parse "1+2". I do understand why: "hmm, this expression could be a plus, but then it must start with an expression, lets check". and it tries to parse expression again and again considers Plus. Twan van Laarhoven told me that:
Left-recursion is always a problem for recursive-descend parsers.
and suggested to do something like:
parseExp = do lit <- parseLit pluses <- many (parsePlusToken *> parseLit) return (combinePlusesWithLit lit pluses)
combinePlusesWithLit = foldr Plus -- or foldl
This indeed does the trick, but only when the first token is a Lit (literal integer). I then added the possibility to optionally put things in parentheses. But then I cannot parse "(1+2)+3". The original code fails, because "(1+2)" is not a Lit and when I allow an expression as the first argument to "+" I get infinite recursion again. I am generally confused, that saying "a plus expression is an integer followed by many "plus somethings" is not what the language says. So this requires a lot of "paying attention" to get right. I'd much rather say "a plus expression is two expressions with a '+' in between". I do know for sure, that it is possible to parse "(1+2)+3" (ghci does it just fine). But I seem to be missing a trick. Can anyone shed some light on this? -- Martin

* Martin Drautzburg
I do know for sure, that it is possible to parse "(1+2)+3" (ghci does it just fine). But I seem to be missing a trick.
Can anyone shed some light on this?
The trick in this case is that ghci doesn't use a recursive descent parser — it uses an LR parser (generated by Happy). Another workaround is to use memoization of some sort — see e.g. GLL ("Generalized LL") parsing. Roman

Hi, Roman Cheplyaka wrote:
Another workaround is to use memoization of some sort — see e.g. GLL ("Generalized LL") parsing.
Is there a GLL parser combinator library for Haskell? I know about the gll-combinators for Scala, but havn't seen anything for Haskell. Bonus points for providing the graph-structured stack (for maximal sharing in the computation) and the shared packed parse forest (for maximal sharing in the results) as reusable components. Tillmann

* Tillmann Rendel
Hi,
Roman Cheplyaka wrote:
Another workaround is to use memoization of some sort — see e.g. GLL ("Generalized LL") parsing.
Is there a GLL parser combinator library for Haskell? I know about the gll-combinators for Scala, but havn't seen anything for Haskell.
I am not aware of any. Dmitry Astapov and I played with this idea a long time ago, but we didn't succeed. Might be a good time for someone interested to have another go at it. Roman

All,
Many (but not all) of the parsing algorithms that support left
recursion cannot be implemented in Haskell using the standard
representation of recursion in parser combinators. The problem
can be avoided in Scala because it has imperative features like
referential identity and/or mutable references. The most practical
solution currently is probably to manually transform your grammars
to a non-left-recursive form (as suggested above) and then use a
standard parser combinator library with a top-down parsing algorithm
(I suggest uu-parsinglib).
That being said, there is active research into alternative functional
representations of recursion in grammars/parsers that support a wider
range of algorithms. If you want to read up on such research, I
suggest the following papers to get an idea of some of the approaches:
Baars, Arthur, S. Doaitse Swierstra, and Marcos Viera. "Typed
transformations of typed grammars: The left corner transform."
Electronic Notes in Theoretical Computer Science 253.7 (2010): 51-64.
Devriese, Dominique, et al. "Fixing idioms: A recursion primitive
for applicative dsls." Proceedings of the ACM SIGPLAN 2013 workshop on
Partial evaluation and program manipulation. ACM, 2013.
Oliveira, Bruno CdS, and William R. Cook. "Functional programming
with structured graphs." Proceedings of the 17th ACM SIGPLAN
international conference on Functional programming. ACM, 2012.
Oliveira, Bruno C. D. S., and Andres Löh. "Abstract syntax graphs for
domain specific languages." Proceedings of the ACM SIGPLAN 2013
workshop on Partial evaluation and program manipulation. ACM, 2013.
DEVRIESE, DOMINIQUE, and FRANK PIESSENS. "Finally tagless observable
recursion for an abstract grammar model." Journal of Functional
Programming 1.1: 1-40.
For the last one, you can check out
http://projects.haskell.org/grammar-combinators/ about the
grammar-combinators library on Hackage. It has a packrat parser that
can deal with left-recursion and a grammar transformation that
transforms it away. There is a tutorial you can checkout.
Dominique
2013/2/20 Tillmann Rendel
Hi,
Roman Cheplyaka wrote:
Another workaround is to use memoization of some sort — see e.g. GLL ("Generalized LL") parsing.
Is there a GLL parser combinator library for Haskell? I know about the gll-combinators for Scala, but havn't seen anything for Haskell.
Bonus points for providing the graph-structured stack (for maximal sharing in the computation) and the shared packed parse forest (for maximal sharing in the results) as reusable components.
Tillmann
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Did you see expression parser in parsec
packagehttp://hackage.haskell.org/packages/archive/parsec/3.1.3/doc/html/Text-Parse...?
Is it not enough?
2013/2/20 Martin Drautzburg
Hello all,
this was previously asked on haskell-beginners, but only partially answered.
As an exercise I am writing a parser roughly following the expamples in Graham Hutton's book. The language contains things like:
data Exp = Lit Int -- literal integer | Plus Exp Exp
My naive parser enters an infinite recursion, when I try to parse "1+2". I do understand why:
"hmm, this expression could be a plus, but then it must start with an expression, lets check".
and it tries to parse expression again and again considers Plus.
Twan van Laarhoven told me that:
Left-recursion is always a problem for recursive-descend parsers.
and suggested to do something like:
parseExp = do lit <- parseLit pluses <- many (parsePlusToken *> parseLit) return (combinePlusesWithLit lit pluses)
combinePlusesWithLit = foldr Plus -- or foldl
This indeed does the trick, but only when the first token is a Lit (literal integer).
I then added the possibility to optionally put things in parentheses. But then I cannot parse "(1+2)+3". The original code fails, because "(1+2)" is not a Lit and when I allow an expression as the first argument to "+" I get infinite recursion again.
I am generally confused, that saying "a plus expression is an integer followed by many "plus somethings" is not what the language says. So this requires a lot of "paying attention" to get right. I'd much rather say "a plus expression is two expressions with a '+' in between".
I do know for sure, that it is possible to parse "(1+2)+3" (ghci does it just fine). But I seem to be missing a trick.
Can anyone shed some light on this?
-- Martin
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

More primitively, Parsec and its predecessor Hutton-Meijer provide the
chainl/chainr combinators, these automatically remove left recursion
"within" the parser - i.e. you don't have to rewrite the grammar.
On 20 February 2013 08:19, Dmitry Olshansky
Did you see expression parser in parsec package? Is it not enough?

Hi, Martin Drautzburg wrote:
As an exercise I am writing a parser roughly following the expamples in Graham Hutton's book. The language contains things like:
data Exp = Lit Int -- literal integer | Plus Exp Exp
So the grammar is: Exp ::= Int | Exp "+" Exp
My naive parser enters an infinite recursion, when I try to parse "1+2". I do understand why:
"hmm, this expression could be a plus, but then it must start with an expression, lets check".
and it tries to parse expression again and again considers Plus.
Indeed.
Twan van Laarhoven told me that:
Left-recursion is always a problem for recursive-descend parsers.
Note that the left recursion is already visible in the grammar above, no need to convert to parser combinators. The problem is that the nonterminal Exp occurs at the left of a rule for itself. One way to fix this problem is to refactor the grammar in order to avoid left recursion. So let's distinguish "expressions that can start with expressions" and "expressions that cannot start with expressions": Exp-norec ::= Int Exp-rec ::= Exp-norec | Exp-norec "+" Exp-rec Note that Exp-rec describes a list of Exp-norec with "+" in-between, so you can implement it with the combinator many. Now if you want to add a rule like Exp ::= "(" Exp ")" you need to figure out whether to add it to Exp-norec or Exp-rec. Since the new rule is not left recursive, you can add it to Exp-norec: Exp-norec ::= Int | "(" Exp-rec ")" Exp-rec ::= Exp-norec | Exp-norec "+" Exp-rec If you implement this grammar with parser combinators, you should be able to parse "(1+2)+3" just fine. Tillmann PS. Try adding multiplication to your grammar. You will need a similar trick to get the priorities right.

* Tillmann Rendel
One way to fix this problem is to refactor the grammar in order to avoid left recursion. So let's distinguish "expressions that can start with expressions" and "expressions that cannot start with expressions":
[...]
PS. Try adding multiplication to your grammar. You will need a similar trick to get the priorities right.
And then try adding subtraction ;-) Roman

Thank you very much. To clarify: I am not in need of a parser, I just wanted to understand why left recursion is an issue (that was easy) and what techniques help to circumvent the problem. So your answer was spot-on (though I haven't implemented it yet) On Wednesday, 20. February 2013 09:59:47 Tillmann Rendel wrote:
Hi,
Martin Drautzburg wrote:
As an exercise I am writing a parser roughly following the expamples in Graham Hutton's book. The language contains things like:
data Exp = Lit Int -- literal integer
| Plus Exp Exp
So the grammar is:
Exp ::= Int
| Exp "+" Exp
My naive parser enters an infinite recursion, when I try to parse "1+2". I do understand why:
"hmm, this expression could be a plus, but then it must start with an expression, lets check".
and it tries to parse expression again and again considers Plus.
Indeed.
Twan van Laarhoven told me that:
Left-recursion is always a problem for recursive-descend parsers.
Note that the left recursion is already visible in the grammar above, no need to convert to parser combinators. The problem is that the nonterminal Exp occurs at the left of a rule for itself.
One way to fix this problem is to refactor the grammar in order to avoid left recursion. So let's distinguish "expressions that can start with expressions" and "expressions that cannot start with expressions":
Exp-norec ::= Int Exp-rec ::= Exp-norec
| Exp-norec "+" Exp-rec
Note that Exp-rec describes a list of Exp-norec with "+" in-between, so you can implement it with the combinator many.
Now if you want to add a rule like
Exp ::= "(" Exp ")"
you need to figure out whether to add it to Exp-norec or Exp-rec. Since the new rule is not left recursive, you can add it to Exp-norec:
Exp-norec ::= Int
| "(" Exp-rec ")"
Exp-rec ::= Exp-norec
| Exp-norec "+" Exp-rec
If you implement this grammar with parser combinators, you should be able to parse "(1+2)+3" just fine.
Tillmann
PS. Try adding multiplication to your grammar. You will need a similar trick to get the priorities right.
-- Martin

On Wednesday, 20. February 2013 09:59:47 Tillmann Rendel wrote:
So the grammar is:
Exp ::= Int
| Exp "+" Exp
My naive parser enters an infinite recursion, when I try to parse "1+2". I do understand why:
"hmm, this expression could be a plus, but then it must start with an expression, lets check".
and it tries to parse expression again and again considers Plus.
Indeed.
Twan van Laarhoven told me that:
Left-recursion is always a problem for recursive-descend parsers.
Note that the left recursion is already visible in the grammar above, no need to convert to parser combinators. The problem is that the nonterminal Exp occurs at the left of a rule for itself.
Just a silly quick question: why isn't right-recursion a similar problem? -- Martin

Hi Martin, Martin Drautzburg wrote:
Note that the left recursion is already visible in the grammar above, no need to convert to parser combinators. The problem is that the nonterminal Exp occurs at the left of a rule for itself.
Just a silly quick question: why isn't right-recursion a similar problem?
I think the situation is symmetric: If you match the token stream right-to-left, right-recursion is problematic. Tillmann

* Martin Drautzburg
Twan van Laarhoven told me that:
Left-recursion is always a problem for recursive-descend parsers.
Note that the left recursion is already visible in the grammar above, no need to convert to parser combinators. The problem is that the nonterminal Exp occurs at the left of a rule for itself.
Just a silly quick question: why isn't right-recursion a similar problem?
Right recursion is recursion of the form A ::= B A There are two cases to consider here. The language defined by B may or may not contain the empty string. If it contains the empty string, then we have an indirect left recursion, since the rule A ::= A is an instance of the original rule. If it doesn't contain the empty string, then, by the time you get to A, you necessarily have to consume some part of the input. Thus, your recursion is well-founded — you enter the recursion with the input strictly smaller than you had in the beginning. Roman

On Sun, Feb 24, 2013 at 7:09 PM, Roman Cheplyaka
Thus, your recursion is well-founded — you enter the recursion with the input strictly smaller than you had in the beginning.
Perhaps you meant /productive/ corecursion? Because the definition "A ::= B A" you gave is codata. -- Kim-Ee

* Kim-Ee Yeoh
On Sun, Feb 24, 2013 at 7:09 PM, Roman Cheplyaka
wrote: Thus, your recursion is well-founded — you enter the recursion with the input strictly smaller than you had in the beginning.
Perhaps you meant /productive/ corecursion?
Because the definition "A ::= B A" you gave is codata.
It is just a grammar production, which I chose to interpret recursively. Whether it's productive when interpreted corecursively depends on the particular interpretation, I guess, and may not actually depend on factors like left recursion. I haven't thought about it much. Roman

* Kim-Ee Yeoh
On Sun, Feb 24, 2013 at 7:09 PM, Roman Cheplyaka
wrote: Thus, your recursion is well-founded — you enter the recursion with the input strictly smaller than you had in the beginning.
Perhaps you meant /productive/ corecursion? Because the definition "A ::= B A" you gave is codata.
Or perhaps you meant that the production itself, when interpreted as a definition, is corecursive? Well, yes, and so is any CFG written in BNF. But that doesn't buy us much, and is irrelevant to the discussion of parsing left-recursive grammars. Roman

On Sun, Feb 24, 2013 at 7:47 PM, Roman Cheplyaka
Or perhaps you meant that the production itself, when interpreted as a definition, is corecursive?
I was merely thrown off by your mention of "well-founded" and the assertion that you're left with a "strictly smaller" input. I don't see any of this. That's when I remembered that well-founded recursion (a desirable) is sometimes confused with productive corecursion (another desirable). -- Kim-Ee

* Kim-Ee Yeoh
I was merely thrown off by your mention of "well-founded" and the assertion that you're left with a "strictly smaller" input. I don't see any of this.
It may become more obvious if you try to write two recursive descent parsers (as recursive functions) which parse a left-recursive and a non-left-recursive grammars, and see in which case the recursion is well-founded and why. Roman

On Sun, Feb 24, 2013 at 8:03 PM, Roman Cheplyaka
It may become more obvious if you try to write two recursive descent parsers (as recursive functions) which parse a left-recursive and a non-left-recursive grammars, and see in which case the recursion is well-founded and why.
Which grammar are we discussing here? The one you outlined? A ::= B A As I pointed out earlier, this doesn't model a program (e.g. a compiler). It's an OS! What am I missing? -- Kim-Ee

Hi, Kim-Ee Yeoh wrote:
Perhaps you meant /productive/ corecursion? Because the definition "A ::= B A" you gave is codata.
If you write a recursive descent parser, it takes the token stream as an input and consumes some of this input. For example, the parser could return an integer that says how many tokens it consumed: parseA :: String -> Maybe Int parseB :: String -> Maybe Int Now, if we implement parseA according to the grammar rule A ::= B A we have, for example, the following: parseA text = case parseB text of Nothing -> Nothing Just n1 -> case parseA (drop n1 text) of Nothing -> Nothing Just n2 -> Just (n1 + n2) Note that parseA is recursive. The recursion is well-founded if (drop n1 text) is smaller then text. So we have two cases, as Roman wrote: If the language defined by B contains the empty string, then n1 can be 0, so the recursion is not well-founded and the parser might loop. If the language defined by B does not contain the empty string, then n1 is always bigger than 0, so (drop n1 text) is always smaller than text, so the recursion is well-founded and the parser cannot loop. So I believe the key to understanding Roman's remark about well-founded recursion is to consider the token stream as an additional argument to the parser. However, the difference between hidden left recursion and unproblematic recursion in grammars can also be understood in terms of productive corecursion. In that view, a parser is a process that can request input tokens from the token stream: data Parser = Input (Char -> Parser) | Success | Failure parseA :: Parser parseB :: Parser Now, if we implement parseA according to the grammar rule A ::= B A we have, for example, the following: andThen :: Parser -> Parser -> Parser andThen (Input f) p = Input (\c -> f c `andThen` p) andThen (Success) p = p andThen Failure p = p parseA = parseB `andThen` parseA Note that parseA is corecursive. The corecursion is productive if the corecursive call to parseA is guarded with an Input constructor. Again, there are two cases: If the language described by B contains the empty word, then parseB = Success, and (parseB `andThen` parseA) = parseA, so the corecursive call to parseA is not guarded and the parser is not productive. If the langauge described by B does not contain the empty word, then parseB = Input ..., and (parseB `andThen` parseA) = Input (... parseA ...), so the corecursive call to parseA is guarded and the parse is productive. So I believe the key to understanding left recursion via productive corecursion is to model the consumption of the token stream with a codata constructor. Both approaches are essentially equivalent, of course: Before considering the very same nonterminal again, we should have consumed at least one token. Tillmann

On Sun, Feb 24, 2013 at 10:04 PM, Tillmann Rendel < rendel@informatik.uni-marburg.de> wrote:
The recursion is well-founded if (drop n1 text) is smaller then text. So we have two cases, as Roman wrote:
If the language defined by B contains the empty string, then n1 can be 0, so the recursion is not well-founded and the parser might loop.
Ah! So "A ::= B A" is really /not/ the full grammar of the language but an abbreviated one, minus terminals. At the very least, partial parses make sense and the input stream is assumed finite. Because "A ::= B A" could be understood, not so much as a parsing rule, but as the full definition of a language which comprises only one word: BBBBB ... ad infinitum. So all that mention of well-foundedness was confusing. Thanks, Tillmann! -- Kim-Ee

On Sunday, 24. February 2013 16:04:11 Tillmann Rendel wrote:
Both approaches are essentially equivalent, of course: Before considering the very same nonterminal again, we should have consumed at least one token.
I see. Thanks So for the laymen: expr ::= expr "+" expr is a problem, because the parser considers expr again without having consumed any input. expr ::= literal pluses pluses ::= many ("+" expr) is not a problem, because by the time the parser gets to the rightmost expr is has consumes at least one "+". Instead of literal we can put anything which promises not to be left-recursive expr ::= exprNonLr "+" expr exprNonLr := ... As exprNonLr gets more complicated, we may end up with a whole set of nonLr parsers. I wonder if I can enforce the nonNr property somehow, i.e. enforce the rule "will not consider the same nonterminal again without having consumed any input". -- Martin

2013/2/26 Martin Drautzburg
I wonder if I can enforce the nonNr property somehow, i.e. enforce the rule "will not consider the same nonterminal again without having consumed any input".
You might be interested in this paper: Danielsson, Nils Anders. "Total parser combinators." ACM Sigplan Notices. Vol. 45. No. 9. ACM, 2010. Regards, Dominique

On Sun, Feb 24, 2013 at 6:31 AM, Martin Drautzburg wrote: Just a silly quick question: why isn't right-recursion a similar problem? Very roughly:
Left recursion is: let foo n = n + foo n in ...
Right recursion is: let foo 1 = 1; foo n = n + foo (n - 1) in ...
In short, matching the tokens before the right recursion will constitute an
end condition that will stop infinite recursion --- if only because you'll
hit the end of the input. Left recursion doesn't consume anything, just
re-executes itself.
--
brandon s allbery kf8nh sine nomine associates
allbery.b@gmail.com ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

As mentioned before, the way to handle this specific problem is to use either the pChainl or pChainr parser combinators, as e.g. found on:
http://hackage.haskell.org/packages/archive/uu-parsinglib/2.7.4.1/doc/html/T...
and many similar libraries. So one can write:
pExpr = pChainl ( (+) <$ pSym ' ')) pFactor
pFactor = iI '(' pExpr ')' Ii <|> pInteger <|> pIdentifier
What is even nicer is that one can easily extend this to deal with many different operators:
pExpr = foldr nextop [((+),'+'), ((*), '*'))] pGactor
where nextop (sem,sym) = pChainl sem <$ pSym sym))
It is obvious how to extend this further into operators with the same priority or being right associative. See furthermore:
@inproceedings{Fokker95:0,
title = {Functional Parsers},
author = {Jeroen Fokker},
year = {1995},
tags = {parsing},
researchr = {http://dutieq.st.ewi.tudelft.nl/publication/Fokker95%3A0},
cites = {0},
citedby = {0},
pages = {1-23},
booktitle = {Advanced Functional Programming, First International Spring School on Advanced Functional Programming Techniques, Båstad, Sweden, May 24-30, 1995, Tutorial Text},
editor = {Johan Jeuring and Erik Meijer},
volume = {925},
series = {Lecture Notes in Computer Science},
publisher = {Springer},
isbn = {3-540-59451-5},
}
Most left recursion stems from the fact that conventional CFG notation is sufficient, but unfortunately not ideally suited, to express oft occurring patterns. This is where parser combinators come in: they allow one to express what one wants to say instead of having to encode it using recursion, etc.
If you have a really nasty grammar where left recursion removal by hand would ruin your grammar, you may use a transform like the LeftCornerTransform as used e.g. in the ChristmasTree package, which removes the problem of exponential time behaviour of reading Haskell data types with infix operators. See: http://hackage.haskell.org/package/ChristmasTree-0.2, and which has been described in:
@article{DBLP
:journals/entcs/BaarsSV10,
author = {Arthur I. Baars and
S. Doaitse Swierstra and
Marcos Viera},
title = {Typed Transformations of Typed Grammars: The Left Corner
Transform},
journal = {Electr. Notes Theor. Comput. Sci.},
volume = {253},
number = {7},
year = {2010},
pages = {51-64},
ee = {http://dx.doi.org/10.1016/j.entcs.2010.08.031},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Doaitse
On Feb 20, 2013, at 8:13 , Martin Drautzburg
Hello all,
this was previously asked on haskell-beginners, but only partially answered.
As an exercise I am writing a parser roughly following the expamples in Graham Hutton's book. The language contains things like:
data Exp = Lit Int -- literal integer | Plus Exp Exp
My naive parser enters an infinite recursion, when I try to parse "1+2". I do understand why:
"hmm, this expression could be a plus, but then it must start with an expression, lets check".
and it tries to parse expression again and again considers Plus.
Twan van Laarhoven told me that:
Left-recursion is always a problem for recursive-descend parsers.
and suggested to do something like:
parseExp = do lit <- parseLit pluses <- many (parsePlusToken *> parseLit) return (combinePlusesWithLit lit pluses)
combinePlusesWithLit = foldr Plus -- or foldl
This indeed does the trick, but only when the first token is a Lit (literal integer).
I then added the possibility to optionally put things in parentheses. But then I cannot parse "(1+2)+3". The original code fails, because "(1+2)" is not a Lit and when I allow an expression as the first argument to "+" I get infinite recursion again.
I am generally confused, that saying "a plus expression is an integer followed by many "plus somethings" is not what the language says. So this requires a lot of "paying attention" to get right. I'd much rather say "a plus expression is two expressions with a '+' in between".
I do know for sure, that it is possible to parse "(1+2)+3" (ghci does it just fine). But I seem to be missing a trick.
Can anyone shed some light on this?
-- Martin
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (9)
-
Brandon Allbery
-
Dmitry Olshansky
-
Dominique Devriese
-
Kim-Ee Yeoh
-
Martin Drautzburg
-
Roman Cheplyaka
-
S. Doaitse Swierstra
-
Stephen Tetley
-
Tillmann Rendel