Parsec combinator like Prolog's cut operator?

Hi all, I'm reading John Hughes' paper "Generalizing Monads to Arrows" and found the statement regarding parser combinators: "... depend on the programmer using an additional combinator similar to Prolog's 'cut' operator do declare that a parser need never backtrack beyond a certain point." Does something like this already exist in Parsec? If not is there a way to write it? Having this would really help with a parsing problem I have. Cheers, Erik -- ---------------------------------------------------------------------- Erik de Castro Lopo http://www.mega-nerd.com/

On Tue, Jun 29, 2010 at 10:26 PM, Erik de Castro Lopo
Hi all,
I'm reading John Hughes' paper "Generalizing Monads to Arrows" and found the statement regarding parser combinators:
"... depend on the programmer using an additional combinator similar to Prolog's 'cut' operator do declare that a parser need never backtrack beyond a certain point."
Does something like this already exist in Parsec? If not is there a way to write it?
Having this would really help with a parsing problem I have.
Cheers, Erik
For Parsec, in the absence of the "try" combinator, a parser will never back-track once it consumes a portion of the input. If "try" is pushed out into the leaves of you parser, you shouldn't run in to too much trouble with excessive backtracking. What problems are you running in to? Take care, Antoine

Antoine Latter wrote:
For Parsec, in the absence of the "try" combinator, a parser will never back-track once it consumes a portion of the input.
Thanks for reminding me.
If "try" is pushed out into the leaves of you parser, you shouldn't run in to too much trouble with excessive backtracking.
I agree.
What problems are you running in to?
Unfortunately I have an existing, mostly working and quite complex parser which does indeed use Parsec.try quite close to the root of the tree. I have tried on a number of ocassions to refactor this to remove this particular usage of Parsec.try without success. Cheers, Erik -- ---------------------------------------------------------------------- Erik de Castro Lopo http://www.mega-nerd.com/

Hi Erik Malcolm Wallace describes a commit combinator in the paper "Partial parsing: combining choice with commitment" which sounds like what you would want. It is implemented for Polyparse rather than Parsec though.
From a quick scan of the paper and code, the implementation appears to be built into the Parser type, so it is a primitive rather than a definable combinator.
If your Parsec parser uses try because it is not especially left-factored, one extra combinator I have found useful for left-factoring on the cheap is optionalSuffix: optionalSuffix :: (a -> c) -> (a -> b -> c) -> Parser a -> Parser b -> Parser c optionalSuffix f g pbody psuffix = do a <- pbody mb_b <- optionMaybe psuffix return $ maybe (f a) (\b -> g a b) mb_b

Stephen Tetley
Hi Erik
Malcolm Wallace describes a commit combinator in the paper "Partial parsing: combining choice with commitment" which sounds like what you would want. It is implemented for Polyparse rather than Parsec though. From a quick scan of the paper and code, the implementation appears to be built into the Parser type, so it is a primitive rather than a definable combinator.
From the Polyparse homepage (http://www.cs.york.ac.uk/fp/polyparse/):
,---- | If you are familiar with the Parsec library, then the key insight for | using PolyParse is that the two libraries' approach to backtracking | are the duals of each another. In Parsec, you must explicitly add a | try combinator at any location where backtracking might be | necessary. Users often find this a bit of a black art. In PolyParse by | contrast, all parsers are backtracking unless you explicitly add a | commit (or one of its variations). It is easy to tell where to add a | commit point, because you have already parsed enough of a data | structure to know that only one outcome is possible. For instance, if | you are parsing a Haskell value produced by 'show', then as soon as | you have parsed the initial constructor, you know that no other | constructor of that datatype is possible, so you can commit to | returning it. `---- -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

Ivan Lazar Miljenovic wrote:
From the Polyparse homepage (http://www.cs.york.ac.uk/fp/polyparse/):
,---- | If you are familiar with the Parsec library, then the key insight for | using PolyParse is that the two libraries' approach to backtracking | are the duals of each another. In Parsec, you must explicitly add a | try combinator at any location where backtracking might be | necessary. Users often find this a bit of a black art. In PolyParse by | contrast, all parsers are backtracking unless you explicitly add a | commit (or one of its variations). It is easy to tell where to add a | commit point, because you have already parsed enough of a data | structure to know that only one outcome is possible. For instance, if | you are parsing a Haskell value produced by 'show', then as soon as | you have parsed the initial constructor, you know that no other | constructor of that datatype is possible, so you can commit to | returning it. `----
Really? How interesting... I've never actually heard of Polyparse. I'll have to check it out.

Stephen Tetley wrote:
Malcolm Wallace describes a commit combinator in the paper "Partial parsing: combining choice with commitment" which sounds like what you would want. It is implemented for Polyparse rather than Parsec though.
Yes, I can see how that might help on some kinds of data. Thanks.
If your Parsec parser uses try because it is not especially left-factored, one extra combinator I have found useful for left-factoring on the cheap is optionalSuffix:
optionalSuffix :: (a -> c) -> (a -> b -> c) -> Parser a -> Parser b -> Parser c optionalSuffix f g pbody psuffix = do a <- pbody mb_b <- optionMaybe psuffix return $ maybe (f a) (\b -> g a b) mb_b
Funnily enough, while this function didn't help me directly, it did inspire me to take yet another look at the problem code (I've made at least 4 or 5 attempts to fix it over the last year). Revisiting the problem now, with your code in mind, I came up with a solution in about 5 minutes. Now that I've fixed it, it seems so obvious. That led me to wonder if there was a Parsec best-practices document. Does such a thing exist? Cheers, Erik -- ---------------------------------------------------------------------- Erik de Castro Lopo http://www.mega-nerd.com/

If you use the uu-parsing libraries you will get a breadth-first search, instead of a non-backtrcaking depth-first search of Parsec; furthermore you do not suffer from space leaks and get your results online. In addition you get error correction, with high-quality error messages. The principles of the library are explained in @inproceedings{uuparsing:piriapolis, Author = {Swierstra, S.~Doaitse}, Booktitle = {Language Engineering and Rigorous Software Development}, Date-Added = {2009-05-23 11:08:01 +0200}, Date-Modified = {2009-05-31 22:35:25 +0200}, Editor = {Bove, A. and Barbosa, L. and Pardo, A. and and Sousa Pinto, J.}, Pages = {252-300}, Place = {Piriapolis}, Publisher = {Spinger}, Series = {{LNCS}}, Title = {Combinator Parsers: a short tutorial}, Volume = {5520}, Year = {2009}} which is also available as a technical report: @techreport{UUCS2008044, Author = {Swierstra, S. Doaitse}, Date-Added = {2009-01-07 13:47:34 +0100}, Date-Modified = {2009-01-16 17:36:52 +0100}, Institution = {Department of Information and Computing Sciences, Utrecht University}, Number = {UU-CS-2008-044}, Pubcat = {techreport}, Title = {Combinator Parsing: A Short Tutorial}, Urlpdf = {{http://www.cs.uu.nl/research/techreps/repo/CS-2008/2008-044.pdf}} Follow the link to the pdf in this last Bibtex record. In the uu-parsing package you will find a file with examples, which you can just run to see the different constructs at work. Because te library is doing the hard work, writing parsers becomes simpler. Doaitse On 30 jun 2010, at 05:26, Erik de Castro Lopo wrote:
Hi all,
I'm reading John Hughes' paper "Generalizing Monads to Arrows" and found the statement regarding parser combinators:
"... depend on the programmer using an additional combinator similar to Prolog's 'cut' operator do declare that a parser need never backtrack beyond a certain point."
Does something like this already exist in Parsec? If not is there a way to write it?
Having this would really help with a parsing problem I have.
Cheers, Erik -- ---------------------------------------------------------------------- Erik de Castro Lopo http://www.mega-nerd.com/ _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (6)
-
Andrew Coppin
-
Antoine Latter
-
Erik de Castro Lopo
-
Ivan Lazar Miljenovic
-
S. Doaitse Swierstra
-
Stephen Tetley