[Parsec] Backtracking with try does not work for me?

I'm trying to write a Parsec parser for a language which authorizes (this is a simplified example) "a" or "a,b,c" or "a,c" or "a,b". (I can change the grammar but not the language.) The first attempt was: ***** CUT HERE **** import Text.ParserCombinators.Parsec import System (getArgs) comma = char ',' minilang = do char 'a' optional (do {comma ; char 'b'}) optional (do {comma ; char 'c'}) eof return "OK" run parser input = case (parse parser "" input) of Left err -> putStr ("parse error at " ++ (show err) ++ "\n") Right x -> putStr (x ++ "\n") main = do args <- getArgs run minilang (head args) ***** CUT HERE **** Of course, it fails for "a,c": parse error at (line 1, column 3): unexpected "c" expecting "b" for a reason explained in Parsec's documentation (the parser "optional (do {comma ; char 'b'})" already consumed the input, do note the column number). What puzzles me is that the solution suggested in Parsec's documentation does not work either: ********* CUT HERE ******************* minilang = do char 'a' try (optional (do {comma ; char 'b'})) optional (do {comma ; char 'c'}) eof return "OK" ********* CUT HERE ******************* parse error at (line 1, column 2): unexpected "c" expecting "b" Apparently, "try" was used (do note that the column number indicates that there was backtracking) but the parser still fails for "a,c". Why?

On Mon, Jul 31, 2006 at 09:04:32AM +0200, Stephane Bortzmeyer wrote:
minilang = do char 'a' try (optional (do {comma ; char 'b'})) optional (do {comma ; char 'c'}) eof return "OK"
parse error at (line 1, column 2): unexpected "c" expecting "b"
Apparently, "try" was used (do note that the column number indicates that there was backtracking) but the parser still fails for "a,c". Why?
minilang = do char 'a' try b <|> (return '-') optional c eof return "OK" where b = do { comma ; char 'b' } c = do { comma ; char 'c' } The (return 'x') is needed for type consistency. The (try) combinator doesn't spare you the error, it merely resets the cursor on the input stream. To catch the parse error, you need to name a throwaway alternative. cheers, matthias -- Institute of Information Systems, Humboldt-Universitaet zu Berlin web: http://www.wiwi.hu-berlin.de/~fis/ e-mail: fis@wiwi.hu-berlin.de tel: +49 30 2093-5742 fax: +49 30 2093-5741 office: Spandauer Strasse 1, R.324, 10178 Berlin, Germany pgp: AD67 CF64 7BB4 3B9A 6F25 0996 4D73 F1FD 8D32 9BAA

On Mon, Jul 31, 2006 at 10:59:14AM +0200,
Matthias Fischmann
try b <|> (return '-')
...
The (return 'x') is needed for type consistency.
OK, it works. Many thanks for the solution and the explanations.

Stephane Bortzmeyer wrote:
minilang = do char 'a' try (optional (do {comma ; char 'b'})) optional (do {comma ; char 'c'}) eof return "OK"
********* CUT HERE *******************
parse error at (line 1, column 2): unexpected "c" expecting "b"
Apparently, "try" was used (do note that the column number indicates that there was backtracking) but the parser still fails for "a,c". Why?
Because 'try' can only help you if its argument fails. If the argument to 'try' succeeds, then it behaves as if it wasn't there. Now 'optional x' always succeeds, so the 'try' is useless where you placed it. You need to 'try' the argument to 'optional':
minilang = do char 'a' optional (try (do {comma ; char 'b'})) optional (do {comma ; char 'c'}) eof return "OK"
You could also factor your grammar or use ReadP, where backtracking is not an issue. Udo. -- Ours is a world where people don't know what they want and are willing to go through hell to get it. -- Don Marquis

On Mon, Jul 31, 2006 at 12:57:04PM +0200,
Udo Stenzel
Now 'optional x' always succeeds, so the 'try' is useless where you placed it. You need to 'try' the argument to 'optional':
It works, too. Many thanks for the code and the explanation.
You could also factor your grammar
It is a language I do not control, so I prefer to keep the grammar as close as possible from the official specification.

The semantics of Parsec's "optional" operation are what is causing the problem. "optional foo" can have 3 results: 1) foo can succeed, optional succeeds, proceed to next command 2) foo can fail without consuming any input, optional succeeds proceed to next command 3) foo can fail after consuming some input, optional fails, do not proceed
minilang = do char 'a' optional (do {comma ; char 'b'})
The comma in the above line consumes input even in the "a,c" case. When "c" is seen the "char 'b'" fails and then the optional fails, and you get the error message you posted.
optional (do {comma ; char 'c'}) eof return "OK"
Apparently, "try" was used (do note that the column number indicates that there was backtracking) but the parser still fails for "a,c". Why?
Your next attempt does not fix the problem, since the try is in the wrong place ( http://www.cs.uu.nl/~daan/download/parsec/parsec.html#try may help)
minilang = do char 'a' try (optional (do {comma ; char 'b'}))
In the above line, the ",c" causes (char 'b') to fail, which causes 'optional' to fail, and then "try" also fails. The "try" alters the stream so that the "comma" was not consumed, but the "try" still passes along the failure. In neither the original or the modified minilang does the 'char "c"' line ever get reached in the "a,c" input case. The working solution is a small tweak:
minilang = do char 'a' optional (try (do {comma ; char 'b'})) optional (do {comma ; char 'c'}) eof return "OK"
Now the "a,c" case causes the (char 'b') to fail, and then the "try" also fails, but also acts as if the comma had not been consumed. Thus we are in case #2 of the semantics of "optional" and so "optional" succeeds instead of failing, allowing the next line to parse ",c" then eof then return "OK". There is a very very important difference to Parsec between failing with and without having consumed input. It means Parsec can be more efficient, since any branch that consumes input cannot backtrack. The "try" command is a way to override this optimization and allow more backtracking. The other solution presented on this list was:
minilang = do char 'a' try b <|> (return '-') optional c eof return "OK" where b = do { comma ; char 'b' } c = do { comma ; char 'c' }
In this case, the "optional" was replace by (<|> (return '-')). In fact you could define optional this way:
optional :: GenParser tok st a -> GenParser tok st () optional foo = (foo >> return ()) <|> (return ())
Thus "optional (try b)" is actually the same as "(b >> return ()) <|> (return ())". So you can see my suggestion is really identical the previous one. I could not help generalizing your toy problem to an ordered list of comma separated Char. Note that "try" is not actually needed in listlang, but it would be if (char x) were replaced by something that can consume more than a single character:
listlang :: [Char] -> GenParser Char st [Char] listlang [] = eof >> return [] listlang (x:xs) = useX <|> listlang xs where useX = do try (char x) rest <- end <|> more return (x:rest) end = (eof >> return []) more = comma >> listlang xs
Now minilang (the fixed version) is the same as (listlang ['a','b','c']) or (listlang "abc"). This is a good example:
*Main> run (listlang "abcd") "c,b" parse error at (line 1, column 3): unexpected "b" expecting "d" or end of input
-- Chris

On Mon, Jul 31, 2006 at 06:51:27PM +0100,
Chris Kuklewicz
minilang = do char 'a' optional (try (do {comma ; char 'b'})) optional (do {comma ; char 'c'}) eof return "OK"
I now have a new problem which was hidden beneath. If the language authorizes "a,bb" and "a,bbc", "a,bbc" is not accepted by my parser since it already accepted "a,bb" and the "c" which is left triggers a syntax error. This time, "try" believes it succeeded but should not. I need more look-ahead but I'm not sure how? (Again, I do not control the language so I cannot make it more deterministic.)

Stephane Bortzmeyer wrote:
On Mon, Jul 31, 2006 at 06:51:27PM +0100, Chris Kuklewicz
wrote a message of 102 lines which said: minilang = do char 'a' optional (try (do {comma ; char 'b'})) optional (do {comma ; char 'c'}) eof return "OK"
I now have a new problem which was hidden beneath. If the language authorizes "a,bb" and "a,bbc", "a,bbc" is not accepted by my parser since it already accepted "a,bb" and the "c" which is left triggers a syntax error.
This time, "try" believes it succeeded but should not. I need more look-ahead but I'm not sure how?
The problem is mentioned here: http://www.cs.uu.nl/people/daan/download/parsec/parsec.html#notFollowedBy Your whole parser is indeed failing, and again it is because of the "failing after consuming some input" issue. For "a,bbc" your "bb" token parser consumes the "bb" and then the dangling "c" causes the error. So you cannot commit to consuming the "bb" unless you know the rest of the string is okay. There are a few ways to accomplish this. The first would be to test whether "bb" is followed by "eof" or "comma" before accepting it. Another solution is to try and parse what follows "bb" before accepting "bb". A small fix would look like:
minilang' = do string "a" optional (try $ do {comma ; string "bb"; endToken}) optional (do {comma ; string "bbc"}) eof return "OK" where endToken = eof <|> lookAhead (comma >> return ())
A more general fix looks like this:
stringLang :: [String] -> GenParser Char st [String] stringLang items = polyLang comma (map string items)
listLang :: [Char] -> GenParser Char st [Char] listLang items = polyLang comma (map char items)
The first version of polyLang uses the "test eof or comma before accepting" strategy:
polyLang :: (Show element,Show token) => GenParser element state ignore -> [GenParser element state token] -> GenParser element state [token] polyLang _ [] = eof >> return [] polyLang separator input = (use input) <|> polyLang separator (tail input) where use (opX:xs) = do (x,test) <- try (do x <- opX test <- more when test (separator >> return ()) return (x,test)) rest <- if test then (loop xs <|> unexpected ("(problem after "++show x++")")) else return [] return (x:rest) more = option True (eof >> return False) loop [] = (unexpected "cannot parse") loop input' = use input' <|> loop (tail input')
The second version polyLang' uses the "test rest of input before accepting" strategy:
polyLang' :: (Show element,Show token) => GenParser element state ignore -> [GenParser element state token] -> GenParser element state [token] polyLang' _ [] = eof >> return [] polyLang' separator input = (use input) <|> polyLang' separator (tail input) where use (opX:xs) = try (do x <- opX test <- more rest <- if test then separator >> (loop xs <|> unexpected ("(problem after "++show x++")")) else return [] return (x:rest)) more = option True (eof >> return False) loop [] = (unexpected "cannot parse") loop input' = use input' <|> loop (tail input')
It works:
*Main> run (stringLang ["a","bb","bbc"]) "a,bbc" ["a","bbc"]
The error reporting gets a bit strange, and is different between the two versions of polyLang'
*Main> run (polyLang comma (map string ["a","bb","bbc","dd"])) "a,bbc,bb" parse error at (line 1, column 7): unexpected cannot parse or (problem after "bbc") expecting "dd"
*Main> run (polyLang' comma (map string ["a","bb","bbc","d"])) "a,bbc,bb" parse error at (line 1, column 1): unexpected "c", cannot parse, (problem after "bbc"), (problem after "a") or "a" expecting end of input, ",", "dd", "bb" or "bbc"

On Tue, Aug 01, 2006 at 09:41:40AM +0100,
Chris Kuklewicz
The problem is mentioned here:
http://www.cs.uu.nl/people/daan/download/parsec/parsec.html#notFollowedBy
notFollowedBy seems to work for me and is quite simple, even for my brain. Thanks.
The first would be to test whether "bb" is followed by "eof" or "comma" before accepting it.
notFollowedBy actually does the opposite (checking that there are no more letters).

Stephane Bortzmeyer wrote:
The first would be to test whether "bb" is followed by "eof" or "comma" before accepting it.
notFollowedBy actually does the opposite (checking that there are no more letters).
Are you sure that you don't actually want *> many1 letter `sepBy1` comma ? Just asking, because somehow I have a feeling that the next step is to accept 'a,c,b'... Udo. -- The Force is what holds everything together. It has its dark side, and it has its light side. It's sort of like cosmic duct tape.

On Tue, Aug 01, 2006 at 10:08:16PM +0200, Stephane Bortzmeyer wrote:
notFollowedBy seems to work for me and is quite simple, even for my brain. Thanks.
Actually, it doesn't work, and is quite subtle, at least for my brain. There was a discussion in this thread: http://haskell.org/pipermail/haskell/2004-February/013621.html Andrew
participants (5)
-
Andrew Pimlott
-
Chris Kuklewicz
-
Matthias Fischmann
-
Stephane Bortzmeyer
-
Udo Stenzel