
Stephane Bortzmeyer wrote:
On Mon, Jul 31, 2006 at 06:51:27PM +0100, Chris Kuklewicz
wrote a message of 102 lines which said:
minilang = do char 'a'
              optional (try (do {comma ; char 'b'}))
              optional (do {comma ; char 'c'})
              eof
              return "OK"
I now have a new problem which was hidden beneath the first one. If the language allows both "a,bb" and "a,bbc", then "a,bbc" is not accepted by my parser: it has already accepted "a,bb", and the leftover "c" triggers a syntax error.
This time, "try" believes it succeeded when it should not have. I need more look-ahead, but I'm not sure how to get it.
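A minimal sketch of the failure follows. It assumes parsec 3's compatibility module Text.ParserCombinators.Parsec and assumes `comma` is `char ','` (its definition is not shown in this message). `naiveLang` and `result` are hypothetical names, and naiveLang is a reconstruction in the style of minilang, not the poster's actual code:

```haskell
import Text.ParserCombinators.Parsec

-- Assumption: `comma` is not defined in this message; char ',' is a guess.
comma :: GenParser Char st Char
comma = char ','

-- Hypothetical reconstruction of the failing parser. On "a,bbc" the try
-- block succeeds on ",bb" and is never reconsidered, so the second
-- optional clause sees "c" (not a comma) and eof then fails on "c".
naiveLang :: GenParser Char st String
naiveLang = do
  _ <- string "a"
  optional (try (comma >> string "bb"))
  optional (comma >> string "bbc")
  eof
  return "OK"

-- Helper for checking results: Just the value on success, Nothing on error.
result :: GenParser Char () a -> String -> Maybe a
result p s = either (const Nothing) Just (parse p "" s)
```

With this sketch, "a" and "a,bb" parse, while "a,bbc" is rejected: once `try (comma >> string "bb")` has succeeded, nothing backtracks into it.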
The problem is described here: http://www.cs.uu.nl/people/daan/download/parsec/parsec.html#notFollowedBy
Your whole parser is indeed failing, and again it is because of the "failing after consuming some input" issue. For "a,bbc", your "bb" token parser consumes the "bb", and then the dangling "c" causes the error. So you cannot commit to consuming the "bb" unless you know the rest of the string is okay. There are a few ways to accomplish this. The first is to test whether "bb" is followed by "eof" or "comma" before accepting it. Another is to try to parse what follows "bb" before accepting "bb". A small fix using the first approach would look like:
minilang' = do string "a"
               optional (try $ do {comma ; string "bb"; endToken})
               optional (do {comma ; string "bbc"})
               eof
               return "OK"
  where endToken = eof <|> lookAhead (comma >> return ())
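The linked documentation section is about notFollowedBy, which gives a closely related check. The sketch below is a swapped-in variant, not the fix above: instead of requiring eof or a comma after "bb", it only rules out a following letter or digit. It assumes parsec 3's Text.ParserCombinators.Parsec and `comma = char ','`; `minilangNFB` and `result` are hypothetical names:

```haskell
import Text.ParserCombinators.Parsec

-- Assumption: `comma` is not defined in this message; char ',' is a guess.
comma :: GenParser Char st Char
comma = char ','

-- Hypothetical variant of minilang': "bb" is accepted only when the next
-- character (if any) is not a letter or digit. On "a,bbc" the try block
-- fails and backtracks, leaving ",bbc" for the second optional clause.
minilangNFB :: GenParser Char st String
minilangNFB = do
  _ <- string "a"
  optional (try (comma >> string "bb" >> notFollowedBy alphaNum))
  optional (comma >> string "bbc")
  eof
  return "OK"

-- Helper for checking results: Just the value on success, Nothing on error.
result :: GenParser Char () a -> String -> Maybe a
result p s = either (const Nothing) Just (parse p "" s)
```

This test is looser than endToken (a space after "bb" would pass it), but here the final eof still rejects any such leftover input.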
A more general fix looks like this:
stringLang :: [String] -> GenParser Char st [String]
stringLang items = polyLang comma (map string items)
listLang :: [Char] -> GenParser Char st [Char]
listLang items = polyLang comma (map char items)
The first version of polyLang uses the "test eof or comma before accepting" strategy:
polyLang :: (Show element,Show token) => GenParser element state ignore
         -> [GenParser element state token] -> GenParser element state [token]
polyLang _ [] = eof >> return []
polyLang separator input = (use input) <|> polyLang separator (tail input)
  where use (opX:xs) = do
          (x,test) <- try (do x <- opX
                              test <- more
                              when test (separator >> return ())
                              return (x,test))
          rest <- if test then (loop xs <|> unexpected ("(problem after "++show x++")"))
                          else return []
          return (x:rest)
        more = option True (eof >> return False)
        loop [] = (unexpected "cannot parse")
        loop input' = use input' <|> loop (tail input')
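For trying the first version, here is the same logic assembled into one self-contained file. It assumes parsec 3's compatibility module Text.ParserCombinators.Parsec and `comma = char ','` (not shown in this message); `result` is a hypothetical test helper:

```haskell
import Control.Monad (when)
import Text.ParserCombinators.Parsec

-- Assumption: `comma` is defined elsewhere in the thread; char ',' is a guess.
comma :: GenParser Char st Char
comma = char ','

-- polyLang with conventional layout. The `try` covers only a token plus
-- its following separator; after that the parser is committed to it.
polyLang :: (Show element, Show token)
         => GenParser element state ignore
         -> [GenParser element state token]
         -> GenParser element state [token]
polyLang _ [] = eof >> return []
polyLang separator input = use input <|> polyLang separator (tail input)
  where
    use (opX:xs) = do
      (x, test) <- try $ do
        x <- opX
        test <- more                        -- True unless at end of input
        when test (separator >> return ())  -- not at the end: need a separator
        return (x, test)
      rest <- if test
                then loop xs <|> unexpected ("(problem after " ++ show x ++ ")")
                else return []
      return (x : rest)
    more = option True (eof >> return False)
    loop [] = unexpected "cannot parse"
    loop input' = use input' <|> loop (tail input')

stringLang :: [String] -> GenParser Char st [String]
stringLang items = polyLang comma (map string items)

listLang :: [Char] -> GenParser Char st [Char]
listLang items = polyLang comma (map char items)

-- Helper for checking results: Just the value on success, Nothing on error.
result :: GenParser Char () a -> String -> Maybe a
result p s = either (const Nothing) Just (parse p "" s)
```

On "a,bbc", the attempt at "bb" fails inside its `try` because "c" is neither end of input nor a comma, so the loop moves on and accepts "bbc".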
The second version polyLang' uses the "test rest of input before accepting" strategy:
polyLang' :: (Show element,Show token) => GenParser element state ignore
          -> [GenParser element state token] -> GenParser element state [token]
polyLang' _ [] = eof >> return []
polyLang' separator input = (use input) <|> polyLang' separator (tail input)
  where use (opX:xs) = try (do x <- opX
                               test <- more
                               rest <- if test
                                         then separator >> (loop xs <|> unexpected ("(problem after "++show x++")"))
                                         else return []
                               return (x:rest))
        more = option True (eof >> return False)
        loop [] = (unexpected "cannot parse")
        loop input' = use input' <|> loop (tail input')
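The second version can be exercised the same way. This self-contained sketch again assumes parsec 3's Text.ParserCombinators.Parsec and `comma = char ','`; `result` is a hypothetical helper. The second test shows where the wider `try` matters: with the items "a" and "a,b", polyLang' backs out of "a" when nothing valid follows it and then accepts "a,b" as a single token, whereas polyLang commits to "a" as soon as it has consumed the comma.

```haskell
import Text.ParserCombinators.Parsec

-- Assumption: `comma` is defined elsewhere in the thread; char ',' is a guess.
comma :: GenParser Char st Char
comma = char ','

-- polyLang' with conventional layout: one `try` covers the token, the
-- separator and everything parsed after them, so a failure anywhere in
-- the rest backtracks to before the token.
polyLang' :: (Show element, Show token)
          => GenParser element state ignore
          -> [GenParser element state token]
          -> GenParser element state [token]
polyLang' _ [] = eof >> return []
polyLang' separator input = use input <|> polyLang' separator (tail input)
  where
    use (opX:xs) = try $ do
      x <- opX
      test <- more                          -- True unless at end of input
      rest <- if test
                then separator >> (loop xs <|> unexpected ("(problem after " ++ show x ++ ")"))
                else return []
      return (x : rest)
    more = option True (eof >> return False)
    loop [] = unexpected "cannot parse"
    loop input' = use input' <|> loop (tail input')

-- Helper for checking results: Just the value on success, Nothing on error.
result :: GenParser Char () a -> String -> Maybe a
result p s = either (const Nothing) Just (parse p "" s)
```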
It works:
*Main> run (stringLang ["a","bb","bbc"]) "a,bbc"
["a","bbc"]
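The sessions call a `run` helper that this message does not define. A likely candidate, assumed here, is the `run` function from the Parsec user guide, which prints the result or "parse error at " followed by the error. A sketch under that assumption, using parsec 3's Text.ParserCombinators.Parsec:

```haskell
import Text.ParserCombinators.Parsec

-- Assumed helper (not defined in this message), modelled on the `run`
-- function from the Parsec user guide: print the result on success,
-- otherwise print "parse error at " followed by the ParseError.
run :: Show a => Parser a -> String -> IO ()
run p input =
  case parse p "" input of
    Left err -> do
      putStr "parse error at "
      print err
    Right x -> print x
```

Because it uses `print`, a successful `run (string "ab") "ab"` prints "ab" with its quotes. The exact wording and position of error messages may differ between the 2006 Parsec and current releases.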
The error reporting gets a bit strange, and it differs between the two versions, polyLang and polyLang':
*Main> run (polyLang comma (map string ["a","bb","bbc","dd"])) "a,bbc,bb"
parse error at (line 1, column 7):
unexpected cannot parse or (problem after "bbc")
expecting "dd"
*Main> run (polyLang' comma (map string ["a","bb","bbc","dd"])) "a,bbc,bb"
parse error at (line 1, column 1):
unexpected "c", cannot parse, (problem after "bbc"), (problem after "a") or "a"
expecting end of input, ",", "dd", "bb" or "bbc"