Troubles understanding Parsec Error Handling

I recently started writing my first application at work in Haskell and it deals with a lot of parsing. Among other things I often have to check for a lot of alternatives for fixed strings (parsing natural language text and people have a lot of ways to abbreviate the same thing in labels). So far I have been doing this basically via choice $ map (try . string) [ "foo", "bar", ... ] This works fine but has two disadvantages, it isn't very fast, in particular when many of the strings start with the same prefix and it also is a bit error prone since it breaks when you place a prefix of another string earlier in the list. My attempt at a solution was to use the bytestring-trie package for a little utility function that basically parses one character at a time, checks if the string parsed so far is in the trie and then calls itself recursively with the trie starting with that string. My attempt at that so far looks like this: (dependencies bytestring-trie, utf8-string and parsec 3) import qualified Data.ByteString.UTF8 as UTF8 import qualified Data.Trie as Trie import Text.Parsec import Text.Parsec.Text (GenParser) anyOf :: [String] -> GenParser u String anyOf l = try $ anyOf' t "" where t = Trie.fromList $ zip (map UTF8.fromString l) (repeat ()) anyOf' :: Trie.Trie () -> String -> GenParser u String anyOf' t s = try $ do c <- lookAhead $ anyChar let newS = s ++ [ c ] in case Trie.submap (UTF8.fromString newS) t of emptyT | Trie.null emptyT -> case Trie.member (UTF8.fromString s) t of True -> return s False -> unexpected $ show newS++", expecting one of "++show l restT -> do _ <- anyChar try $ anyOf' restT newS A successful example usage would be: parseTest (do; r1 <- anyOf ["Hello", "Hallo", "Foo", "HallofFame"]; r2 <- string "bla"; return (r1, r2)) "Hallobla" which results in ("Hallo","bla") (the extra string parser is there so errors in parsing too much are not hidden). An error would result .e.g. from parseTest (do; r1 <- anyOf ["Hello", "Hallo", "Foo", "HallofFame"]; r2 <- string "bla"; return (r1, r2)) "Hallofbla" which prints this: parse error at (line 1, column 8):unknown parse error And my question about this is made up of two parts 1. Why doesn't it print my "unexpected" message but instead says unknown parse error 2. Why is the location in the text off (I would expect it to fail at column 6 (first character beyond the result it could return) or 7 (first character that makes the string no prefix of any acceptable string) I am afraid my knowledge of Parsec internals is a bit too limited, some Google queries showed no similar problems and no obvious places in the Parsec source code to check for the answer to the first question in particular and I suspect the second is closely related to the first. Thanks for reading through my question and I hope someone knows the answer or at least some clues on where i might find it. Matthias Hoermann P.S.: I am hoping this time this works, last time it was rejected because google sends with @googlemail.com instead of @gmail.com for some reason.

Hi Matthias,
On Wed, May 30, 2012 at 1:36 PM, Matthias Hörmann
parseTest (do; r1 <- anyOf ["Hello", "Hallo", "Foo", "HallofFame"]; r2 <- string "bla"; return (r1, r2)) "Hallofbla"
which prints this:
parse error at (line 1, column 8):unknown parse error
And my question about this is made up of two parts
1. Why doesn't it print my "unexpected" message but instead says unknown parse error 2. Why is the location in the text off (I would expect it to fail at column 6 (first character beyond the result it could return) or 7 (first character that makes the string no prefix of any acceptable string)
What version of parsec 3 are you using? In version 3.1.1, I get (using Text.Parsec.String instead of Text.Parsec.Text): parse error at (line 1, column 1): unexpected "Hallofb", expecting one of ["Hello","Hallo","Foo","HallofFame"] which is what I would have expected, bearing in mind that 'try p' pretends that it hasn't consumed input when 'p' fails. I don't think you need to use 'try' in your 'anyOf' function, but you'll have to change it to handle seeing the end of input if the one-character look-ahead fails. Kevin -- Kevin Charter kevin.charter@acm.org

On Wed, May 30, 2012 at 3:11 PM, Kevin Charter
What version of parsec 3 are you using? In version 3.1.1, I get (using Text.Parsec.String instead of Text.Parsec.Text):
Ah, answered my own question. I gather you're using 3.1.2, since it's the first and so far only version with the Text.Parsec.Text module. Kevin -- Kevin Charter kevin.charter@acm.org

On Wed, May 30, 2012 at 4:18 PM, Kevin Charter
On Wed, May 30, 2012 at 3:11 PM, Kevin Charter
wrote: What version of parsec 3 are you using? In version 3.1.1, I get (using Text.Parsec.String instead of Text.Parsec.Text):
Ah, answered my own question. I gather you're using 3.1.2, since it's the first and so far only version with the Text.Parsec.Text module.
We changed how 'try' handled errors in some cases in between 3.1.1 and 3.1.2. I'll take a look at this. Antoine
Kevin -- Kevin Charter kevin.charter@acm.org
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hi Antoine and Roman,
On Wed, May 30, 2012 at 4:14 PM, Antoine Latter
We changed how 'try' handled errors in some cases in between 3.1.1 and 3.1.2. I'll take a look at this.
Antoine
Thanks for confirming -- I tried 3.1.2 and got the same result as Matthias. And Roman, thanks for the light-speed patch! I was about to say I had an example that showed the problem might actually have to do with 'lookAhead' rather than 'try', and then I saw your message. Kevin -- Kevin Charter kevin.charter@acm.org

Hello Thanks for the quick help with this. I thought about the idea that lookAhead might be the cause of the positioning bug but then discarded that idea because I thought lookAhead should never lead to an error past wherever the input position is now considering it doesn't consume any input. I am aware of the issue with the error message position and the output, I was still working on improving that when I was puzzled by the fact that the error message I specified wasn't even returned to me. As for try, I believe I need it to make sure the input I consume one character at the time, before I know if I will reach another valid match (or any at all) does not stay consumed when my parser fails. I am still very much in the experimental phase as far as writing Parsec combinators beyond very simple stuff is concerned so I am open for suggestions on how to improve it in a way that doesn't need "try". Thanks again for all the help and especially for the patch. After applying it I do get the error message I specified. I noticed there are still some other problems in the code. In particular it doesn't work as intended in cases like this one: parseTest (do; r1 <- anyOf ["Hello", "Hallo", "Foo", "HallofFame"]; r2 <- string "fbla"; return (r1, r2)) "Hallofbla" where it should (according to my goal) return no parse error but instead accept "Hallo" and allow the string parser to consume the rejected suffix but I will try to fix that. Matthias Hoermann

* Matthias Hörmann
I noticed there are still some other problems in the code. In particular it doesn't work as intended in cases like this one:
parseTest (do; r1 <- anyOf ["Hello", "Hallo", "Foo", "HallofFame"]; r2 <- string "fbla"; return (r1, r2)) "Hallofbla"
where it should (according to my goal) return no parse error but instead accept "Hallo" and allow the string parser to consume the rejected suffix but I will try to fix that.
This looks more like a job for regular expressions. E.g. using the regex-applicative package: > let anyOf = foldr1 (<|>) . map string > let re = (,) <$> anyOf ["Hello", "Hallo", "Foo", "HallofFame"] <*> string "fbla" > "Hallofbla" =~ re Just ("Hallo","fbla") Theoretically regular expressions also do the kind of optimization that you achieve with a trie, but this particular engine doesn't. Nevertheless, it may be a good base for your own engine. -- Roman I. Cheplyaka :: http://ro-che.info/

* Matthias Hörmann
And my question about this is made up of two parts
1. Why doesn't it print my "unexpected" message but instead says unknown parse error 2. Why is the location in the text off (I would expect it to fail at column 6 (first character beyond the result it could return) or 7 (first character that makes the string no prefix of any acceptable string)
Thanks for reporting. This is a regression introduced by me in this patch:
Sun Feb 20 18:24:22 EET 2011 Roman Cheplyaka

On Wed, May 30, 2012 at 5:47 PM, Roman Cheplyaka
With this patch your code prints:
parse error at (line 1, column 7): unexpected "Hallofb", expecting one of ["Hello","Hallo","Foo","HallofFame"]
Hi folks, Roman's patch has been included in the newly-released parsec 3.1.3: http://hackage.haskell.org/package/parsec-3.1.3 Enjoy, Antoine
participants (4)
-
Antoine Latter
-
Kevin Charter
-
Matthias Hörmann
-
Roman Cheplyaka