
I have a couple of problems with attoparsec which I think are related to its always backtrack nature, but maybe there's some other way to solve the same problems. The first is that it's hard to get the right error msg out. For instance, I have a parser that tries to parse a number with an optional type suffix. It's an error if the suffix is unrecognized: p_num :: A.Parser Score.TypedVal p_num = do num <- p_untyped_num suffix <- A.option "" ((:"") <$> A.letter_ascii) case Score.code_to_type suffix of Nothing -> fail $ "p_num expected suffix in [cdsr]: " ++ show suffix Just typ -> return $ Score.Typed typ num However, which error msg shows up depends on the order of the (<|>) alternatives, and in general the global structure of the entire parser, because I think it just backtracks and then picks the last failing backtrack. Even after carefully rearranging all the parsers it seems impossible to get this particular error to bubble up to the top. The thing is, as soon as I see an unexpected suffix I know I can fail entirely right there, with exactly that error msg, but since there's no way to turn off backtracking I think there's no way to do that. The second thing is that I want to lex a single token. I thought I could just parse a term and then see how far I got and then use that index to splitAt the input. But attoparsec doesn't keep track of the current index, so I wrote ((,) <$> p_term <*> A.takeByteString), and then I can use the length of the left over bytestring. But due to backtracking, that changes what p_term will parse. Since takeByteString always succeeds, it will cause p_term to backtrack until it finds some prefix that will match. The result is that instead of failing to parse, "1." will lex as ("1", "."). Since I integrate lexing and parsing (as is common with combinator parsers), and since it seems I can't keep track of byte position with attoparsec, I think I'm out of luck trying to do this the easy way. I think I have to write a separate lexer that tries to have the same behaviour as the parser, but is all separate code. I know attoparsec was designed for speed, and error reporting and source positions are secondary. Am I simply asking too much of it? I originally used parsec, but parsing speed is my main bottleneck, so I don't want to give ground there. Is there a clever way to get attoparsec to do what I want? Or a ByteString or Text parser out there which is fast, but can not backtrack, or keep track of input position? I've heard some good things about traditional alex+happy... of course it would mean a complete rewrite but might be interesting. Has anyone compared the performance of attoparsec vs. alex+happy?