
Yeah, I know this has been discussed a number of times, but I have some concrete questions I haven't seen asked before. And the "parsec 3 is now as fast as parsec 2" thing I've seen around doesn't seem to be true for me. I have an app that does a lot of parsing of small expressions. It's currently parsec2 operating on lots of little Texts (after an unpack, of course). A few parsing functions show up near the top of the profile output, so I thought an obvious improvement would be to parse Text directly and avoid the overhead and garbage of unpacking. Since parsec3 is now supposed to be as fast as parsec2 I thought I would give it a try. Parsec3 is 3.1.0, parsec 2 is 2.1.0.1: parsec2, String: total time = 10.66 secs (533 ticks @ 20 ms) total alloc = 2,340,113,404 bytes (excludes profiling overheads) parsec3, String: (this is just after upgrading the library and editing it to fix breakage from Parser being a type alias now) total time = 13.76 secs (688 ticks @ 20 ms) total alloc = 2,706,625,256 bytes (excludes profiling overheads) parsec3, Text: (wrote a Text instance similar to the one for ByteString, updated imports, no longer unpacking to String) total time = 15.96 secs (798 ticks @ 20 ms) total alloc = 3,338,005,896 bytes (excludes profiling overheads) This is not very encouraging! Especially strange is how Text generates *more* allocation... I'd expect less since it doesn't unpack all the Texts. The parsing functions are no longer at the top of the profile, but there are new 'unParser' and 'parsecMap' and 'parserBind' up at or near the top. 'unParser' just looks like it's unwrapping the Parsec newtype, so I don't fully understand how it's the most expensive, but it's called on every bind so it does get called a lot. There are no obvious super expensive ones, just lots and lots of them that add up. Parsec 3's unParser covers up the parsing function I wrote, so it's now hard to tell what the expensive parsing function actually is. I've seen a few remarks that you can't just throw together parsers and expect them to be fast, you have to profile them, but nothing on how to actually interpret the results of profiling. For instance, here's one of the main expensive parsers: p_unsigned_float :: P.CharParser st Double p_unsigned_float = do i <- P.many P.digit f <- P.option "" (P.char '.' >> P.many1 P.digit) if (null i && null f) then P.pzero else do let int = List.foldl' (\total c -> 10 * total + fromIntegral (Char.digitToInt c)) 0 i frac = foldr (\c total -> (total + fromIntegral (Char.digitToInt c)) / 10) 0 f return (int + frac) There's an obvious problem where I get the digits as a String and then parse that with list functions, but I can't see any way to get parsec to return a chunk of Text. This is roughly how parsec itself parses numbers, in Text.Parsec.Token. So, my current options are either figure out some way to speed up parsec3+Text, revert to parsec2+String and give up, or try an entirely different parsing library. I've heard attoparsec is fast but I'd have to switch to utf8 bytestring which is a big change, and Text seems like the more correct choice anyway. Any ideas or experience?