
On Mon, Dec 27, 2010 at 6:51 AM, Evan Laforge wrote:
>> I've uploaded attoparsec-text and attoparsec-text-enumerator to
>> Hackage. I wrote those packages late last week and asked for
>
> Very nice! I'll download this and try it out. Attoparsec has somewhat
> different combinators than parsec, so it'll take some rewriting, but
> it's work I'd have to do anyway to try the bytestring+attoparsec
> approach.
Well, I tried it... and it's still slower!

parsec2, String (a little faster than last time, since I have a new computer):

    total time  =  9.10 secs  (455 ticks @ 20 ms)
    total alloc = 2,295,837,512 bytes  (excludes profiling overheads)

attoparsec-text, Data.Text:

    total time  = 14.72 secs  (736 ticks @ 20 ms)
    total alloc = 2,797,672,844 bytes  (excludes profiling overheads)

The top consumer in the profile is now
Data.Attoparsec.Text.Internal.runParser, followed, several entries
later, by bindP, addS, and, mysteriously, (>>=). Since parsec never
appears in the profile, I suspected it had been compiled without
profiling and hence wasn't incurring the profiling overhead, so I tried
running without any profiling flags, but the numbers come out about the
same; I guess the prof output has already subtracted the profiling
overhead.

The attoparsec profile output is hard to interpret: it's a huge tree of
internal attoparsec functions that are individually cheap but all add
up under runParser. runParser itself is simply a newtype accessor, so I
don't really understand why it's credited with so much time. But there
are no clear culprits... my parsers make heavy use of takeWhile and
skipWhile; combinators like <|> and 'many' occur only at the level of
complete terms, and are thus called much more rarely.

The greater allocation is pretty mysterious. I wasn't able to track it
down via heap profiling: the biggest allocator among the parsing
modules isn't much of an allocator at all, peaking at around 350k.
Intuition says allocation should be much lower with packed Text, but I
suppose even the takeWhile combinators have to unpack every character
into a Char, so maybe it's actually less efficient, because String at
least can reuse its Chars directly.

Actually, I've thought about this problem with Haskell libraries
before: I have a packed array which I then do a bsearch over, and the
bsearch generates lots of garbage. I was originally confused, but my
current guess is that every comparison winds up unpacking the array
element, wrapping it in the Haskell data type, and then extracting the
(boxed) Int from that. An efficient implementation would compare the
int in place... perhaps it would have to inline the comparison and use
a 'peek' specialized to just extract the desired int, and then hope the
optimizer figures out how to pass it unboxed. (There's a sketch of what
I mean at the end of this mail.)

I'll try a few optimizations I can think of. If those fail, I'll try
ByteString, in case the problem is specific to attoparsec-text. If that
fails, I'll give up for real and go back to Parsec 2, still the leader
in speed.

Just in case there's some useful criticism, here's one of the busier
parsers:

p_unsigned_float :: A.Parser Double
p_unsigned_float = do
    i <- A.takeWhile Char.isDigit
    f <- A.option "" (A.char '.' >> A.takeWhile1 Char.isDigit)
    if Text.null i && Text.null f then mzero else case (dec i, dec f) of
        (Just i', Just f') -> return $ fromIntegral i'
            + fromIntegral f' / fromIntegral (10 ^ Text.length f)
        _ -> mzero
    where
    dec :: Text.Text -> Maybe Int
    dec s
        | Text.null s = Just 0
        | otherwise = case Text.Read.decimal s of
            Right (d, rest) | Text.null rest -> Just d
            _ -> Nothing
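
For what it's worth, the first optimization I have in mind is to fold
the digits up directly instead of round-tripping through
Text.Read.decimal, since takeWhile/takeWhile1 already guarantee the
text is all digits. This is just an untested sketch (digitsToInt and
p_unsigned_float2 are made-up names), using the same qualified imports
as above:

-- Like 'dec', this would overflow Int on absurdly long digit strings.
digitsToInt :: Text.Text -> Int
digitsToInt = Text.foldl' (\n c -> n * 10 + Char.digitToInt c) 0

p_unsigned_float2 :: A.Parser Double
p_unsigned_float2 = do
    i <- A.takeWhile Char.isDigit
    f <- A.option Text.empty (A.char '.' >> A.takeWhile1 Char.isDigit)
    -- The input is already validated, so there's no Maybe to thread
    -- through and no failure case beyond both fields being empty.
    if Text.null i && Text.null f then mzero else return $
        fromIntegral (digitsToInt i)
            + fromIntegral (digitsToInt f) / 10 ^^ Text.length f

If the extra allocation really is each Char being boxed for the
predicate, this won't help with that, but it does save one extra
traversal of each Text and all the Either/Maybe plumbing.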
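
Re the bsearch digression above, here's a sketch of what I mean by
comparing in place: read just the key field out of the packed record
instead of unpacking the whole record per probe. Everything here
(bsearchKey, the layout parameters) is made up for illustration, and I
haven't benchmarked it:

import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff)

-- bsearch over a packed array of records, peeking only the Int key at
-- its byte offset; with inlining, GHC has a chance to keep the key
-- unboxed through the comparison.
bsearchKey :: Ptr () -> Int -> Int -> Int -> Int -> IO (Maybe Int)
bsearchKey base recordSize keyOffset len target = go 0 len
    where
    go low high
        | low >= high = return Nothing
        | otherwise = do
            let mid = (low + high) `div` 2
            key <- peekByteOff base (mid * recordSize + keyOffset)
            case compare (key :: Int) target of
                LT -> go (mid+1) high
                GT -> go low mid
                EQ -> return (Just mid)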