
On Mon, Dec 27, 2010 at 6:51 AM, Evan Laforge wrote:
>> I've uploaded attoparsec-text and attoparsec-text-enumerator to
>> Hackage. I wrote those packages late last week and asked for
>
> Very nice! I'll download this and try it out. Attoparsec has somewhat
> different combinators than parsec, so it'll take some rewriting, but
> it's work I'd have to do anyway to try the bytestring+attoparsec
> approach.
Well, I tried it... and it's still slower!

parsec2, String (a little faster than last time, since I have a new computer):

    total time  =  9.10 secs  (455 ticks @ 20 ms)
    total alloc = 2,295,837,512 bytes  (excludes profiling overheads)

attoparsec-text, Data.Text:

    total time  = 14.72 secs  (736 ticks @ 20 ms)
    total alloc = 2,797,672,844 bytes  (excludes profiling overheads)

The top consumer in the profile is now
Data.Attoparsec.Text.Internal.runParser, followed, several entries
later, by bindP, addS, and, mysteriously, (>>=). Since parsec never
appears in the profile, I suspected it had been compiled without
profiling and hence wasn't incurring the profiling overhead, so I tried
running without any profiling flags, but the numbers come out about the
same; I guess the prof output has already subtracted the profiling
overhead.

The attoparsec profile output is hard to interpret: it's a huge tree of
internal attoparsec functions that are individually cheap but all add
up under runParser. runParser itself is simply a newtype accessor, so I
don't really understand why it's credited with so much time. But there
are no clear culprits... my parsers make heavy use of takeWhile and
skipWhile; combinators like <|> and 'many' occur only at the level of
complete terms, and are thus called much more rarely.

The greater allocation is pretty mysterious. I wasn't able to track it
down via heap profiling: the biggest allocator among the parsing
modules isn't much of an allocator at all, peaking at around 350k.
Intuition says allocation should be much lower with packed Text, but I
suppose even the takeWhile combinators have to unpack every character
into a Char, so maybe it's actually less efficient, because String at
least can reuse its Chars directly.

Actually, I've thought about this problem with Haskell libraries
before: I have a packed array which I then do a bsearch over, and the
bsearch generates lots of garbage. I was originally confused, but my
current guess is that every comparison winds up unpacking the array
element, wrapping it in the Haskell data type, and then extracting the
(boxed) Int from that. An efficient implementation would compare the
int in place... perhaps it would have to inline the comparison and use
a 'peek' specialized to just extract the desired int, and then hope the
optimizer figures out how to pass it unboxed. (There's a sketch of what
I mean at the end of this mail.)

I'll try a few optimizations I can think of. If those fail, I'll try
ByteString, in case the problem is specific to attoparsec-text. If that
fails, I'll give up for real and go back to Parsec 2, still the leader
in speed.

Just in case there's some useful criticism, here's one of the busier
parsers:

p_unsigned_float :: A.Parser Double
p_unsigned_float = do
    i <- A.takeWhile Char.isDigit
    f <- A.option "" (A.char '.' >> A.takeWhile1 Char.isDigit)
    if Text.null i && Text.null f then mzero else case (dec i, dec f) of
        (Just i', Just f') -> return $ fromIntegral i'
            + fromIntegral f' / fromIntegral (10 ^ Text.length f)
        _ -> mzero
    where
    dec :: Text.Text -> Maybe Int
    dec s
        | Text.null s = Just 0
        | otherwise = case Text.Read.decimal s of
            Right (d, rest) | Text.null rest -> Just d
            _ -> Nothing
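
For what it's worth, the first optimization I have in mind is to fold
the digits up directly instead of round-tripping through
Text.Read.decimal, since takeWhile/takeWhile1 already guarantee the
text is all digits. This is just an untested sketch (digitsToInt and
p_unsigned_float2 are made-up names), using the same qualified imports
as above:

-- Like 'dec', this would overflow Int on absurdly long digit strings.
digitsToInt :: Text.Text -> Int
digitsToInt = Text.foldl' (\n c -> n * 10 + Char.digitToInt c) 0

p_unsigned_float2 :: A.Parser Double
p_unsigned_float2 = do
    i <- A.takeWhile Char.isDigit
    f <- A.option Text.empty (A.char '.' >> A.takeWhile1 Char.isDigit)
    -- The input is already validated, so there's no Maybe to thread
    -- through and no failure case beyond both fields being empty.
    if Text.null i && Text.null f then mzero else return $
        fromIntegral (digitsToInt i)
            + fromIntegral (digitsToInt f) / 10 ^^ Text.length f

If the extra allocation really is each Char being boxed for the
predicate, this won't help with that, but it does save one extra
traversal of each Text and all the Either/Maybe plumbing.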
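
Re the bsearch digression above, here's a sketch of what I mean by
comparing in place: read just the key field out of the packed record
instead of unpacking the whole record per probe. Everything here
(bsearchKey, the layout parameters) is made up for illustration, and I
haven't benchmarked it:

import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff)

-- bsearch over a packed array of records, peeking only the Int key at
-- its byte offset; with inlining, GHC has a chance to keep the key
-- unboxed through the comparison.
bsearchKey :: Ptr () -> Int -> Int -> Int -> Int -> IO (Maybe Int)
bsearchKey base recordSize keyOffset len target = go 0 len
    where
    go low high
        | low >= high = return Nothing
        | otherwise = do
            let mid = (low + high) `div` 2
            key <- peekByteOff base (mid * recordSize + keyOffset)
            case compare (key :: Int) target of
                LT -> go (mid+1) high
                GT -> go low mid
                EQ -> return (Just mid)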