
Yeah, I know this has been discussed a number of times, but I have some concrete questions I haven't seen asked before. And the "parsec 3 is now as fast as parsec 2" thing I've seen around doesn't seem to be true for me. I have an app that does a lot of parsing of small expressions. It's currently parsec2 operating on lots of little Texts (after an unpack, of course). A few parsing functions show up near the top of the profile output, so I thought an obvious improvement would be to parse Text directly and avoid the overhead and garbage of unpacking. Since parsec3 is now supposed to be as fast as parsec2 I thought I would give it a try. Parsec3 is 3.1.0, parsec 2 is 2.1.0.1: parsec2, String: total time = 10.66 secs (533 ticks @ 20 ms) total alloc = 2,340,113,404 bytes (excludes profiling overheads) parsec3, String: (this is just after upgrading the library and editing it to fix breakage from Parser being a type alias now) total time = 13.76 secs (688 ticks @ 20 ms) total alloc = 2,706,625,256 bytes (excludes profiling overheads) parsec3, Text: (wrote a Text instance similar to the one for ByteString, updated imports, no longer unpacking to String) total time = 15.96 secs (798 ticks @ 20 ms) total alloc = 3,338,005,896 bytes (excludes profiling overheads) This is not very encouraging! Especially strange is how Text generates *more* allocation... I'd expect less since it doesn't unpack all the Texts. The parsing functions are no longer at the top of the profile, but there are new 'unParser' and 'parsecMap' and 'parserBind' up at or near the top. 'unParser' just looks like it's unwrapping the Parsec newtype, so I don't fully understand how it's the most expensive, but it's called on every bind so it does get called a lot. There are no obvious super expensive ones, just lots and lots of them that add up. Parsec 3's unParser covers up the parsing function I wrote, so it's now hard to tell what the expensive parsing function actually is. I've seen a few remarks that you can't just throw together parsers and expect them to be fast, you have to profile them, but nothing on how to actually interpret the results of profiling. For instance, here's one of the main expensive parsers: p_unsigned_float :: P.CharParser st Double p_unsigned_float = do i <- P.many P.digit f <- P.option "" (P.char '.' >> P.many1 P.digit) if (null i && null f) then P.pzero else do let int = List.foldl' (\total c -> 10 * total + fromIntegral (Char.digitToInt c)) 0 i frac = foldr (\c total -> (total + fromIntegral (Char.digitToInt c)) / 10) 0 f return (int + frac) There's an obvious problem where I get the digits as a String and then parse that with list functions, but I can't see any way to get parsec to return a chunk of Text. This is roughly how parsec itself parses numbers, in Text.Parsec.Token. So, my current options are either figure out some way to speed up parsec3+Text, revert to parsec2+String and give up, or try an entirely different parsing library. I've heard attoparsec is fast but I'd have to switch to utf8 bytestring which is a big change, and Text seems like the more correct choice anyway. Any ideas or experience?

On Thu, Dec 23, 2010 at 1:01 AM, Evan Laforge
So, my current options are either figure out some way to speed up parsec3+Text, revert to parsec2+String and give up, or try an entirely different parsing library. I've heard attoparsec is fast but I'd have to switch to utf8 bytestring which is a big change, and Text seems like the more correct choice anyway.
Any ideas or experience?
Michael Snoyman wants attoparsec-text as well [1]. [1] http://docs.yesodweb.com/blog/wishlist/ -- Felipe.

On Thu, Dec 23, 2010 at 3:03 PM, Felipe Almeida Lessa
Michael Snoyman wants attoparsec-text as well [1].
It's on my Christmas wishlist too. Johan

On Thu, Dec 23, 2010 at 6:21 PM, Johan Tibell
On Thu, Dec 23, 2010 at 3:03 PM, Felipe Almeida Lessa
wrote: Michael Snoyman wants attoparsec-text as well [1].
It's on my Christmas wishlist too.
Johan
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Since I'm sure everyone is thinking it at this point, I'll just say it: we're all hoping Bryan O'Sullivan saves the day again and writes this package. He wrote both attoparsec *and* text, so if he writes attoparsec-text, it will just be double the awesomeness. So Bryan, please do tell: how many beers (or any other consumable) will it take to get you to write it? I'll start up the collection fund, and throw in a six pack ;). Michael

On Thu, 2010-12-23 at 18:38 +0200, Michael Snoyman wrote:
On Thu, Dec 23, 2010 at 6:21 PM, Johan Tibell
wrote: On Thu, Dec 23, 2010 at 3:03 PM, Felipe Almeida Lessa
wrote: Michael Snoyman wants attoparsec-text as well [1].
It's on my Christmas wishlist too.
Johan
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Since I'm sure everyone is thinking it at this point, I'll just say it: we're all hoping Bryan O'Sullivan saves the day again and writes this package. He wrote both attoparsec *and* text, so if he writes attoparsec-text, it will just be double the awesomeness. So Bryan, please do tell: how many beers (or any other consumable) will it take to get you to write it? I'll start up the collection fund, and throw in a six pack ;).
Michael
I may be wrong but the attoparsec/attoparsec-text would be operating on the same principles. Maybe using typeclass like Data.ListLike would be solution? I'd not quite sure how much would it slow down but it should be possible. More as proof of concept reimplementation of string parser (for real life probably needs INLINE and SPECIALISE):
import Control.Applicative import Control.Monad import Data.Monoid import Data.ListLike as LL
data Result i r = Fail !i [String] String | Partial (i -> Result i r) | Done !i r
newtype Parser i a = Parser { runParser :: forall r. S i -> Failure i r -> Success i a r -> Result i r }
type Failure i r = S i -> [String] -> String -> Result i r type Success i a r = S i -> a -> Result i r
data More = Complete | Incomplete deriving (Eq, Show)
instance Monoid More where mempty = Incomplete mappend Complete _ = Complete mappend _ Complete = Complete mappend _ _ = Incomplete
data S i = S { input :: !i, _added :: !i, more :: !More }
instance Functor (Parser i) where fmap p m = Parser (\st0 f k -> runParser m st0 f (\s a -> k s (p a)))
instance Applicative (Parser i) where pure x = Parser (\st0 _ ks -> ks st0 x) (<*>) = ap
instance Monad (Parser i) where return = pure m >>= g = Parser (\st0 kf ks -> runParser m st0 kf (\s a -> runParser (g a) s kf ks)) fail err = Parser (\st0 kf _ -> kf st0 [] err)
string :: (Eq full, LL.ListLike full item) => full -> Parser full full string s = takeWith (LL.length s) (== s)
takeWith :: (LL.ListLike full item) => Int -> (full -> Bool) -> Parser full full takeWith n p = do ensure n s <- get let (h, t) = LL.splitAt n s if p h then put t >> return h else fail "takeWith"
ensure :: (LL.ListLike full item) => Int -> Parser full () ensure n = Parser $ \st0@(S s0 _a0 _c0) kf ks -> if LL.length s0 >= n then ks st0 () else runParser (demandInput >> ensure n) st0 kf ks
prompt :: LL.ListLike i ii => S i -> (S i -> Result i r) -> (S i -> Result i r) -> Result i r prompt (S s0 a0 _) kf ks = Partial $ \s -> if LL.null s then kf $! S s0 a0 Complete else ks $! S (s0 `mappend` s) (a0 `mappend` s) Incomplete
demandInput :: (LL.ListLike full item) => Parser full () demandInput = Parser $ \st0 kf ks -> if more st0 == Complete then kf st0 ["demandInput"] "not enough bytes" else prompt st0 (\st -> kf st ["demandInput"] "not enough bytes") (`ks` ())
get :: Parser full full get = Parser (\st0 _ ks -> ks st0 (input st0))
put :: full -> Parser full () put s = Parser (\(S _ a0 c0) _ ks -> ks (S s a0 c0) ())

On Thu, Dec 23, 2010 at 12:01 PM, Evan Laforge
Yeah, I know this has been discussed a number of times, but I have some concrete questions I haven't seen asked before. And the "parsec 3 is now as fast as parsec 2" thing I've seen around doesn't seem to be true for me.
[ snip responses ] So it sounds like the consensus is to bite the bullet and try converting to ByteString + attoparsec and see if that helps. Or write attoparsec-text myself, or wait for someone else to do it :) I might not get around to this real soon, but I'll post my results when (if) I do. Thanks for the responses!

Hello!
On Mon, Dec 27, 2010 at 9:41 AM, Evan Laforge
So it sounds like the consensus is to bite the bullet and try converting to ByteString + attoparsec and see if that helps. Or write attoparsec-text myself, or wait for someone else to do it :)
I might not get around to this real soon, but I'll post my results when (if) I do.
I've uploaded attoparsec-text and attoparsec-text-enumerator to Hackage. I've written those packages late last week and asked for comments from attoparsec and attoparsec-enumerator's maintainers. Although both packages weren't stress tested (actually they were very lightly tested), I'm releasing them so that we don't waste efforts duplicating work =). I'll make an announcement later today if no critical bugs are found. Please test it and try to break it =). Bonus points if we give us some numbers about how it compares to your Parsec 2/3 approach. Links: http://hackage.haskell.org/package/attoparsec-text http://hackage.haskell.org/package/attoparsec-text-enumerator Cheers! =D -- Felipe.

I've uploaded attoparsec-text and attoparsec-text-enumerator to Hackage. I've written those packages late last week and asked for
Very nice! I'll download this and try it out. Attoparsec has a bit different combinators than parsec so it'll take some rewriting, but it's work I'd have to do anyway to try the bytestring+attoparsec approach.

On Mon, Dec 27, 2010 at 6:51 AM, Evan Laforge
I've uploaded attoparsec-text and attoparsec-text-enumerator to Hackage. I've written those packages late last week and asked for
Very nice! I'll download this and try it out. Attoparsec has a bit different combinators than parsec so it'll take some rewriting, but it's work I'd have to do anyway to try the bytestring+attoparsec approach.
Well, I tried it... and it's still slower! parsec2, String: (a little faster since last time since I have new computer) total time = 9.10 secs (455 ticks @ 20 ms) total alloc = 2,295,837,512 bytes (excludes profiling overheads) attoparsec-text, Data.Text: total time = 14.72 secs (736 ticks @ 20 ms) total alloc = 2,797,672,844 bytes (excludes profiling overheads) Top consumer in the profile is now Data.Attoparsec.Text.Internal.runParser, followed, several entries later, by bindP, addS, and mysteriously >. Suspicious that parsec was compiled without profiling and hence not incurring profiling overhead since parsec never appears in the profile, I tried running without any profiling flags, but the numbers come about about the same, I guess the prof output has already subtracted profiling overhead. The attoparsec profile output is hard to interpret, it's a huge tree of internal attoparsec functions that are individually cheap but all add up under runParser. runParser itself is simple a newtype accessor so I don't really understand why it's credited with so much time. But there are no clear culprits... my parsers make much use of takeWhile and skipWhile and combinators like <|> and 'many' only occur at the level of complete terms, and are thus called much more rarely. The greater allocation is pretty mysterious. I wasn't able to track it down via heap allocation, the biggest allocator by module that is a parsing module isn't much of an allocator, it peaks at around 350k. Intuition says it should be much less because of using packed Text, but I suppose even the takeWhile combinators have to unpack every character into a Char, so maybe it's even less efficient because at least String can directly reuse the Chars? Actually, I've thought about this problem with haskell libraries before: I have a packed array which I then do a bsearch over. The bsearch generates lots of garbage. I was originally confused but my current guess is that every comparison winds up unpacking the array element, wrapping it in the haskell data type, and then extracting the (boxed) Int from that. An efficient implementation would compare the int in place... perhaps it must inline the comparison and use a 'peek' specialized to just extract the desired int, and then hope that the optimizer figures out how to pass it unboxed. I'll try a few optimizations I can think of. If those fail, I'll try with ByteString, maybe it's a problem with attoparsec-text. If that fails, I'll give up for real and go back to Parsec 2, still the leader in speed. Just in case there's some useful criticism, here's one of the busier parsers: p_unsigned_float :: A.Parser Double p_unsigned_float = do i <- A.takeWhile Char.isDigit f <- A.option "" (A.char '.' >> A.takeWhile1 Char.isDigit) if (Text.null i && Text.null f) then mzero else do case (dec i, dec f) of (Just i', Just f') -> return $ fromIntegral i' + fromIntegral f' / fromIntegral (10 ^ (Text.length f)) _ -> mzero where dec :: Text.Text -> Maybe Int dec s | Text.null s = Just 0 | otherwise = case Text.Read.decimal s of Right (d, rest) | Text.null rest -> Just d _ -> Nothing

On Wed, 2011-01-12 at 18:15 -0800, Evan Laforge wrote:
On Mon, Dec 27, 2010 at 6:51 AM, Evan Laforge
wrote: I've uploaded attoparsec-text and attoparsec-text-enumerator to Hackage. I've written those packages late last week and asked for
Very nice! I'll download this and try it out. Attoparsec has a bit different combinators than parsec so it'll take some rewriting, but it's work I'd have to do anyway to try the bytestring+attoparsec approach.
Well, I tried it... and it's still slower!
parsec2, String: (a little faster since last time since I have new computer) total time = 9.10 secs (455 ticks @ 20 ms) total alloc = 2,295,837,512 bytes (excludes profiling overheads)
attoparsec-text, Data.Text: total time = 14.72 secs (736 ticks @ 20 ms) total alloc = 2,797,672,844 bytes (excludes profiling overheads)
Sorry for asking but just for reference - what is performance of nanoparsec on your machine in this test? Regards

On Thu, Jan 13, 2011 at 12:15 AM, Evan Laforge
Well, I tried it... and it's still slower!
parsec2, String: (a little faster since last time since I have new computer) total time = 9.10 secs (455 ticks @ 20 ms) total alloc = 2,295,837,512 bytes (excludes profiling overheads)
attoparsec-text, Data.Text: total time = 14.72 secs (736 ticks @ 20 ms) total alloc = 2,797,672,844 bytes (excludes profiling overheads)
Interesting.
Just in case there's some useful criticism, here's one of the busier parsers:
p_unsigned_float :: A.Parser Double p_unsigned_float = do i <- A.takeWhile Char.isDigit f <- A.option "" (A.char '.' >> A.takeWhile1 Char.isDigit) if (Text.null i && Text.null f) then mzero else do case (dec i, dec f) of (Just i', Just f') -> return $ fromIntegral i' + fromIntegral f' / fromIntegral (10 ^ (Text.length f)) _ -> mzero where dec :: Text.Text -> Maybe Int dec s | Text.null s = Just 0 | otherwise = case Text.Read.decimal s of Right (d, rest) | Text.null rest -> Just d _ -> Nothing
I've tried creating a benchmark using this code. It's on the recently created attoparsec-text darcs repo [1,2]. There is a 2.7 MiB test file with many numbers to be parsed. The attoparsec-text package was installed using -O (Cabal's default) and the test program was compiled with ghc -hide-package parsec-3.1.0 --make -O2. Using parsers that return the parsed number as a double and then sum everything up, I get the following timings: attoparsec_text_builtin 2,241,038,864 bytes allocated in the heap 46 MB total memory in use (1 MB lost due to fragmentation) MUT time 1.10s ( 1.13s elapsed) GC time 0.15s ( 0.20s elapsed) Total time 1.25s ( 1.32s elapsed) attoparsec_text_laforge 1,281,603,768 bytes allocated in the heap 101 MB total memory in use (2 MB lost due to fragmentation) MUT time 0.58s ( 0.62s elapsed) GC time 0.47s ( 0.54s elapsed) Total time 1.05s ( 1.16s elapsed) parsec_laforge 1,558,621,208 bytes allocated in the heap 47 MB total memory in use (0 MB lost due to fragmentation) MUT time 0.82s ( 0.84s elapsed) GC time 0.46s ( 0.51s elapsed) Total time 1.27s ( 1.35s elapsed) 'attoparsec_text_builtin' uses Data.Attoparsec.Text.double available on the darcs version of the library. It tries to handle more cases, like exponents, and thus it is expected to be slower than your version. 'attoparsec_text_laforge' and 'parsec_laforge' are very similar to the one you gave in your e-mail, but with some modifications (e.g. Text.Read.decimal can't be used with Strings). Using attoparsec-text is faster and allocates less, but for some reason the faster version takes up a lot more memory. As the total memory figures were strange, I created a different version that parses the input but does not create any Doubles. Instead of summing them, the number of Doubles (if they were parsed) is counted. These are the results: attoparsec_text_laforge_discarding 985,843,696 bytes allocated in the heap 25 MB total memory in use (0 MB lost due to fragmentation) MUT time 0.38s ( 0.39s elapsed) GC time 0.07s ( 0.10s elapsed) Total time 0.45s ( 0.49s elapsed) parsec_laforge_discarding double_test.txt +RTS -s 1,471,829,664 bytes allocated in the heap 28 MB total memory in use (0 MB lost due to fragmentation) MUT time 0.66s ( 0.68s elapsed) GC time 0.44s ( 0.46s elapsed) Total time 1.10s ( 1.14s elapsed) Now attoparsec-text is more than twice faster, allocates even less memory and the total memory figures seem right. Bottom line: I think this benchmark doesn't really represent the kind of workload your parser has. Can you reproduce these results on your system? Cheers! =) [1] http://patch-tag.com/r/felipe/attoparsec-text/ [2] http://patch-tag.com/r/felipe/attoparsec-text/snapshot/current/content/prett... -- Felipe.

Now attoparsec-text is more than twice faster, allocates even less memory and the total memory figures seem right.
Bottom line: I think this benchmark doesn't really represent the kind of workload your parser has. Can you reproduce these results on your system?
I spent quite a bit of time trying to reduce this down to a minimal reproduction and getting confusing results. Then I found out that compiling with profiling enabled makes attoparsec slow and parsec fast. When I compile without any profiling, here's what I get, in CPU time: parsec run 1000000 - time: 1.22s - atto bs run 1000000 - time: 0.38s - atto text run 1000000 - time: 0.78s - This looks more like I expect it to. I don't understand the parsec thing... one of the first things I did was recompile and reinstall parsec2, making sure to pass -p to configure, and verify that there is a /usr/local/lib/parsec-2.1.0.1/ghc-6.12.3/libHSparsec-2.1.0.1_p.a. However, on closer inspection, I believe I've found the culprit. Compiling with 'build -v' for attoparsec reveals a ghc cmdline line: '-prof -hisuf p_hi -osuf p_o -auto-all'. Compiling parsec has: '-prof -hisuf p_hi -osuf p_o'. And indeed, attoparsec cabal has 'ghc-prof-options: -auto-all', which parsec's cabal does not. And in fact, parsec3 also has this -auto-all, which both explains why the profile is full of internal functions and why parsec3 was so much slower than parsec2. I'm glad to have finally tracked this down, but unhappy that I spent so much time on it. It seems like a trap waiting to be sprung if various libraries are compiled with their individually specified flags, which have major effects on performance. Maybe I should have noticed, but it seems pretty subtle to me. GHC will refuse to compile non-profiling libs against a profiling build, but doesn't go down to the level of flags. I think my short term solution is going to be remove -auto-all from attoparsec's cabal---I'm not profiling attoparsec and so I don't want my entire profile output to be internal attoparsec functions. But presumably the flag was added there for a reason, so maybe there are people who really want that. Is there a better solution? GHC warns when linking a profiling lib compiled with different profiling flags? A separate .p_auto-all_o suffix? Removal of ghc-prof-options from cabal? A consensus to standardize on a set of flags? BTW, yes my situation is a little different from your test. It's lots and lots of little expressions for a simple language in an in-memory structure that get parsed individually. So I don't care about file reading speed, but I do care about parser startup overhead, since it's lots and lots of little parses. The numbers above are how long it takes to parse "2.34" 1m times.

On Fri, Jan 14, 2011 at 5:54 PM, Evan Laforge
Then I found out that
compiling with profiling enabled makes attoparsec slow and parsec fast.
Yes, the SCC annotations added by GHC have a fairly high cost. I think my short term solution is going to be remove -auto-all from
attoparsec's cabal---I'm not profiling attoparsec and so I don't want my entire profile output to be internal attoparsec functions. But presumably the flag was added there for a reason, so maybe there are people who really want that.
Yes - me :-) I typically turn on profiling for most of my libraries while I think of them as "under development", a period of indefinite length that comes to an end when I deem the performance good enough. None of my libraries has actually hit that point yet :-) This isn't completely without basis. For instance, I made some big speed improvements to attoparsec's very performance-sensitive takeWhile function just the other day, thanks to -auto-all. I might, though, see if there's a way I could enable that flag only for myself (in a way that I wouldn't routinely forget).

This isn't completely without basis. For instance, I made some big speed improvements to attoparsec's very performance-sensitive takeWhile function just the other day, thanks to -auto-all.
I might, though, see if there's a way I could enable that flag only for myself (in a way that I wouldn't routinely forget).
See ~/.cabal/config - I use that to make sure all my packages are installed globally with profiling, and I think it might have enough options to force -auto-all in some way. Thanks, Neil

I think my short term solution is going to be remove -auto-all from attoparsec's cabal---I'm not profiling attoparsec and so I don't want my entire profile output to be internal attoparsec functions. But presumably the flag was added there for a reason, so maybe there are people who really want that.
Yes - me :-)
Yes, certainly it makes sense for someone profiling the library itself, or if I thought your library was too slow and wanted to send a profile in a bug report. But I don't think it makes so much sense for plain users, especially when it's applied only to some libraries. However, this seems to be a fairly common practice... at least parsec3 and text both do this, which is why I initially thought parsec3 was so much slower than parsec2, and attoparsec-text was doubly slow.
I might, though, see if there's a way I could enable that flag only for myself (in a way that I wouldn't routinely forget).
This sounds like a good idea to me.
participants (7)
-
Bryan O'Sullivan
-
Evan Laforge
-
Felipe Almeida Lessa
-
Johan Tibell
-
Maciej Piechotka
-
Michael Snoyman
-
Neil Mitchell