Parsec3 performance issues (in relation to v2)

Just a heads up: I only have a month or so of experience with Haskell, so a lot of these issues may be my own fault.

Anyway, the log file that I'm parsing uses English grammar, and the performance really dropped just by upgrading to Parsec3. I was hoping to use the ByteString support to boost the speed of already slow code, but had no such luck. It basically went from "Ugh, this is kinda slow" to "Uhhhh I'm gonna go grab a burger and let this melt my CPU" haha.

If anything, it's probably all the look-ahead the rules have to do to get the context-specific stuff right.

Some of the code is here: http://hpaste.org/7578

-------------------------------------------------------------
#1. Parsec 2:

total time  = 46.44 secs (2322 ticks @ 20 ms)
total alloc = 16,376,179,008 bytes (excl. profiling overheads)

Parse taking 51.3% time and 65.3% alloc.
-------------------------------------------------------------

-------------------------------------------------------------
#2. Parsec3 (4 times slower, no code changes):

total time  = 181.08 secs (9054 ticks @ 20 ms)
total alloc = 46,002,859,656 bytes (excl. profiling overheads)

Text.Parsec.Prim taking 84.7% time and 86.0% alloc.
-------------------------------------------------------------

-------------------------------------------------------------
#3. Parsec3 but with the whole project converted to ByteString (8 times slower):

total time  = 378.22 secs (18911 ticks @ 20 ms)
total alloc = 100,051,417,672 bytes (excl. overheads)
-------------------------------------------------------------

The third parse probably isn't a great indicator, since I reverted some rule-set optimizations that were causing errors. Plus I ended up packing the Parsec String results to ByteStrings to fit in with everything else.

I can post the full profiling info if anyone really cares.

Hi
Anyway, the log file that I'm parsing uses English grammar, and the performance really dropped just by upgrading to Parsec3. I was hoping to use the ByteString support to boost the speed of already slow code, but had no such luck. It basically went from "Ugh, this is kinda slow" to "Uhhhh I'm gonna go grab a burger and let this melt my CPU" haha.
I think it is known that Parsec 3 is slower than Parsec 2, as a result of the increased generality. I know that in the past someone was working on it, but I am not sure if they ever got anywhere.

Thanks

Neil

"Neil Mitchell"
I think it is known that Parsec 3 is slower than Parsec 2, as a result of the increased generality. I know that in the past someone was working on it, but I am not sure if they ever got anywhere.
I got pretty good performance (IMHO - about 10MB/s, still CPU-bound) using a lazy bytestring tokenizer and Parsec on top of that. Of course, it probably depends on the complexity of the parsing...

-k
--
If I haven't seen further, it is by standing in the footprints of giants
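The tokenizer-plus-Parsec arrangement described above might look roughly like the following. This is only a minimal sketch, not the code Ketil actually used: the Token type, the "hits ... for" grammar, and all helper names are hypothetical, chosen just to show a lazy ByteString being split into tokens once and Parsec then running over the token list instead of over individual characters.

```haskell
import           Data.Char (isAlpha, isDigit, isSpace)
import qualified Data.ByteString.Lazy.Char8 as L
import           Text.Parsec

-- Hypothetical token type; a real log format would need more cases.
data Token = Word L.ByteString | Number Int | Punct Char
  deriving (Eq, Show)

-- Split the lazy ByteString into tokens. The result list is produced
-- lazily, so the input is consumed chunk by chunk as the parser demands it.
tokenize :: L.ByteString -> [Token]
tokenize bs = case L.uncons bs of
  Nothing -> []
  Just (c, rest)
    | isSpace c -> tokenize (L.dropWhile isSpace rest)
    | isDigit c -> let (ds, rest') = L.span isDigit bs
                   in Number (read (L.unpack ds)) : tokenize rest'
    | isAlpha c -> let (w, rest') = L.span isAlpha bs
                   in Word w : tokenize rest'
    | otherwise -> Punct c : tokenize rest

type TokParser a = Parsec [Token] () a

-- Accept one token if the function returns Just. Positions are only
-- approximated here by counting tokens in the column field.
satisfyTok :: (Token -> Maybe a) -> TokParser a
satisfyTok = tokenPrim show nextPos
  where nextPos pos _ _ = incSourceColumn pos 1

word :: TokParser L.ByteString
word = satisfyTok $ \t -> case t of
  Word w -> Just w
  _      -> Nothing

number :: TokParser Int
number = satisfyTok $ \t -> case t of
  Number n -> Just n
  _        -> Nothing

keyword :: String -> TokParser ()
keyword s = satisfyTok $ \t ->
  if t == Word (L.pack s) then Just () else Nothing

-- Hypothetical log entry of the form "<name> hits <name> for <amount>".
data Hit = Hit L.ByteString L.ByteString Int
  deriving (Eq, Show)

hit :: TokParser Hit
hit = do
  attacker <- word
  keyword "hits"
  target <- word
  keyword "for"
  amount <- number
  return (Hit attacker target amount)

main :: IO ()
main = print (parse (many hit) "log" (tokenize input))
  where input = L.pack "Alice hits Bob for 12 Bob hits Alice for 7"
```

Whether this helps in a given case presumably depends on how much of the grammar's look-ahead can be pushed down into the tokenizer.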

On Tue, May 13, 2008 5:53 am, Neal Alexander wrote:
I can post the full profiling info if anyone really cares.
Any info is helpful. It's taking a while to get round to things, but the more relevant info we have to hand when we do, the easier it is to improve things and the less begging for data we have to do!

--
flippa@flippac.org
I knew I forgot to pack something - thankfully it was my .sig

Philippa Cowderoy wrote:
On Tue, May 13, 2008 5:53 am, Neal Alexander wrote:
I can post the full profiling info if anyone really cares.
Any info is helpful. It's taking a while to get round to things, but the more relevant info we have to hand when we do, the easier it is to improve things and the less begging for data we have to do!
I stripped the code down to just the Parsec-related stuff and retested it.

http://72.167.145.184:8000/parsec_test/Parsec2.prof
http://72.167.145.184:8000/parsec_test/Parsec3.prof

And the parser with a 9 MB (800 KB gzipped) sample log file:

http://72.167.145.184:8000/parsec_test.tar.gz

On Tue, May 13, 2008 at 9:23 PM, Neal Alexander wrote:
I stripped the code down to just the Parsec-related stuff and retested it.
http://72.167.145.184:8000/parsec_test/Parsec2.prof http://72.167.145.184:8000/parsec_test/Parsec3.prof
And the parser with a 9 MB (800 KB gzipped) sample log file: http://72.167.145.184:8000/parsec_test.tar.gz
So I've been picking at some ways to speed up Parsec 3. I haven't had much success at this benchmark, but one thing confused me:

In my hacked-up version, when I change the monadic type from a "data" declaration to a "newtype" declaration, I get a significant slowdown. In the program posted by Neal, I go from ~43 s with "data" to about 1 minute with a "newtype".

Is this expected? I don't really understand why adding an extra layer of indirection should speed things up.

-Antoine

Hello Antoine,

Wednesday, May 14, 2008, 8:43:47 AM, you wrote:
Is this expected? I don't really understand why adding an extra layer of indirection should speed things up.
Adding laziness may improve performance by avoiding the calculation of unnecessary stuff, or by moving it into a later stage where it will be immediately consumed.

--
Best regards,
 Bulat mailto:Bulat.Ziganshin@gmail.com
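The semantic difference between the two declaration forms can be seen in a small sketch. BoxD and BoxN are hypothetical names, not Parsec's actual types, and this only illustrates where the extra laziness comes from; it does not by itself explain the timings Antoine measured.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- "data" adds a real constructor: a BoxD value can itself be an
-- unevaluated thunk, separate from whatever it contains.
data BoxD a = BoxD a

-- "newtype" is erased at runtime: BoxN x is represented exactly like x.
newtype BoxN a = BoxN a

-- Matching on a data constructor forces the box to weak head normal form.
matchD :: BoxD a -> String
matchD (BoxD _) = "matched"

-- Matching on a newtype constructor forces nothing at all.
matchN :: BoxN a -> String
matchN (BoxN _) = "matched"

main :: IO ()
main = do
  -- The newtype match succeeds even on an undefined argument.
  putStrLn (matchN undefined)
  -- The data match has to evaluate its argument, so it throws here.
  r <- try (evaluate (matchD undefined)) :: IO (Either SomeException String)
  putStrLn (either (const "diverged") id r)
```

Running this prints "matched" followed by "diverged". With "data", the wrapper is a place where evaluation can be deferred; with "newtype", that place does not exist, so any work is tied directly to the wrapped value.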

On Tue, May 13, 2008 at 9:23 PM, Neal Alexander wrote:
I stripped the code down to just the Parsec-related stuff and retested it.
http://72.167.145.184:8000/parsec_test/Parsec2.prof http://72.167.145.184:8000/parsec_test/Parsec3.prof
And the parser with a 9 MB (800 KB gzipped) sample log file: http://72.167.145.184:8000/parsec_test.tar.gz
Neal, those two profiling results aren't really comparable, because your Parsec2 profiling doesn't include any cost-centers from the Parsec library - so all of the costs associated with Parsec2 will be assigned to cost-centers in EQ2Parse.

I've tried running this profiling on my own computer, and using the same Cabal options for Parsec2 as Parsec3, I never seem to get CAFs from Parsec2 to show up in the profiling result for this test.

Can anyone help me out?

I configure parsec (2 and 3) as follows:

$ runghc Setup.hs configure --prefix=${HOME}/usr --enable-library-profiling --user --enable-optimization

and then I build the "Main" as follows:

$ ghc --make -O2 -prof -auto Main -package parsec-2.1.0.0

- or -

$ ghc --make -O2 -prof -auto Main

And I run main as:

$ ./Main +RTS -p -RTS

When I specify the parsec-2.1.0.0 on the command-line, the Main.prof doesn't include any parsec CAFs.

Thanks,
Antoine

On Sun, May 18, 2008 at 11:23 AM, Antoine Latter wrote:
Neal, those two profiling results aren't really comparable, because your Parsec2 profiling doesn't include any cost-centers from the Parsec library - so all of the costs associated with Parsec2 will be assigned to cost-centers in EQ2Parse.
Self-reply:

Or is it possible that Parsec2 is so much faster that it doesn't even show up in the .prof results?

+ The largest cost center in the Parsec2 results is the EQ2Parse function "until"

+ The largest cost center in the Parsec3 result is the Parsec function "manyTill", which (in the prof results) is only called by "until" in EQ2Parse.

Which is consistent with my previous reasoning, at least.

Antoine
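For readers without the hpaste link, the relationship between "until" and "manyTill" can be sketched as follows. The real definition of "until" in EQ2Parse is not shown in this thread, so untilP below is a hypothetical stand-in, and the SCC annotation is just one way to give the wrapper its own cost centre when building with -prof.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for EQ2Parse's "until": collect characters up to
-- (and consuming) the terminator parser "end".
--
-- manyTill retries "end" at every input position, and "try" makes each
-- failed attempt backtrack, so a long terminator on a long line means a
-- lot of repeated look-ahead.
untilP :: Parser end -> Parser String
untilP end = {-# SCC "untilP" #-} manyTill anyChar (try end)

main :: IO ()
main = print (parse (untilP (string " hits ")) "demo" "Alice hits Bob")
```

This prints Right "Alice". If the library itself is built without cost centres, the work done inside manyTill would be expected to be charged to the enclosing cost centre in user code, which fits the observation that "until" tops the Parsec2 profile while "manyTill" tops the Parsec3 one.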

Antoine Latter wrote:
On Tue, May 13, 2008 at 9:23 PM, Neal Alexander wrote:
I stripped the code down to just the Parsec-related stuff and retested it.
http://72.167.145.184:8000/parsec_test/Parsec2.prof http://72.167.145.184:8000/parsec_test/Parsec3.prof
And the parser with a 9 MB (800 KB gzipped) sample log file: http://72.167.145.184:8000/parsec_test.tar.gz
Neal, those two profiling results aren't really comparable, because your Parsec2 profiling doesn't include any cost-centers from the Parsec library - so all of the costs associated with Parsec2 will be assigned to cost-centers in EQ2Parse.
Sorry about that. Hopefully this fixes it: http://72.167.145.184:8000/parsec_test/Parsec2_02.prof

I also redid the profiling for Parsec3 using ByteStrings directly (it's slower than manually unpacking and feeding it a [Char]):

http://72.167.145.184:8000/parsec_test/Parsec3_BStr.prof

The code for EQ2Parse.hs is identical, aside from changing the type signature of "init" to use the ByteString ParsecT, and removing the "unpack" line.
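The two input paths being compared can be sketched like this. It is a minimal illustration rather than the EQ2Parse code: logWord and the sample input are hypothetical, and the only point is that one Parsec 3 parser, written against the Stream class, can be run either on a ByteString directly or on the String obtained by unpacking it.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import qualified Data.ByteString.Char8 as B
import           Text.Parsec

-- Hypothetical parser, polymorphic in the input stream type.
-- Note that the result is a String (a list of Char) either way.
logWord :: Stream s m Char => ParsecT s u m String
logWord = many1 letter

main :: IO ()
main = do
  let raw = B.pack "hello world"
  -- Variant 1: feed the ByteString straight to Parsec.
  print (parse logWord "bytes" raw)
  -- Variant 2: unpack first and parse the resulting [Char].
  print (parse logWord "chars" (B.unpack raw))
```

Both calls print Right "hello". Since character-level combinators still build String results from a ByteString stream, using ByteString as the input type does not by itself avoid [Char] allocation, which may be part of why the direct ByteString run was not faster here.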

participants (6)
- Antoine Latter
- Bulat Ziganshin
- Ketil Malde
- Neal Alexander
- Neil Mitchell
- Philippa Cowderoy