
marlowsd:
On 28/04/2009 17:25, Johannes Waldmann wrote:
> Thanks for your comments.
>
>> Check whether it is GC-bound by using +RTS -sstderr.
>
> Well yes, it does a lot of GC (there's no way for the compiler to optimize away the list of primes) because that was the point of the example: to confirm (or disprove) that GC hurts parallelism (at the moment).
>
>   INIT  time    0.00s  (  0.00s elapsed)
>   MUT   time   13.23s  (  7.98s elapsed)
>   GC    time   14.12s  ( 14.11s elapsed)
>   EXIT  time    0.00s  (  0.00s elapsed)
>   Total time   27.35s  ( 22.09s elapsed)
>
>   %GC time      51.6%  (63.9% elapsed)
>
>> Try a recent HEAD snapshot if you can, or wait for 6.12.1.
>
> I did with 6.11.20090425 and it coredumps with +RTS -N2 (on x86_64)

That's worrying, but I don't see a core dump here. Here are my results:
GHC 6.11.20090429 -N1:
  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   13.52s  ( 13.64s elapsed)
  GC    time   21.25s  ( 21.23s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   34.76s  ( 34.87s elapsed)
GHC 6.11.20090429 -N2 -qg0 -qb:
  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   14.40s  (  7.21s elapsed)
  GC    time   18.35s  (  9.22s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   32.75s  ( 16.44s elapsed)
which, if I'm not mistaken, is super-linear speedup :-)
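
To check the super-linear claim, one can divide the -N1 elapsed times by the -N2 elapsed times, phase by phase. The following is only a rough sketch: the Run record and the speedup and superLinear helpers are names made up for illustration, and the numbers are copied by hand from the two reports above rather than parsed from RTS output.

```haskell
import Text.Printf (printf)

-- | Elapsed (wall-clock) seconds from one -sstderr report.
-- Hypothetical record for illustration; not part of any GHC library.
data Run = Run
  { cores        :: Int     -- value given to -N
  , mutElapsed   :: Double  -- MUT time, elapsed
  , gcElapsed    :: Double  -- GC time, elapsed
  , totalElapsed :: Double  -- Total time, elapsed
  }

-- Figures copied from the 6.11.20090429 reports above.
n1, n2 :: Run
n1 = Run 1 13.64 21.23 34.87   -- -N1
n2 = Run 2  7.21  9.22 16.44   -- -N2 -qg0 -qb

-- | Speedup of a parallel run over a baseline for one phase.
speedup :: (Run -> Double) -> Run -> Run -> Double
speedup phase base par = phase base / phase par

-- | Super-linear: overall speedup exceeds the number of cores used.
superLinear :: Run -> Run -> Bool
superLinear base par =
  speedup totalElapsed base par > fromIntegral (cores par)

main :: IO ()
main = do
  printf "MUT   speedup: %.2f\n" (speedup mutElapsed n1 n2)
  printf "GC    speedup: %.2f\n" (speedup gcElapsed n1 n2)
  printf "Total speedup: %.2f\n" (speedup totalElapsed n1 n2)
  printf "super-linear on %d cores: %s\n" (cores n2) (show (superLinear n1 n2))
```

On these figures the ratios come out at about 1.89 for MUT, 2.30 for GC and 2.12 overall, so it is the GC phase that pushes the total past 2x on two cores.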
Don't forget the -qg0 -qb flags with HEAD; these flags usually give the best parallel GC performance at the moment. For the release this might become the default, but I still have to do some more experiments.
I've added this interesting bit of info to the parallel performance wiki page: http://haskell.org/haskellwiki/Performance/Parallel