
On 25/04/2009 13:31, j.waldmann wrote:
Here is some more data. It seems the behaviour depends on 32/64 bit arch?
#######################################################
waldmann@master:~/tmp$ uname -a
Linux master 2.6.18-6-amd64 #1 SMP Fri Dec 12 05:49:32 UTC 2008 x86_64 GNU/Linux
waldmann@master:~/tmp$ time ./Par +RTS -N1
496165411 496165411

real 0m22.580s
user 0m22.541s
sys 0m0.040s
waldmann@master:~/tmp$ time ./Par +RTS -N2
496165411 496165411

real 0m21.259s
user 0m26.678s
sys 0m0.164s
########################################################
waldmann@box:~/tmp> uname -a
Linux box 2.6.27.21-0.1-pae #1 SMP 2009-03-31 14:50:44 +0200 i686 i686 i386 GNU/Linux
waldmann@box:~/tmp> time ./Par +RTS -N1
496165411 496165411

real 0m29.802s
user 0m29.670s
sys 0m0.028s
waldmann@box:~/tmp> time ./Par +RTS -N2
496165411 496165411

real 0m11.219s
user 0m14.917s
sys 0m0.164s
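
To make the comparison easier to read, here is a small sketch that turns the four runs above into two ratios per machine: wall-clock speedup (real time at -N1 divided by real time at -N2) and the change in user time (user at -N2 divided by user at -N1). The numbers are copied from the transcripts; the names `runs` and `summarise` are only illustrative and are not part of the original program.

```python
# Timings in seconds copied from the runs above: (real, user) per -N setting.
runs = {
    "x86_64 (master)": {"N1": (22.580, 22.541), "N2": (21.259, 26.678)},
    "i686 (box)": {"N1": (29.802, 29.670), "N2": (11.219, 14.917)},
}


def summarise(n1, n2):
    """Return (wall-clock speedup, user-time ratio) going from -N1 to -N2."""
    real1, user1 = n1
    real2, user2 = n2
    return real1 / real2, user2 / user1


for machine, t in runs.items():
    speedup, user_ratio = summarise(t["N1"], t["N2"])
    print(f"{machine}: speedup {speedup:.2f}x, user time x{user_ratio:.2f}")
```

With -N2 one would normally expect a speedup of at most about 2x and a user-time ratio of 1 or slightly more. The x86_64 run gives roughly 1.06x with user time up by about 18%, while the i686 run gives roughly 2.66x with user time about halved.
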
This is a very strange result: the user time should not *decrease*, but rather should stay the same or increase a bit when adding cores.

If your program is GC-bound, then using a 32-bit build will improve performance, simply because it is shoveling half as much memory around. Check whether it is GC-bound by using +RTS -sstderr.

Anyway, the current situation is that with GHC 6.10.2 there are a lot of performance quirks and bottlenecks with respect to parallel programs, some of which have been squashed in HEAD. Try a recent HEAD snapshot if you can, or wait for 6.12.1.

Cheers,
Simon