
Hi, list. This is a long post, sorry. The bottom line: specifying +RTS -Nn where n > number of cores slows the program compiled with 6.12.1 by orders of magnitude under Mac OSX. Turning off parallel garbage collections with -qg resolves the problem. Independent verification is appreciated. I have a very simple program: import Control.Parallel main=do let s = s1 `par` ( s2 `pseq` s1 + s2) s1 = sum [1..10000000] s2 = sum [1..10000005] print s compiled with ghc -O -threaded running it with without any options time ./clocktime produces the following times: real 0m3.779s user 0m3.644s sys 0m0.053s Giving -N option to RTS (or -N2, I have two core macbook) shows decent speedup: 100000060000015 real 0m2.678s user 0m4.762s sys 0m0.043s though increase in processor time by 30% seems suspicious. Things become weird if I increase the number of threads. I couldn't expect any further speedup for this task but what I get is absolutely unexpected: time ./clocktime +RTS -N3 100000060000015 real 0m57.071s user 1m40.910s sys 0m0.921s Almost a minute!! Adding more threads does not help either :) time ./clocktime +RTS -N4 100000060000015 real 1m48.281s user 3m10.832s sys 0m1.590s A bit of playing with Instruments/DTrace shows that runaway garbage collection puts the program on its knees. I the case of -N1 the top time consumers look like this: Self % Total % Self Run % Running % # Self # Samples ms Self Run ms Running Parent % Library Source Path Symbol Name 12.8 12.8 408 clocktime __gmpz_add 10.7 10.7 341 clocktime integer_cmm_plusIntegerzh 10 10 321 clocktime allocateLocal etc. This looks absolutely reasonable. Going to -N2 changes the picture to: Self % Total % Self Run % Running % # Self # Samples ms Self Run ms Running Parent % Library Source Path Symbol Name 15.6 15.6 989 clocktime gcWorkerThread 11.6 11.6 736 clocktime waitForGcThreads 9.5 9.5 603 clocktime scavenge_until_all_done 7.4 7.4 471 clocktime setContextSwitches 6.2 6.2 396 clocktime __gmpz_add -N3 Is awful: Self % Total % Self Run % Running % # Self # Samples ms Self Run ms Running Parent % Library Source Path Symbol Name 51.7 51.7 29514 clocktime gcWorkerThread 25.5 25.5 14591 clocktime scavenge_until_all_done 7.2 7.2 4114 clocktime waitForGcThreads 5.5 5.5 3180 clocktime GarbageCollect 5.2 5.2 2998 clocktime setContextSwitches 0.4 0.4 240 clocktime integer_cmm_plusIntegerzh Pavel