
On Thursday 24 December 2009 09:09:41 you wrote:
The mergesort example given here has a rather nasty problem – it creates sparks for merging the two lists at the same time as it creates the lists, so there's always at least one task blocked waiting for its producers.
I see. So the sparks either contend for locks on the thunks to force that list or they repeat the same work.
This variation on a theme goes roughly twice as fast for me with -N1, and *does* get a speedup from -N<somethinghigher>:
mergesort []    = []
mergesort [x]   = [x]
mergesort [x,y] = if x < y then [x,y] else [y,x]
mergesort xs    = (forceList sleft) `par` (forceList sright) `pseq` merge sleft sright
  where
    (left, right) = split (length xs `div` 2) xs
    sleft  = mergesort left
    sright = mergesort right
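(The snippet relies on `forceList`, `split` and `merge`, which weren't included in the message. Plausible base-only definitions – my assumption, not from the original post – would be:)

```haskell
-- Force the spine and elements of a list to WHNF, so the spark
-- does the real sorting work instead of just building a thunk.
forceList :: [a] -> ()
forceList []     = ()
forceList (x:xs) = x `seq` forceList xs

-- Presumably just splitAt under another name.
split :: Int -> [a] -> ([a], [a])
split = splitAt

-- The standard merge of two sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys
```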
Note the `pseq` not `par` before the merge.
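(For reference, a minimal illustration of the operational difference – not from the original post: ``a `par` b`` creates a spark for a and returns b immediately, while ``a `pseq` b`` evaluates a to WHNF before returning b, so the current thread does half the work itself instead of racing the sparks.)

```haskell
import GHC.Conc (par, pseq)  -- both exported from base

-- Spark the evaluation of sx for another capability, evaluate sy
-- on the current thread, and only then combine the two results.
parallelSum :: [Int] -> [Int] -> Int
parallelSum xs ys =
  let sx = sum xs
      sy = sum ys
  in sx `par` (sy `pseq` (sx + sy))
```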
Can you explain why that helps?
I also added one more short-circuit case – we want to avoid spawning threads for tiny tasks like sorting 1-element lists.
This is something that concerns me. Lots of discussions of parallelism, including the Haskell literature I have read, neglect this critical problem of making sure that more time is spent doing useful work than spawning tasks (sparks). How is this solved in Haskell? Do you just put magic numbers in that work on the machine you're currently using? Also, how much work does a Haskell thread usually have to do to make it worth sparking?
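(One common idiom – a sketch of my own, not an answer from the thread – is exactly a magic-number cutoff: below some input size you fall back to sequential code and create no sparks at all, and yes, the threshold is typically tuned empirically per machine. The value below is an illustrative guess.)

```haskell
import GHC.Conc (par, pseq)
import Data.List (sort)

-- Granularity control: only spark when the sublist is big enough
-- to be worth the scheduling overhead. 100 is a placeholder value
-- that would need tuning for a given machine.
threshold :: Int
threshold = 100

psort :: Ord a => [a] -> [a]
psort xs
  | n < threshold = sort xs  -- too small: plain sequential sort
  | otherwise     = forceList l' `par` (forceList r' `pseq` merge l' r')
  where
    n      = length xs
    (l, r) = splitAt (n `div` 2) xs
    l'     = psort l
    r'     = psort r

forceList :: [a] -> ()
forceList []     = ()
forceList (x:xs) = x `seq` forceList xs

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys
```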
Here's my results on a dual core running OS X:
LappyBob:~ tatd2$ time ./test +RTS -N1
-0.117109518058233

real	0m4.608s
user	0m4.400s
sys	0m0.189s

LappyBob:~ tatd2$ time ./test +RTS -N2
-0.117109518058233

real	0m3.648s
user	0m6.360s
sys	0m0.220s

LappyBob:~ tatd2$ time ./test +RTS -N3
-0.117109518058233

real	0m50.679s
user	1m24.235s
sys	0m0.620s
The last result is still off the wall, but it's a lot better.
That's another question: why did you get such awful performance at N=3, while I get similar results at N=8 on this 8-core machine?

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e