
Hi,

having tried the 6.10.2rc1 release candidate, I still find that "parMap rnf xs" on a list of thunks xs does not make optimal use of all available processors. With N the number of cores, each block of N thunks (say, x_1 and x_2 for N = 2) still has to be fully calculated before the next block (x_3 and x_4) is started. Is there hope that compiling the latest head instead of the 2009/03/14 snapshot (rc1) would give better results? Note that each x_(k+1) is computationally more demanding than x_k.

Regards,
Christian

Simon Marlow wrote:
Christian Hoener zu Siederdissen wrote:
When using parMap (or parList and then demanding the results) I see a curious pattern in CPU usage. Running "parMap rnf fib [1..100]" gives the following pattern for the number of CPUs in use: 4,3,2,1,4,3,2,1,...
How did you find out which CPU is being used?
Going from fib(n) to fib(n+1) roughly doubles the running time, so each element in the list takes longer to calculate than the one before it. What I would like is a version of parMap that hands the next element to whichever CPU becomes free, giving the usage pattern 4,4,4,4,...
In GHC you don't have any control over which CPU is used to execute a spark. We use dynamic load-balancing, which means the work distribution is essentially random, and will change from run to run.
If you want more explicit control over your work distribution, try using GHC.Conc.forkOnIO.
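A minimal sketch of what such explicit distribution could look like, not code from this thread. It assumes a current base library, where forkOnIO is exposed as forkOn in Control.Concurrent. The name parMapPinned and the shared-queue design are hypothetical and chosen only for illustration: one worker thread is pinned to each capability, and every worker pulls the next pending element as soon as it finishes its previous one.

```haskell
import Control.Concurrent (forkOn, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Control.Monad (forM, forM_)
import Data.IORef (atomicModifyIORef', newIORef)

-- Naive Fibonacci, used only as a workload that gets more expensive with n.
fib :: Int -> Integer
fib n
  | n < 2     = toInteger n
  | otherwise = fib (n - 1) + fib (n - 2)

-- Hypothetical helper (not a library function): apply f to every element,
-- using one worker thread pinned to each capability. Workers take jobs from
-- a shared queue, so a worker that becomes free starts the next element
-- immediately instead of waiting for a whole block to finish.
-- Limitations of this sketch: results are forced only to weak head normal
-- form (enough for Integer), and exceptions thrown by f are not handled.
parMapPinned :: (a -> b) -> [a] -> IO [b]
parMapPinned f xs = do
  caps <- getNumCapabilities
  -- Pair each input with an empty MVar that will receive its result.
  slots <- forM xs $ \x -> do
    slot <- newEmptyMVar
    return (x, slot)
  queue <- newIORef slots
  let worker = do
        -- Atomically pop the next pending job, if any.
        next <- atomicModifyIORef' queue $ \q -> case q of
          []       -> ([], Nothing)
          (j : js) -> (js, Just j)
        case next of
          Nothing        -> return ()
          Just (x, slot) -> do
            y <- evaluate (f x)
            putMVar slot y
            worker
  -- Start one worker on each capability 0 .. caps - 1.
  forM_ [0 .. caps - 1] $ \c -> forkOn c worker
  -- Collect the results in input order, blocking until each is ready.
  mapM (takeMVar . snd) slots

main :: IO ()
main = do
  results <- parMapPinned fib [1 .. 25]
  print (results == map fib [1 .. 25])
```

To use more than one core, this would need to be compiled with -threaded and run with, for example, +RTS -N4. Because the last elements of the list are the most expensive, the final few jobs still dominate the total running time; the shared queue only avoids idle workers in between.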
Also note that the implementation of much of this stuff is changing rapidly, so you might want to try a recent snapshot. Take a look at our paper, if you haven't already:
http://www.haskell.org/~simonmar/papers/multicore-ghc.pdf
Cheers, Simon