Control.Parallel.Strategies.parMap CPU usage

Greetings,

when using parMap (or parList and demanding) I see a curious pattern in CPU usage. Running "parMap rnf fib [1..100]" gives the following pattern of used CPUs: 4,3,2,1,4,3,2,1,...

The fib function takes roughly twice as long going from fib(n) to fib(n+1), meaning that calculating the next element in the list always takes longer than the current one. What I would like is a version of parMap that directly hands a free CPU the next result to calculate, giving the usage pattern 4,4,4,4,...

Below you find the simple Haskell program which gives these results. Please compile it with "ghc --make -threaded -O2 Para.hs" and run it on a machine with at least two cores as "./Para +RTS -N2" or better. I am not filing a bug yet, as I would prefer to be told that I did it wrong and here is a better way: ...

Thanks,
Christian

(Please assume that later on, "fib" will be replaced by something meaningful ;)

# ghc --version
# The Glorious Glasgow Haskell Compilation System, version 6.10.1

module Main where

import Control.Parallel.Strategies

-- parallel computation of Fibonacci numbers, the slow way
fib :: Int -> Int
fib n
  | n < 1     = error "n < 1"
  | n == 1    = 1
  | n == 2    = 1
  | otherwise = fib (n-1) + fib (n-2)

fibs :: [Int]
fibs = parMap rnf fib [1..100]
-- fibs = let fs = map fib [1..100] in fs `demanding` parList rnf fs

main :: IO ()
main = mapM_ (putStrLn . show) (zip [1..] fibs)
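Editor's note (not part of the thread): the behaviour asked for above -- hand the next list element to whichever worker becomes free -- can be sketched with explicit concurrency instead of sparks. The following is a minimal, hypothetical sketch; the name dynParMap, the shared work queue, and the NFData/rnf import from the GHC 6.10-era Strategies module are all assumptions made for illustration, not something from this thread.

import Control.Concurrent
import Control.Exception (evaluate)
import Control.Monad (replicateM_)
import Control.Parallel.Strategies (NFData, rnf)
import qualified Data.Map as M

-- A hypothetical "dynamic" parallel map: a shared queue hands out the next
-- (index, input) pair to whichever worker asks first, so one long-running
-- element never delays the start of the following one.
dynParMap :: NFData b => Int -> (a -> b) -> [a] -> IO [b]
dynParMap nWorkers f xs = do
  jobs    <- newMVar (zip [0 :: Int ..] xs)   -- shared work queue
  results <- newMVar M.empty                  -- index -> fully forced result
  done    <- newEmptyMVar
  replicateM_ nWorkers (forkIO (worker jobs results done))
  replicateM_ nWorkers (takeMVar done)        -- wait for every worker to finish
  M.elems `fmap` readMVar results             -- M.elems yields values in index order
  where
    worker jobs results done = loop
      where
        loop = do
          next <- modifyMVar jobs $ \q -> case q of
                    []            -> return ([], Nothing)
                    ((i, x) : q') -> return (q', Just (i, x))
          case next of
            Nothing     -> putMVar done ()
            Just (i, x) -> do
              let y = f x
              _ <- evaluate (rnf y)           -- force the result on this worker
              modifyMVar_ results (return . M.insert i y)
              loop

Used as, say, "dynParMap 4 fib [1..100]" in IO, each of the four workers pulls the next unevaluated element as soon as it finishes its current one, which is the 4,4,4,4,... pattern asked for; the price is that the result is no longer a lazy pure list.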

Christian Hoener zu Siederdissen wrote:
when using parMap (or parList and demanding) I see a curious pattern in CPU usage. Running "parMap rnf fib [1..100]" gives the following pattern of used CPUs: 4,3,2,1,4,3,2,1,...
How did you find out which CPU is being used?
The fib function takes roughly twice as long going from fib(n) to fib(n+1), meaning that calculating the next element in the list always takes longer than the current one. What I would like is a version of parMap that directly hands a free CPU the next result to calculate, giving the usage pattern 4,4,4,4,...
In GHC you don't have any control over which CPU is used to execute a spark. We use dynamic load-balancing, which means the work distribution is essentially random, and will change from run to run.
If you want more explicit control over your work distribution, try using GHC.Conc.forkOnIO.
Also note that the implementation of much of this stuff is changing rapidly, so you might want to try a recent snapshot. Take a look at our paper, if you haven't already:
http://www.haskell.org/~simonmar/papers/multicore-ghc.pdf
Cheers, Simon
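Editor's note (not part of the thread): GHC.Conc.forkOnIO pins a forked thread to a specific capability, so work can be placed explicitly rather than left to spark load-balancing. A minimal sketch under that assumption follows; the helper name mapOnCapabilities and the round-robin placement are illustrative only, and forkOnIO is the GHC 6.10-era name (later GHCs call it forkOn).

import Control.Concurrent.MVar
import Control.Exception (evaluate)
import Control.Parallel.Strategies (NFData, rnf)
import GHC.Conc (forkOnIO, numCapabilities)

-- Hypothetical sketch: fork one pinned thread per list element, assigning
-- capabilities round-robin, and collect the forced results through MVars.
mapOnCapabilities :: NFData b => (a -> b) -> [a] -> IO [b]
mapOnCapabilities f xs = do
  vars <- mapM spawn (zip (cycle [0 .. numCapabilities - 1]) xs)
  mapM takeMVar vars                  -- results come back in the original list order
  where
    spawn (cap, x) = do
      v <- newEmptyMVar
      _ <- forkOnIO cap $ do
             let y = f x
             _ <- evaluate (rnf y)    -- force the result on capability 'cap'
             putMVar v y
      return v

Note that this is a static placement: it controls where each element is evaluated, but unlike a shared work queue it does not by itself cure the imbalance caused by elements of very different cost.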

Simon Marlow wrote:
Christian Hoener zu Siederdissen wrote:
when using parMap (or parList and demanding) I see a curious pattern in CPU usage. Running "parMap rnf fib [1..100]" gives the following pattern of used CPUs: 4,3,2,1,4,3,2,1,...
How did you find out which CPU is being used?
Sorry for the misunderstanding: the "pattern of used CPUs" is the _counted number_ of active cores! That means I am cycling through 4 down to 1 active CPUs while there definitely is work that a core could be doing. Essentially, parMap seems to divide the list of thunks into blocks of 4 (or n with -Nn) and finishes each block before going on to the next block. This is easy to see by running the program and watching the number of active threads in htop / top.
The fib function takes roughly twice as long going from fib(n) to fib(n+1), meaning that calculating the next element in the list always takes longer than the current one. What I would like is a version of parMap that directly hands a free CPU the next result to calculate, giving the usage pattern 4,4,4,4,...
In GHC you don't have any control over which CPU is used to execute a spark. We use dynamic load-balancing, which means the work distribution is essentially random, and will change from run to run.
If you want more explicit control over your work distribution, try using GHC.Conc.forkOnIO.
Also note that the implementation of much of this stuff is changing rapidly, so you might want to try a recent snapshot. Take a look at our paper, if you haven't already:
http://www.haskell.org/~simonmar/papers/multicore-ghc.pdf
Cheers, Simon
Hopefully I will find the time to try the latest HEAD and see whether the idle pattern (better name?) persists. Regards, Christian

Hi,
having tried the 6.10.2rc1 release candidate, I still find that "parMap rnf xs" on a list of thunks xs does not optimally use all available processors. With N the number of cores, I still see that each block of N thunks (say, x_1 and x_2) has to be calculated before the next block (x_3 and x_4) is started.
Would there be hope that compiling the latest HEAD instead of the 2009/03/14 snapshot (rc1) gives better results?
Note that each x_(k+1) is computationally more demanding than x_k.
Regards, Christian

Simon Marlow wrote:
Christian Hoener zu Siederdissen wrote:
when using parMap (or parList and demanding) I see a curious pattern in CPU usage. Running "parMap rnf fib [1..100]" gives the following pattern of used CPUs: 4,3,2,1,4,3,2,1,...
How did you find out which CPU is being used?
The fib function takes roughly twice as long going from fib(n) to fib(n+1), meaning that calculating the next element in the list always takes longer than the current one. What I would like is a version of parMap that directly hands a free CPU the next result to calculate, giving the usage pattern 4,4,4,4,...
In GHC you don't have any control over which CPU is used to execute a spark. We use dynamic load-balancing, which means the work distribution is essentially random, and will change from run to run.
If you want more explicit control over your work distribution, try using GHC.Conc.forkOnIO.
Also note that the implementation of much of this stuff is changing rapidly, so you might want to try a recent snapshot. Take a look at our paper, if you haven't already:
http://www.haskell.org/~simonmar/papers/multicore-ghc.pdf
Cheers, Simon

choener:
Hi,
having tried the 6.10.2rc1 release candidate, I still find that "parMap rnf xs" on a list of thunks xs does not optimally use all available processors. With N the number of cores, I still see that each block of N thunks (say, x_1 and x_2) has to be calculated before the next block (x_3 and x_4) is started.
Would there be hope that compiling the latest head instead of 2009/03/14 (rc1) gives better results?
Yes, definitely. The HEAD implements all the `par` improvements described in the recent "multicore runtime" paper, as well as giving detailed runtime statistics on spark use.
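Editor's note (not part of the thread): with a threaded build, those spark statistics appear in the usual RTS summary, e.g. by running the example as "./Para +RTS -N4 -s"; the exact fields reported (sparks created, converted, pruned, and so on) vary between GHC versions.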

Hi,
thank you very much Simon & Don, for the answers.
The latest HEAD gives great results on parallel programs. All cores are
now always at work, as I hoped. So, thanks too to everybody involved
in the multicore improvements -- they should come in very handy. :-)
Thanks again,
Christian
Don Stewart wrote:
choener:
Hi,
having tried the 6.10.2rc1 release candidate, I still find that "parMap rnf xs" on a list of thunks xs does not optimally use all available processors. With N the number of cores, I still see that each block of N thunks (say, x_1 and x_2) has to be calculated before the next block (x_3 and x_4) is started.
Would there be hope that compiling the latest head instead of 2009/03/14 (rc1) gives better results?
Yes, definitely. The HEAD implements all the `par` improvements described in the recent "multicore runtime" paper, as well as giving detailed runtime statistics on spark use.