GHC threaded runtimes and pure functions

One of the things I liked about Haskell was the notion of pure functions and the fact that they can be, in theory, automatically parallelized on multicore hardware. I think this will become a huge deal in a few years as cores multiply. My question is simply this: under GHC, is this what really happens with, say, a map over a pure function? Yes, I compiled with -threaded and am using the +RTS -N2 options on my dual-core machine. Here's the code I wrote as a speed test. It just doesn't seem any faster with -N2. Using the ps command I found that multiple threads are indeed launched (this is Linux), but all but one show as waiting for some event to finish (the ps output flags them all 'Sl').

    import System.Random (getStdGen, randomRs)

    main = do
      rg <- getStdGen
      let rs  = take 10000000 $ randomRs (1::Int, 100000::Int) rg
          rs' = map (\n -> n*n) rs
      print rs'

Hello Gregory,
Wednesday, September 16, 2009, 5:17:01 PM, you wrote:
No. Additional threads are launched for the I/O system and, as you requested with -N2, for the Haskell workload. But GHC doesn't auto-parallelize your code. That is a bit too hard: creating too many threads (e.g. one for every addition) makes the bookkeeping too heavy, and it's impossible to deduce automatically how many operations each computation will require. Instead, you are provided with the 'par' primitive to tell the compiler explicitly which parts to run in parallel.
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
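
As an illustration of the 'par' primitive mentioned above, here is a minimal sketch that evaluates two halves of a computation in parallel. The names sumSquares and parSumSquares are made up for this example, and the workload is only a stand-in for an expensive pure function. par and pseq are imported from GHC.Conc in base; Control.Parallel from the "parallel" package exports the same two functions. It assumes the program is built with -threaded and run with +RTS -N2.

```haskell
import Data.List (foldl')
import GHC.Conc (par, pseq)

-- Hypothetical stand-in for an expensive pure computation:
-- the sum of i*i for i in [lo .. hi].
sumSquares :: Int -> Int -> Int
sumSquares lo hi = foldl' (\acc i -> acc + i * i) 0 [lo .. hi]

-- Spark the first half, evaluate the second half on the current
-- thread, then combine. `par` only hints that `a` may be evaluated
-- in parallel; `pseq` makes sure `b` is evaluated before the sum.
parSumSquares :: Int -> Int
parSumSquares n = a `par` (b `pseq` (a + b))
  where
    mid = n `div` 2
    a   = sumSquares 1 mid
    b   = sumSquares (mid + 1) n

main :: IO ()
main = print (parSumSquares 2000000)
```

Whether this runs faster than the sequential version depends on the machine, the GHC version, and how much work each half contains.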

That makes sense. So maybe I should split my mapping into two parallel ones or however many CPUs there are using par.

Gregory Propf wrote:
> That makes sense. So maybe I should split my mapping into two parallel ones or however many CPUs there are using par.
If you're going to use par, it doesn't really matter how many sparks you create. You just need to avoid creating millions of really tiny sparks. You could create, say, eight and let GHC figure out the rest itself...
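
A rough sketch of that advice, assuming a map that is split into about eight chunks with one spark per chunk. The helpers chunksOf, forceInts and parMapChunked are hypothetical names written for this example, not library functions, and the input is a plain range rather than random numbers so that only base is needed. A spark evaluates its expression only to weak head normal form, so each chunk is forced explicitly.

```haskell
import Data.List (foldl')
import GHC.Conc (par, pseq)

-- Split a list into pieces of at most n elements (n must be positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Evaluate every element of a list of Ints.
forceInts :: [Int] -> ()
forceInts []       = ()
forceInts (x : xs) = x `seq` forceInts xs

-- Map f over the list, creating one spark per chunk up front and
-- then concatenating the (shared) chunk results.
parMapChunked :: Int -> (Int -> Int) -> [Int] -> [Int]
parMapChunked size f xs = sparkAll chunks `pseq` concat chunks
  where
    chunks = map (map f) (chunksOf size xs)
    sparkAll []       = ()
    sparkAll (c : cs) = forceInts c `par` sparkAll cs

main :: IO ()
main = do
  let xs = [1 .. 2000000] :: [Int]
      ys = parMapChunked (length xs `div` 8) (\n -> n * n) xs
  print (foldl' (+) 0 ys)
```

With a function as cheap as squaring, building and traversing the lists may well dominate the run time, so a speedup is not guaranteed; the sketch shows the structure rather than a tuned solution.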

Hello Andrew, Wednesday, September 16, 2009, 11:31:22 PM, you wrote:
Since these are green threads, 1-millisecond sparks should be acceptable, and maybe even 1-microsecond ones too. AFAIR, the overhead was significantly reduced in GHC 6.12, soon to be released.
-- Best regards, Bulat

Bulat Ziganshin wrote:
> since these are green threads, 1 millisecond sparks should be acceptable and maybe even 1 microsecond too.
Of course, how many split seconds it takes depends on the speed of the processor running it. ;-) But you probably don't want to spark, say, one addition operation. (Unless perhaps you're adding *really huge* arbitrary-precision integers or something.) Actually, it might be interesting to benchmark where the balance tips; exactly how much work you need to do for the spark overhead to be worth it. It's likely to vary by GHC version though...
> afair, overhead expenses were significantly reduced in ghc 6.12, soon to be released
I've heard similar things. I think I even read a paper about it. (Those GHC guys... always putting out such interesting papers! If it weren't for them, I might actually get some work done...) If you wanted to benchmark anything, it would seem prudent to wait for this. ;-)
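
The benchmark idea raised in this message could be sketched roughly as follows: the same total work is cut into ranges of different sizes, one spark is created per range, and the wall-clock time is printed for each size. Everything here (work, ranges, sparkedSum, timed, the chosen sizes) is an assumption made for illustration, and a single timing per size is only a crude measurement. It assumes a build with -threaded, a run with +RTS -N2 or similar, and a base library recent enough to provide GHC.Clock.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')
import GHC.Clock (getMonotonicTime)
import GHC.Conc (par, pseq)

-- Hypothetical workload: the sum of i*i over one range.
work :: (Int, Int) -> Int
work (lo, hi) = foldl' (\acc i -> acc + i * i) 0 [lo .. hi]

-- Cut [1 .. n] into consecutive ranges of at most `size` numbers.
ranges :: Int -> Int -> [(Int, Int)]
ranges size n = [ (lo, min n (lo + size - 1)) | lo <- [1, 1 + size .. n] ]

-- One spark per range, then add up the partial sums.
sparkedSum :: Int -> Int -> Int
sparkedSum size n = sparkAll parts `pseq` foldl' (+) 0 parts
  where
    parts = map work (ranges size n)
    sparkAll []       = ()
    sparkAll (p : ps) = p `par` sparkAll ps

-- Wall-clock seconds taken to evaluate an Int.
timed :: Int -> IO Double
timed x = do
  t0 <- getMonotonicTime
  _  <- evaluate x
  t1 <- getMonotonicTime
  return (t1 - t0)

main :: IO ()
main = mapM_ report [1, 100, 10000, 1000000]
  where
    n = 2000000
    report size = do
      secs <- timed (sparkedSum size n)
      putStrLn ("range size " ++ show size ++ ": " ++ show secs ++ " s")
```

A range size of 1 corresponds to sparking a single multiplication and addition, which is the case the message warns against; where the balance tips is left for the measurements to show and is likely to differ between GHC versions.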

GHC doesn't auto-parallelize. You would have to use one of the several fine parallelism constructs to achieve a speedup. Here's a recent tutorial,
http://donsbot.wordpress.com/2009/09/05/defun-2009-multicore-programming-in-...
and some background reading,
http://donsbot.wordpress.com/2009/09/03/parallel-programming-in-haskell-a-re...
-- Don
participants (4): Andrew Coppin, Bulat Ziganshin, Don Stewart, Gregory Propf