Waiting for garbage collection can kill parallelism?

Today I was reading "Parallel Performance Tuning for Haskell" by Jones, Marlow and Singh and wanted to replicate the results of their first case study. The code goes like this:

module Main where

import Control.Parallel

main :: IO ()
main = print . parSumFibEuler 38 $ 5300

-- Spark the evaluation of f in parallel; pseq forces e to be
-- evaluated before the addition demands f.
parSumFibEuler :: Int -> Int -> Int
parSumFibEuler a b = f `par` (e `pseq` (e + f))
  where f = fib a
        e = sumEuler b

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

mkList :: Int -> [Int]
mkList n = [1..n-1]

relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1

euler :: Int -> Int
euler n = length (filter (relprime n) (mkList n))

sumEuler :: Int -> Int
sumEuler = sum . map euler . mkList

-- Sequential version, for reference.
sumFibEuler :: Int -> Int -> Int
sumFibEuler a b = fib a + sumEuler b

This is the version shown on page 3 of the paper, after the pseq combinator has been added to enforce the correct evaluation order. I compile and run it with:

ghc -O2 -rtsopts -threaded -eventlog parallel.hs
./parallel +RTS -s -ls -N2

In the paper the authors show that this version does in fact perform the computation in parallel and that a good speedup is achieved. However, when I run the code, HEC 1 blocks very quickly, requesting a garbage collection. HEC 0 (if I understand correctly, the one calculating sumEuler) is not interrupted but continues its computation until it is finished. Only then is the GC performed, after which HEC 1 resumes. In this way there is no parallelism at all: HEC 0 first does all of its work, and only after the first GC does HEC 1 do its work.

My question is: why might this be happening? I don't expect the results of the paper to be fully reproducible, since the paper is three years old and GHC has developed a lot since then, but this looks like a regression of some sort. I would appreciate it if anyone could explain this.

Janek
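
PS. In case anyone wants to poke at this themselves: the run above writes parallel.eventlog, which can be opened in ThreadScope to see the per-HEC schedule. I have also been wondering, though I have not measured it, whether a larger per-capability allocation area would merely postpone the first GC rather than fix the scheduling; the -A value below is an arbitrary guess:

threadscope parallel.eventlog
./parallel +RTS -s -ls -N2 -A64m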