Waiting for garbage collection can kill parallelism?

Today I was reading "Parallel Performance Tuning for Haskell" by Jones, Marlow and Singh and wanted to replicate the results of their first case study. The code goes like this:

module Main where

import Control.Parallel

main :: IO ()
main = print . parSumFibEuler 38 $ 5300

-- Spark the evaluation of f in parallel; pseq forces e to be
-- evaluated before the addition demands f.
parSumFibEuler :: Int -> Int -> Int
parSumFibEuler a b = f `par` (e `pseq` (e + f))
  where f = fib a
        e = sumEuler b

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

mkList :: Int -> [Int]
mkList n = [1..n-1]

relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1

euler :: Int -> Int
euler n = length (filter (relprime n) (mkList n))

sumEuler :: Int -> Int
sumEuler = sum . map euler . mkList

-- Sequential version, for reference.
sumFibEuler :: Int -> Int -> Int
sumFibEuler a b = fib a + sumEuler b

This is the version shown on page 3 of the paper, after the pseq combinator has been added to enforce the correct evaluation order. I compile and run it with:

ghc -O2 -rtsopts -threaded -eventlog parallel.hs
./parallel +RTS -s -ls -N2

In the paper the authors show that this version does in fact perform the computation in parallel and that a good speedup is achieved. However, when I run the code, HEC 1 blocks very quickly, requesting a garbage collection. HEC 0 (if I understand correctly, the one calculating sumEuler) is not interrupted but continues its computation until it is finished. Only then is the GC performed, after which HEC 1 resumes. In this way there is no parallelism at all: HEC 0 first does all of its work, and only after the first GC does HEC 1 do its work.

My question is: why might this be happening? I don't expect the results of the paper to be fully reproducible, since the paper is three years old and GHC has developed a lot since then, but this looks like a regression of some sort. I would appreciate it if anyone could explain this.

Janek
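
PS. In case anyone wants to poke at this themselves: the run above writes parallel.eventlog, which can be opened in ThreadScope to see the per-HEC schedule. I have also been wondering, though I have not measured it, whether a larger per-capability allocation area would merely postpone the first GC rather than fix the scheduling; the -A value below is an arbitrary guess:

threadscope parallel.eventlog
./parallel +RTS -s -ls -N2 -A64m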