Re: [Haskell-cafe] forkIO on multicore

19 Dec 2008

On Fri, 2008-12-19 at 10:42 -0600, Jake McArthur wrote:
...
Paul Keir wrote:
...
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
This is a CAF (Constant Applicative Form). Since it is actually a
constant, it is never garbage collected and is always shared, so the
threads between them only calculate it once. You have essentially
created a lookup table.
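To make the sharing point concrete, here is a minimal sketch contrasting a top-level CAF with a list that is rebuilt on every call. The names sharedFibs and fibsFrom are hypothetical and do not come from the thread; the sketch only illustrates the difference in sharing and is not a benchmark.

```haskell
module Main where

-- Top-level binding with no arguments: a CAF. The cells that have been
-- evaluated are shared by every caller for the rest of the run.
sharedFibs :: [Integer]
sharedFibs = 0 : 1 : zipWith (+) sharedFibs (tail sharedFibs)

-- Hypothetical helper (not from the thread): because it takes an argument,
-- this is a function, and each call builds its own private list.
fibsFrom :: Integer -> [Integer]
fibsFrom n = fibs
  where
    fibs = n : (n + 1) : zipWith (+) fibs (tail fibs)

main :: IO ()
main = do
  print (sharedFibs !! 30)  -- first use evaluates the list up to index 30
  print (sharedFibs !! 30)  -- second use reuses the already evaluated cells
  print (fibsFrom 0 !! 30)  -- builds a fresh list for this call
```

With the CAF, only the first thread to demand an element pays for computing it; with the function form, every caller does the work again, which is what a parallel benchmark needs.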
Though note that with all our obvious suggestions there is still no
speedup:

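import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

heavytask m n = putMVar m $! (fibs !! 100000)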
  where
    fibs = n : (n+1) : zipWith (+) fibs (tail fibs)

-- fibs is now local to each heavytask call instead of being globally shared;
-- ($!) also makes heavytask evaluate the result itself rather than just
-- putting a thunk into the MVar

main = do ms <- sequence $ replicate 8 newEmptyMVar
          sequence_
            [ forkIO (heavytask m n)
            | (m, n) <- zip ms [0..] ]
          ms' <- mapM takeMVar ms
          mapM_ print ms'

Looking at the GC stats (+RTS -t -RTS), we see that the majority of the
time in this program is spent doing GC, and that when we run with -N4 the
time spent doing GC is even higher.

-N1
1.57 MUT (1.60 elapsed), 7.05 GC (7.16 elapsed)
real	0m8.793s

-N2
2.50 MUT (1.49 elapsed), 8.48 GC (7.33 elapsed)
real	0m8.873s

-N4
2.83 MUT (1.56 elapsed), 12.16 GC (7.95 elapsed)
real	0m9.572s

The process monitor indicates that in the -N1 case, one core hits 100%
use for the full 8 seconds.

In the -N2 case one core is hitting 90% utilisation with the other three
cores doing a little work, up to about 40% utilisation. On some runs the
core doing the most work swaps over.

In one run at -N2 I got a segmentation fault.

In the -N4 case, 4 cores hit between 30% and 80% utilisation.

So this benchmark is primarily a stress test of the parallel garbage
collector, since it is GC that is taking 75-80% of the time. Note that
the mutator elapsed time goes down slightly with 2 cores compared to 1;
however, the GC elapsed time goes up slightly.
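As a rough check on that proportion, the sketch below recomputes the GC share from the elapsed MUT and GC figures quoted above. It assumes that MUT plus GC accounts for essentially all of the run and ignores RTS start-up and shutdown time; the names runs and gcShare are hypothetical.

```haskell
module Main where

import Text.Printf (printf)

-- (RTS flag, MUT elapsed seconds, GC elapsed seconds), copied from the
-- stats quoted in this message.
runs :: [(String, Double, Double)]
runs =
  [ ("-N1", 1.60, 7.16)
  , ("-N2", 1.49, 7.33)
  , ("-N4", 1.56, 7.95)
  ]

-- Fraction of (MUT + GC) elapsed time spent in the garbage collector.
gcShare :: Double -> Double -> Double
gcShare mut gc = gc / (mut + gc)

main :: IO ()
main = mapM_ report runs
  where
    report (label, mut, gc) =
      printf "%s: GC share of elapsed time = %.1f%%\n" label (100 * gcShare mut gc)
```

On the elapsed figures the share comes out a little over 80% for each run, close to the estimate above if slightly higher, and it rises slightly as cores are added.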

Duncan

Duncan Coutts