
On Fri, 2008-12-19 at 10:42 -0600, Jake McArthur wrote:
> Paul Keir wrote:
>> fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
>
> This is a CAF (Constant Applicative Form). Since it is actually a
> constant it is never garbage collected, and is always shared, so each
> thread is only calculating it once. You have essentially created a
> lookup table.
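To make the distinction concrete, here is a minimal sketch contrasting a top-level constant with a list built per call. The names sharedFibs and fibsFrom are made up for illustration and do not appear in the original program; the sharing behaviour described in the comments is what one would normally expect from GHC, not something measured here.

```haskell
-- Top-level constant with no arguments (a CAF). Its elements are computed
-- as they are demanded, and every use site refers to the same list.
sharedFibs :: [Integer]
sharedFibs = 0 : 1 : zipWith (+) sharedFibs (tail sharedFibs)

-- Hypothetical helper: the list is defined under a function argument and
-- depends on it, so each call builds its own list instead of reusing a
-- global one.
fibsFrom :: Integer -> [Integer]
fibsFrom n = fibs
  where
    fibs = n : (n + 1) : zipWith (+) fibs (tail fibs)
```

With the second form, every thread that calls fibsFrom with its own starting value has to do the work itself, which is the property you want when trying to give each core something to compute.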
Though note that with all our obvious suggestions there is still no
speedup:

    import Control.Concurrent

    heavytask m n = putMVar m $! (fibs !! 100000)
      where
        fibs = n : (n+1) : zipWith (+) fibs (tail fibs)

    -- so now fibs is not globally shared but is used per-heavytask
    -- it is also evaluated by heavy task rather than just putting a thunk
    -- into the MVar

    main = do
      ms <- sequence $ replicate 8 newEmptyMVar
      sequence_ [ forkIO (heavytask m n) | (m, n) <- zip ms [0..] ]
      ms' <- mapM takeMVar ms
      mapM_ print ms'

Looking at the GC stats (+RTS -t -RTS) we see that the majority of the
time in this program is spent doing GC, and that when we run with -N4
the time spent doing GC is even higher.

    -N1   1.57 MUT (1.60 elapsed),  7.05 GC (7.16 elapsed)   real 0m8.793s
    -N2   2.50 MUT (1.49 elapsed),  8.48 GC (7.33 elapsed)   real 0m8.873s
    -N4   2.83 MUT (1.56 elapsed), 12.16 GC (7.95 elapsed)   real 0m9.572s

The process monitor indicates that in the -N1 case, one core hits 100%
use for the full 8 seconds. In the -N2 case one core is hitting 90%
utilisation with the other three cores doing a little work, up to about
40% utilisation. On some runs the core doing the most work swaps over.
In one run at -N2 I got a segmentation fault. In the -N4 case, 4 cores
hit between 30% and 80% utilisation.

So this benchmark is primarily a stress test of the parallel garbage
collector, since it is GC that is taking 75-80% of the time. Note that
the mutator elapsed time goes down slightly with 2 cores compared to 1;
however, the GC elapsed time goes up slightly.

Duncan