
I recently ran into a problem writing a program that I hoped I could speed up by running over multiple CPUs. I want non-haskell users to be able to run the tool and take advantage of multiple CPUs. *But* there is a serious slowdown when the RTS is run with -N and some of the CPUs are already busy. This is already mentioned in the GHC docs, but the problem I experienced was more serious than I expected causing a slowdown of around 2x compared to running with a single CPU. I reproduced the problem with the following code from the haskell wiki: {-# LANGUAGE BangPatterns #-} import Data.Digest.Pure.MD5 import qualified Data.ByteString.Lazy as L import System.Environment import Control.Concurrent import Control.Monad (replicateM_) main = do files <- getArgs str <- newEmptyMVar mapM_ (forkIO . hashAndPrint str) files printNrResults (length files) str printNrResults i var = replicateM_ i (takeMVar var >>= putStrLn) hashAndPrint str f = do bs <- L.readFile f let !h = show $ md5 bs putMVar str (f ++ ": " ++ h) When run on 4 idle CPU cores, I get the following wall clock times: ./run +RTS -N1 : 20.4 sec ./run +RTS -N2 : 11.0 sec ./run +RTS -N4 : 6.7 sec When run on the same 4 core machine, but with 2 cores already busy: ./run +RTS -N1 : 23.5 sec ./run +RTS -N2 : 14.1 sec ./run +RTS -N4 : 57.8 sec <---- Blowout... This is quite a problem in practice when running on a shared server. Is there anything that can be done to address this? (I wrote up a few more details here: http://thunking.drp.id.au/2012/06/slowdown-with-ghc-when-using-multiple.html ) Thanks, -- David Powell