
That's really interesting -- I just tried this.
Compiling not using -threaded: 1.289 seconds Compiling using -threaded, but not running with -N2: 3.403 seconds Compiling using -threaded, and using -N2: 55.072 seconds
I was hoping to see a relative improvement when introducting an opportunity parallelism in the program, so I made a version with two MVars filled at the start. This didn't work out though - perhaps some performance stands to be gained by improving the GHC scheduler wrt cpu / OS thread affinity for the Haskell threads? For the curious: -O2: 7.3 seconds (CPU: 99.7% user) -O2 -threaded: 11.5 seconds (CPU: 95% user, 5% system) -O2 -threaded ... +RTS -N2: killed after 3 minutes (CPUs: 15% user, 20% system) Thats quite a lot of CPU time going to the system. Specs: Linux 2.6.26 (Arch) x86_64 Intel Core 2 Duo 2.5GHz SOURCE Notice the threads still exist when a value of 0 is seen - this is OK as the other value will be terminating ringsize threads earlier and this thread will never be needed again. import Control.Monad import Control.Concurrent import System.Environment ring = 503 new l i = do r <- newEmptyMVar forkIO (thread i l r) return r thread :: Int -> MVar Int -> MVar Int -> IO () thread i l r = go where go = do m <- takeMVar l when (m == 1) (print i) putMVar r $! m - 1 when (m > 0) go main = do val <- return . read . head =<< getArgs a <- newMVar val b <- newMVar val y <- foldM new a [2..ring] forkIO $ thread (ring+1) y b z <- foldM new b [(ring + 2)..(ring + ring)] thread 1 z a