
Sounds like you're paying a 2x cost for loading up the threaded runtime (compare -N1 -threaded with no flags). I'm not really sure why; it looks like you're getting killed by GC. Are you sure you want to be doing factorial on Integers?

Edward

Excerpts from Michael Craig's message of Thu Dec 01 00:50:15 -0500 2011:
I was writing some parallel code (asynchronous database writes for an event logger, but that's beside the point), and it seemed like the parallelized version (i.e. compiled with -threaded -with-rtsopts=-N2) wasn't running fast enough. I boiled it down to a dead-simple test:

import Control.Concurrent
import Data.Time.Clock.POSIX
import System.Environment

main :: IO ()
main = do
    n <- getArgs >>= return . read . head
    t1 <- getPOSIXTime
    work n
    t2 <- getPOSIXTime
    putStrLn $ show $ t2 - t1
    putStrLn $ show $ (fromIntegral n :: Double) / (fromRational . toRational $ t2 - t1)

work :: Integer -> IO ()
work n = do
    forkIO $ putStrLn $ seq (fact n) "Done"
    putStrLn $ seq (fact n) "Done"

fact :: Integer -> Integer
fact 1 = 1
fact n = n * fact (n - 1)

(I know this is not the best way to time things but I think it suffices for this test.)
Compiled with ghc --make -O3 test.hs, ./test 500000 runs for 74 seconds. Compiled with ghc --make -O3 -threaded -with-rtsopts=-N, ./test 500000 runs for 82 seconds (and seems to be using 2 CPU cores instead of just 1, on a 4-core machine). What gives?
Mike S Craig
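
For anyone reproducing this, here is a minimal sketch of a variant of the test that may make the comparison easier to read. It is not code from either poster. It differs from the original in two ways: `work` blocks on an MVar until the forked thread has also finished (the original only waits for the main-thread copy), and the program reports how the run split between GC and mutator time, which is one way to check the "killed by GC" guess. It assumes a recent GHC (base 4.10 or later) where `GHC.Stats.getRTSStats` exists, and that statistics are enabled with the `-T` RTS option, for example by building with `ghc -O2 -threaded "-with-rtsopts=-N2 -T" test.hs`. The default argument of 20000 is an arbitrary placeholder.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Data.Time.Clock.POSIX (getPOSIXTime)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Environment (getArgs)

-- Same naive factorial as the original test, except that it also
-- terminates for arguments below 1.
fact :: Integer -> Integer
fact n
  | n <= 1    = 1
  | otherwise = n * fact (n - 1)

-- Compute 'fact n' once in a forked thread and once in the main thread,
-- then block until the forked thread signals completion.
-- Caveat: if the forked computation throws, 'takeMVar' never receives
-- its value; a real program would handle that case.
work :: Integer -> IO ()
work n = do
  done <- newEmptyMVar
  _ <- forkIO $ do
    _ <- evaluate (fact n)   -- WHNF of an Integer is the full value
    putMVar done ()
  _ <- evaluate (fact n)
  takeMVar done

main :: IO ()
main = do
  args <- getArgs
  let n = case args of
        (a : _) -> read a
        []      -> 20000     -- hypothetical default, not from the thread
  t1 <- getPOSIXTime
  work n
  t2 <- getPOSIXTime
  putStrLn $ "wall clock: " ++ show (t2 - t1)
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      s <- getRTSStats
      let secs ns = fromIntegral ns / 1e9 :: Double
      putStrLn $ "collections:     " ++ show (gcs s)
      putStrLn $ "GC elapsed (s):  " ++ show (secs (gc_elapsed_ns s))
      putStrLn $ "mutator elapsed: " ++ show (secs (mutator_elapsed_ns s))
    else putStrLn "GC statistics are off; run with +RTS -T to enable them"
```

Building this once without -threaded, once with -threaded and -N1, and once with -N2 should show whether the extra time lands in the GC or the mutator figure. Passing `+RTS -s` on the command line prints a similar summary without any code changes, provided the binary was linked with `-rtsopts`.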