
Hello! I tried to implement the parallel Monte-Carlo method of computing Pi number, using two cores: --PROGRAM module Main where import Random import Data.Ratio import Data.List import System.IO import GHC.Conc main = do putStrLn "pi 1" putStr "n: " hFlush stdout t <- getLine piMonte (read t) >>= (putStrLn . show) piMonte n = do (g1, g2) <- split `fmap` getStdGen let r1 = r (n `div` 2) g1 r2 = r (n `div` 2 + n `mod` 2) g2 in return (ratio (r1 `par` (r2 `pseq` (merge r1 r2)))) where r n g = (length (filter id lAll), n) where l = take n . randomRs (0, 1) inCircle :: Double -> Double -> Bool inCircle a b = a*a + b*b <= 0.25 lAll = zipWith inCircle (l g1) (l g2) (g1, g2) = split g ratio :: (Int, Int) -> Double ratio (a, b) = fromRational (toInteger a % toInteger b * 16) merge (a, b) (c, d) = (a + c, b + d) --END But it uses only on core: C:\>ghc --make -threaded Monte.hs -fforce-recomp [1 of 1] Compiling Main ( Monte.hs, Monte.o ) Linking Monte.exe ... C:\>monte +RTS -N2 -s monte +RTS -N2 -s pi 1 n: 1000000 3.143616 2,766,670,536 bytes allocated in the heap 1,841,300 bytes copied during GC 5,872 bytes maximum residency (1 sample(s)) 23,548 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 5285 collections, 5284 parallel, 0.64s, 0.31s elapsed Generation 1: 1 collections, 1 parallel, 0.00s, 0.00s elapsed Parallel GC work balance: 1.00 (454838 / 454676, ideal 2) MUT time (elapsed) GC time (elapsed) Task 0 (worker) : 0.00s ( 9.33s) 0.00s ( 0.00s) Task 1 (worker) : 0.63s ( 9.33s) 0.00s ( 0.00s) Task 2 (worker) : 6.00s ( 9.34s) 0.64s ( 0.31s) Task 3 (worker) : 0.00s ( 9.34s) 0.00s ( 0.00s) SPARKS: 1 (0 converted, 1 pruned) INIT time 0.02s ( 0.00s elapsed) MUT time 6.63s ( 9.34s elapsed) GC time 0.64s ( 0.31s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 7.28s ( 9.66s elapsed) %GC time 8.8% (3.2% elapsed) Alloc rate 416,628,033 bytes per MUT second Productivity 91.0% of total user, 68.6% of total elapsed We see that our one spark is pruned. Why? And another question. I compiled it also with -O: C:\>ghc --make -threaded Monte.hs -O -fforce-recomp [1 of 1] Compiling Main ( Monte.hs, Monte.o ) Linking Monte.exe ... C:\>monte +RTS -N2 -s monte +RTS -N2 -s pi 1 n: 1000000 3.148096 2,642,947,868 bytes allocated in the heap 1,801,952 bytes copied during GC 5,864 bytes maximum residency (1 sample(s)) 18,876 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 5077 collections, 5076 parallel, 0.08s, 0.05s elapsed Generation 1: 1 collections, 1 parallel, 0.00s, 0.00s elapsed Parallel GC work balance: 1.00 (445245 / 444651, ideal 2) MUT time (elapsed) GC time (elapsed) Task 0 (worker) : 3.94s ( 14.02s) 0.00s ( 0.00s) Task 1 (worker) : 0.00s ( 14.02s) 0.00s ( 0.00s) Task 2 (worker) : 5.61s ( 14.03s) 0.08s ( 0.05s) Task 3 (worker) : 0.00s ( 14.05s) 0.00s ( 0.00s) SPARKS: 1 (0 converted, 0 pruned) INIT time 0.02s ( 0.02s elapsed) MUT time 9.55s ( 14.03s elapsed) GC time 0.08s ( 0.05s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 9.64s ( 14.09s elapsed) %GC time 0.8% (0.3% elapsed) Alloc rate 276,386,705 bytes per MUT second Productivity 99.0% of total user, 67.7% of total elapsed We see, that with -O, 2 worker threads were doing some job, but overall performance is not better.
From one spark, zero - converted, zero - pruned. Is it a bug?