[Haskell-cafe] Parallel Pi

17 Mar 2010

      Hello!
I tried to implement the parallel Monte-Carlo method of computing Pi
number, using two cores:

--PROGRAM
module Main where

import Random
import Data.Ratio
import Data.List
import System.IO
import GHC.Conc

main = do
  putStrLn "pi 1"
  putStr "n: "
  hFlush stdout
  t <- getLine
  piMonte (read t) >>= (putStrLn . show)

piMonte n = do
  (g1, g2) <- split `fmap` getStdGen
  let r1 = r (n `div` 2) g1
      r2 = r (n `div` 2 + n `mod` 2) g2 in
    return (ratio (r1 `par` (r2 `pseq` (merge r1 r2))))
  where
    r n g = (length (filter id lAll), n)
      where
        l = take n . randomRs (0, 1)
        inCircle :: Double -> Double -> Bool
        inCircle a b = a*a + b*b <= 0.25
        lAll = zipWith inCircle (l g1) (l g2)
        (g1, g2) = split g
    ratio :: (Int, Int) -> Double
    ratio (a, b) = fromRational (toInteger a % toInteger b * 16)
    merge (a, b) (c, d) = (a + c, b + d)
--END

But it uses only on core:

C:\>ghc --make -threaded Monte.hs -fforce-recomp
[1 of 1] Compiling Main             ( Monte.hs, Monte.o )
Linking Monte.exe ...

C:\>monte +RTS -N2 -s
monte +RTS -N2 -s
pi 1
n: 1000000
3.143616
   2,766,670,536 bytes allocated in the heap
       1,841,300 bytes copied during GC
           5,872 bytes maximum residency (1 sample(s))
          23,548 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  5285 collections,  5284 parallel,  0.64s,  0.31s elapsed
  Generation 1:     1 collections,     1 parallel,  0.00s,  0.00s elapsed

  Parallel GC work balance: 1.00 (454838 / 454676, ideal 2)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  9.33s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.63s    (  9.33s)       0.00s    (  0.00s)
  Task  2 (worker) :    6.00s    (  9.34s)       0.64s    (  0.31s)
  Task  3 (worker) :    0.00s    (  9.34s)       0.00s    (  0.00s)

  SPARKS: 1 (0 converted, 1 pruned)

  INIT  time    0.02s  (  0.00s elapsed)
  MUT   time    6.63s  (  9.34s elapsed)
  GC    time    0.64s  (  0.31s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    7.28s  (  9.66s elapsed)

  %GC time       8.8%  (3.2% elapsed)

  Alloc rate    416,628,033 bytes per MUT second

  Productivity  91.0% of total user, 68.6% of total elapsed

We see that our one spark is pruned. Why?

And another question. I compiled it also with -O:

C:\>ghc --make -threaded Monte.hs -O -fforce-recomp
[1 of 1] Compiling Main             ( Monte.hs, Monte.o )
Linking Monte.exe ...

C:\>monte +RTS -N2 -s
monte +RTS -N2 -s
pi 1
n: 1000000
3.148096
   2,642,947,868 bytes allocated in the heap
       1,801,952 bytes copied during GC
           5,864 bytes maximum residency (1 sample(s))
          18,876 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  5077 collections,  5076 parallel,  0.08s,  0.05s elapsed
  Generation 1:     1 collections,     1 parallel,  0.00s,  0.00s elapsed

  Parallel GC work balance: 1.00 (445245 / 444651, ideal 2)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    3.94s    ( 14.02s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    ( 14.02s)       0.00s    (  0.00s)
  Task  2 (worker) :    5.61s    ( 14.03s)       0.08s    (  0.05s)
  Task  3 (worker) :    0.00s    ( 14.05s)       0.00s    (  0.00s)

  SPARKS: 1 (0 converted, 0 pruned)

  INIT  time    0.02s  (  0.02s elapsed)
  MUT   time    9.55s  ( 14.03s elapsed)
  GC    time    0.08s  (  0.05s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    9.64s  ( 14.09s elapsed)

  %GC time       0.8%  (0.3% elapsed)

  Alloc rate    276,386,705 bytes per MUT second

  Productivity  99.0% of total user, 67.7% of total elapsed

We see, that with -O, 2 worker threads were doing some job, but
overall performance is not better.
...
From one spark, zero - converted, zero - pruned. Is it a bug?

[Haskell-cafe] Parallel Pi

Artyom Kazak