SMP parallelism gains inferior than expected

21 Feb 2011

      Hi,

I have a program that computes a matrix of Floats of m rows by n columns.
Computing each Float is relatively expensive. Each line is completely
independent of the others, so I thought I'd try some simple SMP parallelism
on this code:

myFun :: FilePath -> IO ()
myFun fp =
  do fs <- readDataDir fp
     let process f = readFile' f >>= parse
         printLine = putStrLn . foldr (\a b -> show a ++ "\t" ++ b) ""
         runDiff l = [ [ diff x y | y <- l ]
                     | (x,i) <- zip l (map getId fs), myFilter i ]
     ps <- mapM process fs
     sequence_ [ printLine x | x <- runDiff ps *`using` parList rdeepseq* ]

So, I'm using parList to evaluate the rows in parallel, and fully evaluating
each row. Here are the timings on a Dual Quad Core AMD 2378 @2.4 GHz,
ghc-6.12.3, parallel-2.2.0.1:

-N      time (ms)
none    1m50
2       1m33
3       1m35
4       1m22
5       1m11
6       1m06
7       1m45

The increase at 7 is justified by the fact that there were two other
processes running. I don't know how to justify the small increase at N3,
though, but that doesn't matter too much. The problem is that I am not
getting the gains I expected (halving at N2, a third at N3, etc.). Is this
the best one can achieve with this implicit parallelism, or am I doing
something wrong? In particular, is the way I'm printing the results at the
end destroying potential parallel gains?

Any insights on this are appreciated.

Thanks,
Pedro

SMP parallelism gains inferior than expected

José Pedro Magalhães