
Hi,

I have a program that computes a matrix of Floats of m rows by n columns. Computing each Float is relatively expensive. Each row is completely independent of the others, so I thought I'd try some simple SMP parallelism on this code:

    myFun :: FilePath -> IO ()
    myFun fp = do
      fs <- readDataDir fp
      let process f = readFile' f >>= parse
          printLine = putStrLn . foldr (\a b -> show a ++ "\t" ++ b) ""
          runDiff l = [ [ diff x y | y <- l ] | (x,i) <- zip l (map getId fs), myFilter i ]
      ps <- mapM process fs
      sequence_ [ printLine x | x <- runDiff ps `using` parList rdeepseq ]

So I'm using parList to evaluate the rows in parallel, and rdeepseq to fully evaluate each row.

Here are the timings on a Dual Quad Core AMD 2378 @2.4 GHz, ghc-6.12.3, parallel-2.2.0.1:

    -N      time (min, sec)
    none    1m50
    2       1m33
    3       1m35
    4       1m22
    5       1m11
    6       1m06
    7       1m45

The increase at -N7 is explained by the fact that two other processes were running on the machine. I don't know how to explain the small increase at -N3, but that doesn't matter too much.

The problem is that I am not getting the gains I expected (half the time at -N2, a third at -N3, etc.). Is this the best one can achieve with this implicit parallelism, or am I doing something wrong? In particular, is the way I'm printing the results at the end destroying potential parallel gains?

Any insights on this are appreciated.

Thanks,
Pedro