
Hi all,

I'm experimenting a bit with the parallelization capabilities of Haskell. What I am trying to do is process all the lines of a text file in parallel, calculating the edit distance between each line and a given string.

This is my testing code:

    import System.IO
    import Control.Monad
    import Control.Parallel
    import Control.Parallel.Strategies

    edist :: String -> String -> Int
    -- edist calculates the edit distance of 2 strings
    -- see for example http://www.csse.monash.edu.au/~lloyd/tildeFP/Haskell/1998/Edit01/

    getLines :: FilePath -> IO [Int]
    getLines = liftM ((parMap rnf (edist longString)) . lines) . readFile

    main :: IO ()
    main = do
        list <- getLines "input.txt"
        mapM_ ( putStrLn . show ) list

I am testing this code on a 2xQuadCore Linux (Ubuntu 8.10) machine (8 cores in total). The code has been compiled with

    ghc --make -threaded mytest.hs

I've been trying input files of different lengths, but the more cores I try to use, the worse the performance gets. Here are some examples:

    # input.txt -> 10 lines (strings) of ~1200 letters each

    $ time ./mytest +RTS -N1 > /dev/null
    real 0m4.775s
    user 0m4.700s
    sys  0m0.080s

    $ time ./mytest +RTS -N4 > /dev/null
    real 0m6.272s
    user 0m8.220s
    sys  0m0.290s

    $ time ./mytest +RTS -N8 > /dev/null
    real 0m7.090s
    user 0m10.960s
    sys  0m0.400s

    # input.txt -> 100 lines (strings) of ~1200 letters each

    $ time ./mytest +RTS -N1 > /dev/null
    real 0m49.854s
    user 0m49.730s
    sys  0m0.120s

    $ time ./mytest +RTS -N4 > /dev/null
    real 1m11.303s
    user 1m36.210s
    sys  0m1.070s

    $ time ./mytest +RTS -N8 > /dev/null
    real 1m19.488s
    user 2m6.250s
    sys  0m1.270s

What is going wrong in this code? Is this a problem of the "grain size" of the parallelization?

Any help / advice would be very welcome,

M;
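P.S. In case anyone wants to reproduce the timings: edist is only given as a signature above. A minimal stand-in could look like the sketch below. It assumes plain Levenshtein distance with unit costs for insertion, deletion and substitution, computed row by row; it is not necessarily the implementation from the linked page, and the helper names (nextRow, step) are only illustrative.

```haskell
import Data.List (foldl')

-- Assumed stand-in for edist: Levenshtein distance with unit costs.
-- The table is built one row per character of s, so only one row
-- of the dynamic-programming matrix is kept at a time.
edist :: String -> String -> Int
edist s t = last (foldl' nextRow [0 .. length t] s)
  where
    -- prevRow holds the distances from the prefix of s consumed so far
    -- to every prefix of t (length t + 1 entries).
    nextRow prevRow@(p : ps) c = scanl step (p + 1) (zip3 t prevRow ps)
      where
        -- left: cell to the left in the new row (insertion)
        -- up:   cell above in the previous row (deletion)
        -- diag: cell diagonally above-left (match or substitution)
        step left (tc, diag, up) =
          minimum [left + 1, up + 1, diag + (if tc == c then 0 else 1)]
    -- Never reached, since every row has at least one entry.
    nextRow [] _ = []
```

With this definition, edist "kitten" "sitting" evaluates to 3, and comparing against the empty string gives the length of the other string.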