I killed performance of my code with Eval and Strategies

Dear Haskellers, I am reading Simon Marlow's tutorial on parallelism and I have problems with correctly using Eval monad and Strategies. I *thought* I understand them but after writing some code it turns out that obviously I don't because parallelized code is about 20 times slower. Here's a short example (code + criterion benchmarks): {-# LANGUAGE BangPatterns #-} module Main where import Control.Parallel.Strategies import Criterion.Main main :: IO () main = defaultMain [ bench "Seq" $ nf calculateSeq xs , bench "Par" $ nf calculatePar xs ] where xs = [1..16384] calculateSeq :: [Double] -> [Double] calculateSeq [] = [] calculateSeq (x:xs) = (sin . sqrt $ x) : xs calculatePar :: [Double] -> [Double] calculatePar xss = runEval $ go xss where go :: Strategy [Double] go [] = return [] go xs = do lsh <- (rpar `dot` rdeepseq) $ calculateSeq as lst <- go bs return (lsh ++ lst) where !(as, bs) = splitAt 8192 xs Compiling and running with: ghc -O2 -Wall -threaded -rtsopts -fforce-recomp -eventlog evalleak.hs ./evalleak -oreport.html -g +RTS -N2 -ls -s I get: benchmarking Seq mean: 100.5990 us, lb 100.1937 us, ub 101.1521 us, ci 0.950 std dev: 2.395003 us, lb 1.860923 us, ub 3.169562 us, ci 0.950 benchmarking Par mean: 2.233127 ms, lb 2.169669 ms, ub 2.296155 ms, ci 0.950 std dev: 323.5201 us, lb 310.2844 us, ub 344.8252 us, ci 0.950 That's a hopeless result. Looking at the spark allocation everything looks fine: SPARKS: 202 (202 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) But analyzing eventlog with ThreadScope I see that parallel function spends most of the time doing garbage collection, which suggests that I have a memory leak somewhere. I suspected that problem might be caused by appending two lists together in the parallel implementation, but replacing this with difference lists doesn't help. Changing granularity (e.g. splitAt 512) also brings no improvement. Can anyone point me to what am I doing wrong? Janek PS. This is of course not a real world code - I know that I'd be better of using unboxed data structures for doing computations on Doubles.

On Nov 14, 2012 10:44 PM, "Janek S."
calculateSeq :: [Double] -> [Double] calculateSeq [] = [] calculateSeq (x:xs) = (sin . sqrt $ x) : xs
Do you really mean to calculate the 'sin . sqrt' of just the head of the list, or do you mean: calculateSeq = map (sin . sqrt) ? Bas

Dear Janek,
I am reading Simon Marlow's tutorial on parallelism and I have problems with correctly using Eval monad and Strategies. I *thought* I understand them but after writing some code it turns out that obviously I don't because parallelized code is about 20 times slower. Here's a short example (code + criterion benchmarks):
Actually, (sin . sqrt) is simply too cheap. The overhead of constructing chunks (which have to be constructed on the heap) and concatenating the results far outweighs the cost of computing the list elements. If, for example, you replace sin . sqrt by f defined by f :: Double -> Double f x | x < 10 = x*x | otherwise = sin x * f (x-100) the picture will change. The loss also becomes far less dramatic if you construct the chunks outside of the benchmark: main :: IO () main = defaultMain [ bench "Seq" $ nf (map calculateSeq) xs , bench "Par" $ nf calculatePar xs ] where xs = chunk 2048 [1..16384] f, f' :: Double -> Double f x = sqrt (sin x) f' x | x < 10 = x*x | otherwise = sin x * f' (x-100) calculateSeq :: [Double] -> [Double] calculateSeq [] = [] calculateSeq (x:xs) = f x : calculateSeq xs calculatePar :: [[Double]] -> [[Double]] calculatePar xss = runEval $ parList (rdeepseq . calculateSeq) xss chunk :: Int -> [a] -> [[a]] chunk _ [] = [] chunk n xs = as : chunk n bs where !(as, bs) = splitAt n xs The parallel version (with f = sqrt . sin) is still somewhat slower than the sequential version with -N1 -- probably due to rdeepseq. Best regards, Bertram
participants (3)
-
Bas van Dijk
-
Bertram Felgenhauer
-
Janek S.