
I'm trying to write a parallel quicksort algorithm for lists. This is my original implementation: quickSort [] = [] quickSort (x:xs) = (quickSort small) ⊕ [x] ⊕ (quickSort big) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x] and the output is: $ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 13290000000000 Time elapsed: 8.929503s ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS Sorting 1000000 elements... CPU Time: 11240000000000 Time elapsed: 7.785293s ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N1 -RTS Sorting 1000000 elements... CPU Time: 6790000000000 Time elapsed: 6.817648s ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N1 -RTS Sorting 1000000 elements... CPU Time: 6980000000000 Time elapsed: 7.006658s ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N1 -RTS Sorting 1000000 elements... CPU Time: 5900000000000 Time elapsed: 5.932236s ...so the conclusion is that using option N1 is faster the N2. This makes sense. Then I tried to parallelize it: First try: ----------------- quickSort [] = [] quickSort (x:xs) = small `pseq` ((quickSort small) ⊕ [x] ⊕ (quickSort big)) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x] $ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 12020000000000 Time elapsed: 8.29653s This is slower then the non-parallel version Second try: --------------- quickSort [] = [] quickSort (x:xs) = small `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big)) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x] $ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS Sorting 1000000 elements... CPU Time: 14750000000000 Time elapsed: 10.772271s Even slower Third try: ------------- quickSort [] = [] quickSort (x:xs) = small `par` (big `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big))) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x] $ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 134490000000000 Time elapsed: 122.917093s Fourth try: ------------------------ quickSort [] = [] quickSort (x:xs) = small `par` (big `pseq` ((quickSort small) ⊕ [x] ⊕ (quickSort big))) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x] $ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 12770000000000 Time elapsed: 8.844304s ----------------------------- It seems that I'm unable to make it parallel. What am I doing wrong? Thanks, ovidiu See the full code below: -------------------------------------------------- module Main where import Prelude import Data.List import Data.Time.Clock import System.CPUTime import System.Random import Control.Parallel import Control.Exception (evaluate) import Control.DeepSeq (rnf) import Text.Printf quickSort [] = [] quickSort (x:xs) = small `par` (big `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big))) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x] randomlist :: Int → StdGen → [Int] randomlist n = take n∘unfoldr (Just∘random) len = 10 ↑ 6 time = do t ← getCurrentTime c ← getCPUTime return (t,c) measure f p = do (t1, c1) ← time evaluate $ rnf $ f p (t2, c2) ← time return (diffUTCTime t2 t1, c2 - c1) main = do seed ← newStdGen let rs = randomlist len seed printf "Sorting %d elements...\n" len (t, cpu) ← measure quickSort rs printf "CPU Time: %dλnTime elapsed: %sλn" cpu (show t)