
Meanwhile I found this chapter
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.h...
which discusses exactly the parallelization of the quicksort
algorithm.
Also this http://stackoverflow.com/questions/2338850/haskell-as-a-highly-concurrent-se...
which has links to some resources for parallel haskell.
On Wed, Aug 3, 2011 at 1:44 AM, Ovidiu Deac
I'm trying to write a parallel quicksort algorithm for lists.
This is my original implementation:
quickSort [] = [] quickSort (x:xs) = (quickSort small) ⊕ [x] ⊕ (quickSort big) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x]
and the output is:
$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 13290000000000 Time elapsed: 8.929503s
ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS Sorting 1000000 elements... CPU Time: 11240000000000 Time elapsed: 7.785293s
ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N1 -RTS Sorting 1000000 elements... CPU Time: 6790000000000 Time elapsed: 6.817648s
ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N1 -RTS Sorting 1000000 elements... CPU Time: 6980000000000 Time elapsed: 7.006658s
ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N1 -RTS Sorting 1000000 elements... CPU Time: 5900000000000 Time elapsed: 5.932236s
...so the conclusion is that using option N1 is faster the N2. This makes sense.
Then I tried to parallelize it:
First try: ----------------- quickSort [] = [] quickSort (x:xs) = small `pseq` ((quickSort small) ⊕ [x] ⊕ (quickSort big)) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x]
$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 12020000000000 Time elapsed: 8.29653s
This is slower then the non-parallel version
Second try: --------------- quickSort [] = [] quickSort (x:xs) = small `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big)) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x]
$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS Sorting 1000000 elements... CPU Time: 14750000000000 Time elapsed: 10.772271s
Even slower
Third try: ------------- quickSort [] = [] quickSort (x:xs) = small `par` (big `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big))) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x]
$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 134490000000000 Time elapsed: 122.917093s
Fourth try: ------------------------ quickSort [] = [] quickSort (x:xs) = small `par` (big `pseq` ((quickSort small) ⊕ [x] ⊕ (quickSort big))) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x]
$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 12770000000000 Time elapsed: 8.844304s -----------------------------
It seems that I'm unable to make it parallel. What am I doing wrong?
Thanks, ovidiu
See the full code below: -------------------------------------------------- module Main where
import Prelude import Data.List import Data.Time.Clock import System.CPUTime import System.Random import Control.Parallel import Control.Exception (evaluate) import Control.DeepSeq (rnf) import Text.Printf
quickSort [] = [] quickSort (x:xs) = small `par` (big `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big))) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x]
randomlist :: Int → StdGen → [Int] randomlist n = take n∘unfoldr (Just∘random)
len = 10 ↑ 6
time = do t ← getCurrentTime c ← getCPUTime return (t,c)
measure f p = do (t1, c1) ← time evaluate $ rnf $ f p (t2, c2) ← time return (diffUTCTime t2 t1, c2 - c1)
main = do seed ← newStdGen let rs = randomlist len seed
printf "Sorting %d elements...\n" len
(t, cpu) ← measure quickSort rs printf "CPU Time: %dλnTime elapsed: %sλn" cpu (show t)