
I'm trying to write a parallel quicksort algorithm for lists. This is my original implementation: quickSort [] = [] quickSort (x:xs) = (quickSort small) ⊕ [x] ⊕ (quickSort big) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x] and the output is: $ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 13290000000000 Time elapsed: 8.929503s ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS Sorting 1000000 elements... CPU Time: 11240000000000 Time elapsed: 7.785293s ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N1 -RTS Sorting 1000000 elements... CPU Time: 6790000000000 Time elapsed: 6.817648s ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N1 -RTS Sorting 1000000 elements... CPU Time: 6980000000000 Time elapsed: 7.006658s ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N1 -RTS Sorting 1000000 elements... CPU Time: 5900000000000 Time elapsed: 5.932236s ...so the conclusion is that using option N1 is faster the N2. This makes sense. Then I tried to parallelize it: First try: ----------------- quickSort [] = [] quickSort (x:xs) = small `pseq` ((quickSort small) ⊕ [x] ⊕ (quickSort big)) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x] $ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 12020000000000 Time elapsed: 8.29653s This is slower then the non-parallel version Second try: --------------- quickSort [] = [] quickSort (x:xs) = small `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big)) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x] $ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS Sorting 1000000 elements... CPU Time: 14750000000000 Time elapsed: 10.772271s Even slower Third try: ------------- quickSort [] = [] quickSort (x:xs) = small `par` (big `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big))) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x] $ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 134490000000000 Time elapsed: 122.917093s Fourth try: ------------------------ quickSort [] = [] quickSort (x:xs) = small `par` (big `pseq` ((quickSort small) ⊕ [x] ⊕ (quickSort big))) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x] $ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 12770000000000 Time elapsed: 8.844304s ----------------------------- It seems that I'm unable to make it parallel. What am I doing wrong? Thanks, ovidiu See the full code below: -------------------------------------------------- module Main where import Prelude import Data.List import Data.Time.Clock import System.CPUTime import System.Random import Control.Parallel import Control.Exception (evaluate) import Control.DeepSeq (rnf) import Text.Printf quickSort [] = [] quickSort (x:xs) = small `par` (big `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big))) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x] randomlist :: Int → StdGen → [Int] randomlist n = take n∘unfoldr (Just∘random) len = 10 ↑ 6 time = do t ← getCurrentTime c ← getCPUTime return (t,c) measure f p = do (t1, c1) ← time evaluate $ rnf $ f p (t2, c2) ← time return (diffUTCTime t2 t1, c2 - c1) main = do seed ← newStdGen let rs = randomlist len seed printf "Sorting %d elements...\n" len (t, cpu) ← measure quickSort rs printf "CPU Time: %dλnTime elapsed: %sλn" cpu (show t)

Meanwhile I found this chapter
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.h...
which discusses exactly the parallelization of the quicksort
algorithm.
Also this http://stackoverflow.com/questions/2338850/haskell-as-a-highly-concurrent-se...
which has links to some resources for parallel haskell.
On Wed, Aug 3, 2011 at 1:44 AM, Ovidiu Deac
I'm trying to write a parallel quicksort algorithm for lists.
This is my original implementation:
quickSort [] = [] quickSort (x:xs) = (quickSort small) ⊕ [x] ⊕ (quickSort big) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x]
and the output is:
$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 13290000000000 Time elapsed: 8.929503s
ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS Sorting 1000000 elements... CPU Time: 11240000000000 Time elapsed: 7.785293s
ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N1 -RTS Sorting 1000000 elements... CPU Time: 6790000000000 Time elapsed: 6.817648s
ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N1 -RTS Sorting 1000000 elements... CPU Time: 6980000000000 Time elapsed: 7.006658s
ovidiu@asus:~/work/haskell$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N1 -RTS Sorting 1000000 elements... CPU Time: 5900000000000 Time elapsed: 5.932236s
...so the conclusion is that using option N1 is faster the N2. This makes sense.
Then I tried to parallelize it:
First try: ----------------- quickSort [] = [] quickSort (x:xs) = small `pseq` ((quickSort small) ⊕ [x] ⊕ (quickSort big)) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x]
$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 12020000000000 Time elapsed: 8.29653s
This is slower then the non-parallel version
Second try: --------------- quickSort [] = [] quickSort (x:xs) = small `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big)) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x]
$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS Sorting 1000000 elements... CPU Time: 14750000000000 Time elapsed: 10.772271s
Even slower
Third try: ------------- quickSort [] = [] quickSort (x:xs) = small `par` (big `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big))) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x]
$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 134490000000000 Time elapsed: 122.917093s
Fourth try: ------------------------ quickSort [] = [] quickSort (x:xs) = small `par` (big `pseq` ((quickSort small) ⊕ [x] ⊕ (quickSort big))) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x]
$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS [1 of 1] Compiling Main ( quicksort.hs, quicksort.o ) Linking quicksort ... Sorting 1000000 elements... CPU Time: 12770000000000 Time elapsed: 8.844304s -----------------------------
It seems that I'm unable to make it parallel. What am I doing wrong?
Thanks, ovidiu
See the full code below: -------------------------------------------------- module Main where
import Prelude import Data.List import Data.Time.Clock import System.CPUTime import System.Random import Control.Parallel import Control.Exception (evaluate) import Control.DeepSeq (rnf) import Text.Printf
quickSort [] = [] quickSort (x:xs) = small `par` (big `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big))) where small = [p | p ← xs, p ≤ x] big = [p | p ← xs, p > x]
randomlist :: Int → StdGen → [Int] randomlist n = take n∘unfoldr (Just∘random)
len = 10 ↑ 6
time = do t ← getCurrentTime c ← getCPUTime return (t,c)
measure f p = do (t1, c1) ← time evaluate $ rnf $ f p (t2, c2) ← time return (diffUTCTime t2 t1, c2 - c1)
main = do seed ← newStdGen let rs = randomlist len seed
printf "Sorting %d elements...\n" len
(t, cpu) ← measure quickSort rs printf "CPU Time: %dλnTime elapsed: %sλn" cpu (show t)

On Wed, 2011-08-03 at 10:21 +0300, Ovidiu Deac wrote:
Meanwhile I found this chapter http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.h... which discusses exactly the parallelization of the quicksort algorithm.
So, it seems you found the answer about the interplay of pseq and par, but maybe this is interesting for you as well: http://www.macs.hw.ac.uk/~dsg/gph/language.shtml

Thanks. I understand that par and pseq come with GpH so they are
already included in ghc version 6.
Also distributed GpH sounds interesting, I'll give it a try sometime.
On Wed, Aug 3, 2011 at 10:48 AM, Daniel Seidel
On Wed, 2011-08-03 at 10:21 +0300, Ovidiu Deac wrote:
Meanwhile I found this chapter http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.h... which discusses exactly the parallelization of the quicksort algorithm.
So, it seems you found the answer about the interplay of pseq and par, but maybe this is interesting for you as well:

another page re: parallelism that I've just found:
http://www.haskell.org/haskellwiki/Parallel/Reading
On Wed, Aug 3, 2011 at 11:31 AM, Ovidiu Deac
Thanks. I understand that par and pseq come with GpH so they are already included in ghc version 6.
Also distributed GpH sounds interesting, I'll give it a try sometime.
On Wed, Aug 3, 2011 at 10:48 AM, Daniel Seidel
wrote: On Wed, 2011-08-03 at 10:21 +0300, Ovidiu Deac wrote:
Meanwhile I found this chapter http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.h... which discusses exactly the parallelization of the quicksort algorithm.
So, it seems you found the answer about the interplay of pseq and par, but maybe this is interesting for you as well:
participants (2)
-
Daniel Seidel
-
Ovidiu Deac