New subject: will the real quicksort please stand up? (or: sorting a > million element list)

22 Oct 2007

      It has been noted in a few places that the 2 line quicksort demo in the 
Introduction section of the haskell wiki 

http://www.haskell.org/haskellwiki/Introduction

isn't a "real" quicksort, and wouldn't scale well with longer lists. 
Interested, and wanting to put my recently learned test data generating 
skills to the test, I put this to the test with a little suite for various 
versions of quickcheck I found lying around. 

My patience extends to <3 minutes for waiting for a result, which for my 
tests was about the amount needed for a million integer long list. This 
was definitely faster for the "treesort" below than the naive quicksort 
from the wiki. 

(Prelude sort, which I think is mergesort, just blew the stack.)

I wonder if there are "tricks" or lore that could be applied to get better 
results or insight.

t.

{-import Control.Monad (when)
import Control.Monad.ST
import Data.Array.ST
import Data.Array.IArray
import Data.Array.MArray
import System.IO.Unsafe
import Control.Monad
-}
import Test.QuickCheck hiding (test)
import System.Random
import Data.List

{-
$ time ghci -e 'test treesort 6' quicksort.hs # a "real" quicksort, 
according to roconnor
1000000

real    2m18.702s
user    1m3.296s
sys     0m0.453s
$ time ghci -e 'test qs1 6' quicksort.hs # "naive" (from the wiki intro to 
haskell, not a "real" quicksort
1000000

real    4m18.174s
user    1m57.615s
sys     0m0.665s
$ time ghci -e 'test sort 6' quicksort.hs # mergesort from the prelude
*** Exception: stack overflow

real    0m13.634s
user    0m3.805s
sys     0m0.446s
hartthoma@linuxpt:~/ProjectRepos/learning/quicksort>
-}

-- For interactive development, I wound up working in hugs +S +I, 
-- because ghci in emacs tended to lock up 
-- hugs just segfaults, but at least the computer doesn't lock.
-- by the way, is there a way to get ghci to behave similarly to hugs? 
(restricted memory mode?)

-- t1 and t2 are about the same speed for n = 4, got bored of waiting for 
n great.
-- naive, "not really" quicksort according to recent reddit discussion.

test sortf n = genlist n >>= return . ( length . sortf ) 

-- naive
qs1 []     = []
qs1 (x:xs) = qs1 (filter (< x) xs) ++ [x] ++ qs1 (filter (>= x) xs)

-- roconnor claims that "real" quicksort in haskell is really treesort.
-- from http://programming.reddit.com/info/2h0j2/comments
--I'm talking about a deforested version of tree sort. In tree sort you 
build a binary search tree by adding the head of the list to the root of 
the tree, and then building binary search trees for those items less than 
the root, and those items greater than the root.
qs2 l = treeSort l
treeSort l = foldTree (\x lft rht -> lft++[x]++rht) [] (mkTree l)

-- After building the search tree, it is turned into a list by folding \x 
lft rht -> lft++[x]++rht.
foldTree f g Leaf = g
foldTree f g (Node x l r) = f x (foldTree f g l) (foldTree f g r)

mkTree [] = Leaf
mkTree (x:xs) = Node x (mkTree (filter (<= x) xs)) (mkTree (filter (x <) 
xs))

-- If you deforest this algorithm (removing the intermediate tree 
structure) you are left with
-- treeSort' [] = []
-- treeSort' (x:xs) = treeSort' (filter (<= x) xs) ++ [x] ++ treeSort' 
(filter (x <) xs)

-- for testing
genlist n = rgen ( vector $ 10^n ) :: IO [Int]
rgenIntList = rgen (arbitrary :: Gen [Int]) :: IO [Int]
rgen gen = do sg <- newStdGen
              return $ generate 10000 sg gen

data Tree a = Leaf | Node a (Tree a) (Tree a)

---

This e-mail may contain confidential and/or privileged information. If you 
are not the intended recipient (or have received this e-mail in error) 
please notify the sender immediately and destroy this e-mail. Any 
unauthorized copying, disclosure or distribution of the material in this 
e-mail is strictly forbidden.

will the real quicksort please stand up? (or: sorting a > million element list)

Thomas Hartman

Thomas Hartman

jerzy.karczmarczuk＠info.unicaen.fr

Ketil Malde

Brent Yorgey

Bernie Pope

Bertram Felgenhauer

Bertram Felgenhauer

Claus Reinke

tags

participants (7)