will the real quicksort please stand up? (or: sorting a > million element list)

It has been noted in a few places that the 2 line quicksort demo in the Introduction section of the haskell wiki http://www.haskell.org/haskellwiki/Introduction isn't a "real" quicksort, and wouldn't scale well with longer lists. Interested, and wanting to put my recently learned test data generating skills to the test, I put this to the test with a little suite for various versions of quickcheck I found lying around. My patience extends to <3 minutes for waiting for a result, which for my tests was about the amount needed for a million integer long list. This was definitely faster for the "treesort" below than the naive quicksort from the wiki. (Prelude sort, which I think is mergesort, just blew the stack.) I wonder if there are "tricks" or lore that could be applied to get better results or insight. t. {-import Control.Monad (when) import Control.Monad.ST import Data.Array.ST import Data.Array.IArray import Data.Array.MArray import System.IO.Unsafe import Control.Monad -} import Test.QuickCheck hiding (test) import System.Random import Data.List {- $ time ghci -e 'test treesort 6' quicksort.hs # a "real" quicksort, according to roconnor 1000000 real 2m18.702s user 1m3.296s sys 0m0.453s $ time ghci -e 'test qs1 6' quicksort.hs # "naive" (from the wiki intro to haskell, not a "real" quicksort 1000000 real 4m18.174s user 1m57.615s sys 0m0.665s $ time ghci -e 'test sort 6' quicksort.hs # mergesort from the prelude *** Exception: stack overflow real 0m13.634s user 0m3.805s sys 0m0.446s hartthoma@linuxpt:~/ProjectRepos/learning/quicksort> -} -- For interactive development, I wound up working in hugs +S +I, -- because ghci in emacs tended to lock up -- hugs just segfaults, but at least the computer doesn't lock. -- by the way, is there a way to get ghci to behave similarly to hugs? (restricted memory mode?) -- t1 and t2 are about the same speed for n = 4, got bored of waiting for n great. -- naive, "not really" quicksort according to recent reddit discussion. test sortf n = genlist n >>= return . ( length . sortf ) -- naive qs1 [] = [] qs1 (x:xs) = qs1 (filter (< x) xs) ++ [x] ++ qs1 (filter (>= x) xs) -- roconnor claims that "real" quicksort in haskell is really treesort. -- from http://programming.reddit.com/info/2h0j2/comments --I'm talking about a deforested version of tree sort. In tree sort you build a binary search tree by adding the head of the list to the root of the tree, and then building binary search trees for those items less than the root, and those items greater than the root. qs2 l = treeSort l treeSort l = foldTree (\x lft rht -> lft++[x]++rht) [] (mkTree l) -- After building the search tree, it is turned into a list by folding \x lft rht -> lft++[x]++rht. foldTree f g Leaf = g foldTree f g (Node x l r) = f x (foldTree f g l) (foldTree f g r) mkTree [] = Leaf mkTree (x:xs) = Node x (mkTree (filter (<= x) xs)) (mkTree (filter (x <) xs)) -- If you deforest this algorithm (removing the intermediate tree structure) you are left with -- treeSort' [] = [] -- treeSort' (x:xs) = treeSort' (filter (<= x) xs) ++ [x] ++ treeSort' (filter (x <) xs) -- for testing genlist n = rgen ( vector $ 10^n ) :: IO [Int] rgenIntList = rgen (arbitrary :: Gen [Int]) :: IO [Int] rgen gen = do sg <- newStdGen return $ generate 10000 sg gen data Tree a = Leaf | Node a (Tree a) (Tree a) --- This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.

another point: "deforested" treesort is slower. hartthoma@linuxpt:~/ProjectRepos/learning/quicksort>time ghc -e "test treeSort' 6" quicksort 1000000 real 4m3.615s user 1m59.525s sys 0m0.587s The commented indicated that -- If you deforest this algorithm (removing the intermediate tree structure) you are left with treeSort' [] = [] treeSort' (x:xs) = treeSort' (filter (<= x) xs) ++ [x] ++ treeSort' (filter (x <) xs) So.. my take home lesson is that deforestation isn't a performance neutral thing to do. Assuming the comment is correct. (I don't consider myself qualified to judge.) t. --- This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.

Thomas Hartman writes:
another point: "deforested" treesort is slower. The commented indicated that
-- If you deforest this algorithm (removing the intermediate tree structure) you are left with treeSort' [] = [] treeSort' (x:xs) = treeSort' (filter (<= x) xs) ++ [x] ++ treeSort' (filter (x <) xs)
So.. my take home lesson is that deforestation isn't a performance neutral thing to do. Assuming the comment is correct.
Well this was just the removal of the auxiliary *tree* formulae, but such
recursive concatenation isn't anything which should be popularized...
I don't want to start an analysis of known issues, but if you WANT to have
*similar* algorithms relatively efficient, there are two things to do:
1. Avoid two pass filtering.
2. Avoid unecessary (++), with an accumulator. For example:
qsort l = qs l [] where -- pa is the one-pass partition
qs (x:xs) ac = pa xs [] [] where
pa (y:ys) s b | x

jerzy.karczmarczuk@info.unicaen.fr writes:
1. Avoid two pass filtering. 2. Avoid unecessary (++), with an accumulator. For example:
Also, I find that 3. Accumulate equal elements, too pa (y:ys) s e b = case compare x y of ... to be a good choice. Otherwise, quicksort easily grows towards quadratic if you have many multiples of the same value. I think this is more common than the other major pitfall, sorting an already sorted list. (But perhaps the three-way case based on 'compare' is more expensive than the two-way (<) test?) -k -- If I haven't seen further, it is by standing in the footprints of giants

I wonder if there are "tricks" or lore that could be applied to get better results or insight.
t.
Just a quick note, I doubt ghc -e does any optimizations. You'd probably get better results by putting these tests in files and compiling with -O2. In particular I wonder whether that would stop the Prelude sort from blowing the stack. -Brent

On 23/10/2007, at 8:09 AM, Thomas Hartman wrote:
(Prelude sort, which I think is mergesort, just blew the stack.)
GHC uses a "bottom up" merge sort these days. It starts off by creating a list of singletons, then it repeatedly merges adjacent pairs of lists until there is only one list left. I was teaching sorting to my first year students, and I also bumped into the stack overflow at one million elements, using GHC's merge sort. I have been meaning to look into the cause of this, but my suspicion is that strictness (or lack thereof) might be an issue. Cheers, Bernie.

Bernie Pope wrote:
On 23/10/2007, at 8:09 AM, Thomas Hartman wrote:
(Prelude sort, which I think is mergesort, just blew the stack.)
GHC uses a "bottom up" merge sort these days.
It starts off by creating a list of singletons, then it repeatedly merges adjacent pairs of lists until there is only one list left.
I was teaching sorting to my first year students, and I also bumped into the stack overflow at one million elements, using GHC's merge sort.
I have been meaning to look into the cause of this, but my suspicion is that strictness (or lack thereof) might be an issue.
Interesting. One of the functions in GHC's merge sort implementation is merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp xs [] = xs merge cmp [] ys = ys merge cmp (x:xs) (y:ys) = case x `cmp` y of GT -> y : merge cmp (x:xs) ys _ -> x : merge cmp xs (y:ys) Swapping the first two lines appears to fix the problem. I'm not sure why. The generated core only differs in the order of two cases for the second and third argument of 'merge', so it comes down to the precise STG semantics. Can anybody explain the difference? Bertram

[note, the thread is almost a month old] Bernie Pope wrote:
On 23/10/2007, at 8:09 AM, Thomas Hartman wrote:
(Prelude sort, which I think is mergesort, just blew the stack.)
GHC uses a "bottom up" merge sort these days.
It starts off by creating a list of singletons, then it repeatedly merges adjacent pairs of lists until there is only one list left.
I was teaching sorting to my first year students, and I also bumped into the stack overflow at one million elements, using GHC's merge sort.
I think I got to the bottom of this. Consider the following snippet: sort $ (take (10^6) [1..]) The argument of the sort is a list with 10^6 unevaluated elements, [a=1, b=1+a, c=1+b, d=1+c, ...]. Now it turns out that merge sort as implemented in the base library compares the two last elements of the list first. This causes the evaluation of an expression that is approximately 10^6 applications of (+) deep. And that's where you get the stack overflow. [1] Thomas Hartman's example is of a similar nature, it also produces a list of unevaluated terms where each term depends on the value of the previous one. The modification that I proposed in http://www.haskell.org/pipermail/haskell-cafe/2007-October/033617.html has the effect of comparing the first two elements first. I actually believe that this is a reasonable change, because it's more likely to work out fine. But it'll produce a stack overflow on sort $ (reverse (take (10^6) [1..])) instead, which doesn't cause problems currently. The root problem is the creation of deep unevaluated expressions. Bertram [1] Note that sort [1..10^6] works just fine because [1..10^6] produces a list of fully evaluated values, as it compares each list element to the upper bound when it is generated.

It has been noted in a few places that the 2 line quicksort demo in the Introduction section of the haskell wiki
http://www.haskell.org/haskellwiki/Introduction
isn't a "real" quicksort, and wouldn't scale well with longer lists. Interested, and wanting to put my recently learned test data generating skills to the test, I put this to the test with a little suite for various versions of quickcheck I found lying around.
be sure to take a look at Augustsson's version as well. it used to come with hbc utilities (have we lost those?), here's a url to old-space: http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/CONTRIB/libraries/hbc/src/... claus
participants (7)
-
Bernie Pope
-
Bertram Felgenhauer
-
Brent Yorgey
-
Claus Reinke
-
jerzy.karczmarczuk@info.unicaen.fr
-
Ketil Malde
-
Thomas Hartman