
Hi all I am curious as to why the List.sort implementation in GHC is a quicksort algorithm rather than an algorithm that guarantees n log n time in the worst case? I have attached a mergesort implementation along with a few scripts to time it's performance, the results of which are shown below (* means it didn't finish successfully - in all cases this was due to a stack overflow). I am also curious as to the cause of the last failure (this is using the attached mergesort to sort random_list = take 100000 $ randoms $ mkStdGen 100) as it manages with sorted = [1 .. 100000] and revsorted = reverse [1 .. 100000]. Is this because of some optimisations being done for the (rev)sorted cases or is it a problem with randoms? If I heap profile the random_list case with only 10000 then I see random_list peaks at using about 2.5M of memory, whereas in the same program using List.sort it uses only 100k. Input style Input length Sort data Sort alg User time stdin 10000 random_list sort 2.82 stdin 10000 random_list mergesort 2.96 stdin 10000 sorted sort 31.37 stdin 10000 sorted mergesort 1.90 stdin 10000 revsorted sort 31.21 stdin 10000 revsorted mergesort 1.88 stdin 100000 random_list sort * stdin 100000 random_list mergesort * stdin 100000 sorted sort * stdin 100000 sorted mergesort * stdin 100000 revsorted sort * stdin 100000 revsorted mergesort * func 10000 random_list sort 0.31 func 10000 random_list mergesort 0.91 func 10000 sorted sort 19.09 func 10000 sorted mergesort 0.15 func 10000 revsorted sort 19.17 func 10000 revsorted mergesort 0.16 func 100000 random_list sort 3.85 func 100000 random_list mergesort * func 100000 sorted sort 5831.47 func 100000 sorted mergesort 2.23 func 100000 revsorted sort 5872.34 func 100000 revsorted mergesort 2.24 Any comments gratefully received Thanks Ian

I am curious as to why the List.sort implementation in GHC is a quicksort algorithm rather than an algorithm that guarantees n log n time in the worst case?
I'm not going to defend quicksort here, but your question reminded me of another one, asked a while ago on comp.lang.functional (actually, the topic seems to come up regularly, as a search for quicksort in c.l.f shows;-). This particular poster suggested that choosing a random pivot instead of the first element might make the infamous showcase quicksort a lot less elegant. As your measurements target precisely this weakness of typical functional quicksort implementations (see the c.l.f discussions for other problems), I thought I'd have a go at the challenge. It turns out that one doesn't even need the source code for sort to make the change!-) Could you perhaps include one of the following randomised-input sort variants in your measurements? The idea is to take a list of randoms, the list of inputs, and presort the input list by the random indices, so that selecting the first element as pivot in the following main sort actually takes a random element (the presort uses randomised input by construction). That's two well-behaved quicksorts instead of a single, often misbehaving one (well-behaved only on average, of course). Should do a lot better, on average, though it will still lose against other sorts, not to mention sorts tailored to what you know about your input list. Just for fun;-) Claus
module RSort (rsortIO,randomiseIO,rsort,randomise) where
import List import Random
getRandoms n = newStdGen >>= return . randomRs (1,n)
randomiseIO l = do rs <- getRandoms $ length l return $ map snd $ sortBy compareIdx $ zip rs l where compareIdx (i,_) (j,_) = i `compare` j
rsortIO l = randomiseIO l >>= return . sort
randomise l = do map snd $ sortBy compareIdx $ zip rs l where n = length l rs = take n $ randomRs (1,n) $ mkStdGen 100 compareIdx (i,_) (j,_) = i `compare` j
rsort l = sort $ randomise l

On Sun, May 12, 2002 at 09:22:02PM +0100, Claus Reinke wrote:
randomise l = do map snd $ sortBy compareIdx $ zip rs l where n = length l rs = take n $ randomRs (1,n) $ mkStdGen 100 compareIdx (i,_) (j,_) = i `compare` j
rsort l = sort $ randomise l
This algorithm doesn't have the stable property, but it can be easily adapted with:
srandomise l = do map fst $ map snd $ sortBy compareIdx $ zip rs $ zip l ([1..] :: Integer) where n = length l rs = take n $ randomRs (1,n) $ mkStdGen 100 compareIdx (i,_) (j,_) = i `compare` j
rssort l = sort $ srandomise l
Out of curiousity I also tried:
sxrandomise l = do map fst $ map snd $ sortBy compareIdx $ zip rs $ zip l ([1..] :: Integer) where rs = randoms $ mkStdGen 100 compareIdx (i,_) (j,_) = i `compare` j
rsxsort l = sort $ sxrandomise l
Of course if you use 100 as the seed then rsxsort does poorly on the random_list test as it first sorts the input and then sorts the sorted input, giving stdin 10000 random_list rsxsort 21.90 func 10000 random_list rsxsort 27.21 The results using 200 as a seed are shown below (I removed stdin/100000 as they were all *s again). mergesort has the edge, but you'd expect that as r*sort aren't going to get perfect splits. r*sort should do better if there were lots of repeated inputs. Input style Input length Sort data Sort alg User time stdin 10000 random_list sort 2.85 stdin 10000 random_list rsort 2.98 stdin 10000 random_list rssort 3.04 stdin 10000 random_list rsxsort 3.18 stdin 10000 random_list mergesort 3.02 stdin 10000 sorted sort 31.09 stdin 10000 sorted rsort 2.05 stdin 10000 sorted rssort 2.09 stdin 10000 sorted rsxsort 2.20 stdin 10000 sorted mergesort 1.88 stdin 10000 revsorted sort 31.17 stdin 10000 revsorted rsort 2.04 stdin 10000 revsorted rssort 2.10 stdin 10000 revsorted rsxsort 2.17 stdin 10000 revsorted mergesort 1.85 func 10000 random_list sort 0.30 func 10000 random_list rsort 0.94 func 10000 random_list rssort 1.05 func 10000 random_list rsxsort 1.14 func 10000 random_list mergesort 0.91 func 10000 sorted sort 19.28 func 10000 sorted rsort 0.30 func 10000 sorted rssort 0.37 func 10000 sorted rsxsort 0.44 func 10000 sorted mergesort 0.16 func 10000 revsorted sort 19.30 func 10000 revsorted rsort 0.30 func 10000 revsorted rssort 0.37 func 10000 revsorted rsxsort 0.48 func 10000 revsorted mergesort 0.15 func 100000 random_list sort 3.97 func 100000 random_list rsort * func 100000 random_list rssort * func 100000 random_list rsxsort * func 100000 random_list mergesort * func 100000 sorted sort 5873.15 func 100000 sorted rsort 5.25 func 100000 sorted rssort 5.66 func 100000 sorted rsxsort 6.29 func 100000 sorted mergesort 2.18 func 100000 revsorted sort 5828.57 func 100000 revsorted rsort 5.13 func 100000 revsorted rssort 5.57 func 100000 revsorted rsxsort 6.45 func 100000 revsorted mergesort 2.24
participants (2)
-
Claus Reinke
-
Ian Lynagh