
I am curious: why is the List.sort implementation in GHC a quicksort algorithm, rather than an algorithm that guarantees n log n time in the worst case?
I'm not going to defend quicksort here, but your question reminded me of another one, asked a while ago on comp.lang.functional (actually, the topic seems to come up regularly, as a search for quicksort in c.l.f shows;-). That particular poster suggested that choosing a random pivot instead of the first element might make the infamous showcase quicksort (sketched below) a lot less elegant.

As your measurements target precisely this weakness of typical functional quicksort implementations (see the c.l.f discussions for other problems), I thought I'd have a go at the challenge. It turns out that one doesn't even need the source code for sort to make the change!-) Could you perhaps include one of the following randomised-input sort variants in your measurements?

The idea is to take a list of randoms as long as the list of inputs, and to presort the input list by those random indices, so that selecting the first element as pivot in the following main sort actually picks a random element (the presort itself operates on randomised input by construction). That's two well-behaved quicksorts instead of a single, often misbehaving one (well-behaved only on average, of course). It should do a lot better, on average, though it will still lose against other sorts, not to mention sorts tailored to what you know about your input list.

Just for fun;-)

Claus
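[For reference, the "showcase quicksort" referred to above is usually written along the following lines, with the first element as pivot; this is only an illustrative sketch, not the actual List.sort code:

-- elegant, but degrades to quadratic time on (almost) sorted input,
-- because the pivot is always the first element
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort [y | y <- xs, y < x] ++ [x] ++ qsort [y | y <- xs, y >= x]
]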
module RSort (rsortIO,randomiseIO,rsort,randomise) where
import List      -- Data.List on current GHCs
import Random    -- System.Random on current GHCs

-- an infinite list of random indices in [1..n], from a fresh generator
getRandoms n = newStdGen >>= return . randomRs (1,n)

-- tag each element with a random index and sort by those indices:
-- a random permutation of the input list
randomiseIO l = do
    rs <- getRandoms $ length l
    return $ map snd $ sortBy compareIdx $ zip rs l
  where
    compareIdx (i,_) (j,_) = i `compare` j

rsortIO l = randomiseIO l >>= return . sort

-- pure variant with a fixed seed, so the shuffle is reproducible
randomise l = map snd $ sortBy compareIdx $ zip rs l
  where
    n  = length l
    rs = take n $ randomRs (1,n) $ mkStdGen 100
    compareIdx (i,_) (j,_) = i `compare` j
rsort l = sort $ randomise l
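[To try the module out, a small driver along these lines should work; this main is only an illustration, not part of the posted code, and on a current GHC the imports above would be Data.List and System.Random:

module Main where

import RSort

-- An already-sorted list is the classic worst case for a first-element-pivot
-- quicksort; after the random presort it looks like an average-case input.
main :: IO ()
main = do
  let worst = [1..10000] :: [Int]
  s <- rsortIO worst
  print (s == worst)              -- True: still sorted correctly
  print (rsort worst == worst)    -- True: fixed-seed variant, deterministic
]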