
"Serge D. Mechveliani"
But sortBy' (compare) [1 .. n]
costs too much, even for n = 11000. It costs (on worst data) many times more than mergeSort.
Yes, but why do you want to sort sorted data? I think the multiple value cost, i.e. that sortBy (compare) (take n $ repeat 1) is equally expensive, is a bigger problem. Okay, probably because it caught me unawares, but also because the mantra is "quicksort is pessimal on sorted data". But if mergesort (or heapsort for that matter) can be made to behave nicely, I think that's a good alternative. I haven't run numbers, but I was under the impression that mergesort was quite a bit slower than quicksort in the expected case. I, for one, am sorting expected, not worst-case, data :-) <gripe> What's this obsession with worst-case behaviour anyway? While I have linear-time algorithms I could use, I'm using one that's linear expected, quadratic worst-case -- but with better cache behaviour. And why not? There are O(2^n) possible inputs, who cares about the almost none that are pessimal? And that's cache as in the six-orders-of-magnitude access time difference between RAM vs. disk, not the relatively low cost of L2 cache misses. > One solution I've seen suggested, is to use quicksort to a depth of c log n (for some c), and fall back to mergesort thereafter. Or to pick a random pivot, rather than the first element. BTW, I'm fully in favor of keeping an insertion (or other) sort around that behaves nicely for sorted/almost sorted data, as a separate function available for the discriminating programmer. Okay, that was kind of rambling, to sum up: 1. The default sorting should, in order of approximate preference 1. scale well (probably means O(n log n)) 2. scale beyond available RAM 3. be fast (i.e. have low constant overhead) ?. be stable (always a nice property) 2. Other sorts should be provided for special cases that the programmer might know about, and where a different algorithm could be a win, possibly: - short sequences (bubble?) - sequences of sequences (radix?) - almost sorted/reverse sorted sequences (bubble/insertion?) - limited range (bucket?) -kzm -- If I haven't seen further, it is by standing in the footprints of giants