RE: Weird profiling behaviour

The default definition of sortBy uses insertion sort
I have vague recollection of the wisdom of this choice being questioned
And now I think I'm about question it as well...
5.02 uses quicksort, but 5.04 will use mergesort instead which has much more predictable performance behaviour. Cheers, Simon

Simon, | 5.02 uses quicksort, but 5.04 will use mergesort | instead which has much more predictable performance | behaviour. What implementation of mergesort are you using? (Could you send me code?) I found that all implementations of mergesort I tried perform badly on large list (like 10000 elements). GHC blows out of stack in that case, but not GHC's current built-in sort function. I assume you also realize that quicksort behaves more lazy than mergesort, which might be the reason for the above. I guess there should be a module from which one can pick the best known implementation of a particular sorting algorithm. Regards, /Koen.

G'day all. On Wed, Jun 26, 2002 at 04:06:50PM +0200, Koen Claessen wrote:
What implementation of mergesort are you using? (Could you send me code?)
I don't know for sure, but I'd be willing to bet it's Carsten Holst's merge sort with Andy Gill's partition function. If you have the GHC sources handy, you'll find it in ghc/compiler/utils/Util.lhs and the function is called mergeSort (note the capitalisation). There are a number of variations for various alternate orderings in there too. If you don't have the GHC source and don't want to download it for some reason, I've temporarily put a copy of the relevant file at: http://alicorna.net/Util.lhs This will be removed in a few days. Cheers, Andrew Bromage

"Simon Marlow"
5.02 uses quicksort,
That's funny, since I see quadratic scaling, I must be hitting worst case both times? 'sort' and 'sortBy' *are* implemented in the same way, right? -kzm -- If I haven't seen further, it is by standing in the footprints of giants

Ketil Z. Malde wrote:
5.02 uses quicksort,
That's funny, since I see quadratic scaling, I must be hitting worst case both times? 'sort' and 'sortBy' *are* implemented in the same way, right?
Implementations of QuickSort on lists usually take the easy option of using the head of the list as the threshold value for partitioning. As a consequence QuickSort does indeed degenerate to quadratic cost in the worst case. Also, curiously enough, it could just as well be the problem that your int-sorting phase has too *little* sorting to do, as this common version of quickSort degenerates both for in-order and reverse-order inputs. Regards Colin R

Colin Runciman
Also, curiously enough, it could just as well be the problem that your int-sorting phase has too *little* sorting to do, as this common version of quickSort degenerates both for in-order and reverse-order inputs.
*lights go on* Of course! While I have about 90K values to sort, it's only a range from 0 to about 5-600, and a less than even distribution at that. (I must be a lot denser than I thought. Colin, if you ever happen to pass by, do let me know, I think I owe you a beer.) Okay: bucket sort; does anybody know of a nice bucket sort I can rip off? :-) (Actually, while I haven't done the math or the tests to say for sure, I suspect a trivial mod to QS where equals are kept in a separate list might do just fine. Would that be a sensible modification to put in the standard libraries, I wonder?) Thanks again! -kzm -- If I haven't seen further, it is by standing in the footprints of giants

On Thursday 27 June 2002 12:10, Ketil Z. Malde wrote:
*lights go on*
Of course! While I have about 90K values to sort, it's only a range from 0 to about 5-600, and a less than even distribution at that. (I must be a lot denser than I thought. Colin, if you ever happen to pass by, do let me know, I think I owe you a beer.)
Why don't you implement a decent quicksort (like randomized quicksort) that
works on an array?
--
Eray Ozkural
participants (6)
-
Andrew J Bromage
-
Colin Runciman
-
Eray Ozkural
-
ketil@ii.uib.no
-
Koen Claessen
-
Simon Marlow