
Andrew J Bromage writes:

>> I, for one, am sorting expected, not worst-case, data :-) What's this obsession with worst-case behaviour anyway?

> The best algorithm to use is the one which exploits known facts about the data. The converse is: The worst algorithm to use is the one whose worst case behaviour happens on the sort of data you want to feed it.
Okay. (As an aside, and for element types in something like Num: has anybody tried calculating the average and using that as a pivot? I mean, I know we really want the median, but the average is at least readily available.)
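Just to make the aside concrete, here is a minimal sketch of what I mean (meanSort and headPivotSort are made-up names, and it really needs Fractional rather than plain Num to compute the average; the fallback is only there to avoid looping when the mean fails to split the list, e.g. when every element is equal):

    import Data.List (partition)

    -- Quicksort variant that uses the average of the list as the pivot.
    meanSort :: (Fractional a, Ord a) => [a] -> [a]
    meanSort [] = []
    meanSort xs
      | null small || null large = headPivotSort xs   -- mean failed to split the list
      | otherwise                = meanSort small ++ meanSort large
      where
        mean           = sum xs / fromIntegral (length xs)
        (small, large) = partition (<= mean) xs

    -- Ordinary head-as-pivot quicksort, used only as the fallback case.
    headPivotSort :: Ord a => [a] -> [a]
    headPivotSort []     = []
    headPivotSort (p:xs) = headPivotSort small ++ [p] ++ headPivotSort large
      where (small, large) = partition (< p) xs

Whether that buys anything over picking, say, the middle element probably depends on how skewed the data is; computing the mean costs an extra pass over the list at every level, so it is not obviously a win.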
> So if, for example, you know that your data will cause expected-case behaviour on average, more power to you.
Isn't that what "expected-case" means? :-)

I was going to run through the statistics to work out the expected running time for a quicksort (i.e. sorting random data); I wrestled a bit with it, and while I'm fairly sure it's pretty close to n log n, the proof is at the moment left as an exercise for the reader (a rough empirical check is sketched in the P.S. below).

No matter, let's look at string searching. In theory, Boyer-Moore or Knuth-Morris-Pratt are faster than a naive method, i.e. just using something like

    find pat = or . map (pat `isPrefixOf`) . tails

(with isPrefixOf and tails from Data.List). This is worst-case O(n*m), since you may have to traverse the whole pattern each time before deciding it's not a prefix of a tail and moving on. The competition is O(n+m), which sounds a lot better.

But for random data, the chance of a match on the first character is one in the alphabet size, on the first two characters one in alpsz², and so on; with a 26-letter alphabet the expected number of character comparisons per starting position is roughly 1 + 1/26 + 1/26² + ... ≈ 1.04. Most of the time we only traverse one or two places before finding a mismatch, and since the algorithm is so much simpler than the alternatives, it's generally faster. (I implemented KMP and measured, and I think you need *highly* repetitive data for it to be worth the trouble.)

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
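P.S. The rough empirical check I mentioned above: a head-pivot quicksort instrumented to count comparisons, run on a random list and held up against n log n. The names (countingSort and so on) are made up for the sketch, and it only shows the constant-factor ballpark; it proves nothing.

    import Data.List (partition)
    import System.Random (mkStdGen, randomRs)

    -- Quicksort with head-of-list pivot, returning (comparison count, sorted list).
    countingSort :: Ord a => [a] -> (Int, [a])
    countingSort []     = (0, [])
    countingSort (p:xs) = (length xs + cs + cl, small' ++ [p] ++ large')
      where
        (small, large) = partition (< p) xs   -- one comparison per element of xs
        (cs, small')   = countingSort small
        (cl, large')   = countingSort large

    main :: IO ()
    main = do
      let n      = 10000 :: Int
          xs     = take n (randomRs (0, 1) (mkStdGen 42)) :: [Double]
          (c, _) = countingSort xs
      putStrLn ("comparisons: " ++ show c)
      putStrLn ("n log2 n   : " ++ show (fromIntegral n * logBase 2 (fromIntegral n) :: Double))

On random input the count should come out at a small constant times n log2 n (about 1.4, if the usual 2 n ln n analysis is right), which is as close to "pretty close to n log n" as an experiment gets.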