Re: Proposal: Add Data.List.sortNub and sortNubBy

14 Mar 2007

      Hi
...
Huh? nub is quadratic, sort is (n log n) and nubSorted linear.
That was my first guess. Turns out for lots of cases its not quite an
accurate reflection of time taken.

Consider a list where a reasonable proportion of elements are the
same, i.e. a [Char], where we can expect that there are probably < 100
distinct characters, if we are working with text.

Given: out = nub in

nub will be O(#in * #out). Sort will be O(#in log #in). Where #out is
much smaller than log #in, nub will win.

For a large list which is being generated "lazily" - i.e. from a file,
lots of other ways etc, sort is tail strict, which also makes the
memory use substantially worse.

I'll get complete benchmarks in a range of situations sometime later.

Thanks

Neil

Re: Proposal: Add Data.List.sortNub and sortNubBy

Neil Mitchell