
13 Mar
2007
13 Mar
'07
9:08 p.m.
Hi
Huh? nub is quadratic, sort is (n log n) and nubSorted linear.
That was my first guess. Turns out for lots of cases its not quite an accurate reflection of time taken. Consider a list where a reasonable proportion of elements are the same, i.e. a [Char], where we can expect that there are probably < 100 distinct characters, if we are working with text. Given: out = nub in nub will be O(#in * #out). Sort will be O(#in log #in). Where #out is much smaller than log #in, nub will win. For a large list which is being generated "lazily" - i.e. from a file, lots of other ways etc, sort is tail strict, which also makes the memory use substantially worse. I'll get complete benchmarks in a range of situations sometime later. Thanks Neil