Re: [Haskell-cafe] ordNub

14 Oct 2013

      ...
Niklas Hambüchen  writes:
...
On 13/10/13 21:42, AntC wrote:
...
If you use the Set library, that fact may be very visible!
Because Set re-sequences the whole list, as per its Ord instance.
But List.nub preserves the list sequence
(except for omitting duplicates).
I mean *exactly* what you say here.
ordNub behaves has the same behaviour as nub, while (Set.toList .
Set.fromList) doesn't.
That's great, thank you.
...
...
[BTW I am still less than convinced that overall a Set-based ordNub is 
significantly more efficient. I suspect it depends on how big is your 
list.]
What do you mean?
ordNub is clearly in a different complexity class, ...
Yes, I'm not disputing that.
...
... and the benchmarks that I provided show not only this,
but also that ordNub is *always* faster than nub,
even for singleton lists.
Thanks Niklas, I hadn't spotted those benchmarks back in July.

I'm surprised at that result for singletons 
(and for very small numbers of elements which are in fact each different).

Especially because List's `nub` uses `filter` ==> fold, which should be 
tail-recursive.

It seems to me that for small numbers, the Set-based approach still 
requires comparing each element to each other. Plus there's the overhead 
for building the Set and inserting each element into it -- where `insert` 
again walks the Set to find the insertion point.

Then here's a further possible optimisation, instead of making separate 
calls to `member` and `insert`:
* Make a single call to
    insert' :: (Ord a) => a -> Set a -> (Bool, Set a)
* The Bool returns True if already a member.
* Else returns an updated Set in the snd, with the element inserted.