
Neil Mitchell wrote:
There are 12 instances in Hoogle, for example. If profiling later shows it to be a problem, I'd fix it - but I can't ever actually remember that being the case. Premature optimisation is the root of all evil.
I'm sure most of the community would agree with you, but I have to say that if the consequence of this philosophy is horrors like this in the standard libs, it should come as no surprise that Haskell has a reputation for being "slow". What else is lurking there I wonder? What's really bad is that the terrible performance isn't even documented. It may be obvious, but it should still be documented. Has anybody even the remotest clue why darcs is (apparently) so slow? Maybe the community itself should share some of the blame for this. Like it wasn't obvious to me that the uses of nub I saw in darcs could rely on very short lists (<=1 element :-)
Having nubOrd is a great idea, I even proposed it previously, but people disagreed with me. Propose it and add it by all means.
Like I said, I'm not proposing it, as it doesn't seem to possible to implement it efficiently using anything else in the standard libs. You could do nubOrd (but not nubOrdBy) using Data.Set. But there are 2 problems.. 1- This not only introduces a cyclic dependency between modules, but also packages. I'm not sure how well various compilers and cabal would deal with this between them, but I'm not optimistic. 2- Data.Set is not obviously the best underlying implementation (in fact it is obviously not the best underlying implementation, this and Data.Map etc really should be pensioned off to hackage along with the rest of the badly documented, unreliable, inefficient and unstable junk :-) So I still think they should be deprecated. It seems like the least bad option if we can agree that their use should be strongly discouraged.
So could nub and nubBy be deprecated and an appropriate health warning added to the Haddock?
No. They should say O(n^2) in the haddock documentation, but O(n^2) /= useless.
I would say it is if there are many obvious O(n.(log n)) or better implementations that can be used in in 99%+ of cases. I mean so obvious that users can quite easily roll their own in 3 lines of code or less. Regards -- Adrian Hey