I think ordNub in Data.Set, hashNub in Data.HashSet or somewhere, and bitsNub in Data.Bits (for FiniteBits things).
tldr: nub is abnormally slow, we shouldn't use it, but we do.
As you might know, Data.List.nub is O(n²). (*)
As you might not know, almost *all* practical Haskell projects use it,
and that in places where an Ord instance is given, e.g. happy, Xmonad,
ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
more (see https://github.com/nh2/haskell-ordnub).
I've taken the Ord-based O(n * log n) implementation from yi using a Set:
ordNub :: (Ord a) => [a] -> [a]
ordNub l = go empty l
where
go _ [] = []
go s (x:xs) = if x `member` s then go s xs
else x : go (insert x s) xs
and put benchmarks on
http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
(compare `nub` vs `ordNub`).
`ordNub` is not only in a different complexity class, but even seems to
perform better than nub for very small numbers of actually different
list elements (that's the numbers before the benchmark names).
(The benchmark also shows some other potential problem: Using a state
monad to keep the set instead of a function argument can be up to 20
times slower. Should that happen?)
What do you think about ordNub?
I've seen a proposal from 5 years ago about adding a *sort*Nub function
started by Neil, but it just died.
(*) The mentioned complexity is for the (very common) worst case, in
which the number of different elements in the list grows with the list
(alias you don't have an N element list with always only 5 different
things inside).
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe