Re: [Haskell-cafe] Time consumption nub

18 Jul 2007


      Arie Groeneveld wrote:
...
Hi,
Wondering about time space consuming:  'nub' vs 'map head.group.sort'
Consider:
ry = [1..10000] ++ replicate 13 5 ++ replicate 21 34
*Main> length . nub $ ry
10000
(5.18 secs, 1050000 bytes)
*Main> length . map head . group . sort $ ry
10000
(0.03 secs, 6293384 bytes)
Time     space
nub     ---      +
fnub    +++      -
+ is better ;-)
Thanks
@@i=arie
nub is working on unsorted input.

If you want (nub.sort) then the best thing to use is a merge sort that discards 
duplicates as it works.  Copying and modifying GHC's Data.List.sort code:
...
-- stolen from http://darcs.haskell.org/packages/base/Data/List.hs
-- with 'merge' changed to discard duplicates.
...
nsort l = mergesort compare l
nsortBy cmp l = mergesort compare l
mergesort :: (a -> a -> Ordering) -> [a] -> [a]
mergesort cmp = mergesort' cmp . map wrap
mergesort' :: (a -> a -> Ordering) -> [[a]] -> [a]
mergesort' cmp [] = []
mergesort' cmp [xs] = xs
mergesort' cmp xss = mergesort' cmp (merge_pairs cmp xss)
merge_pairs :: (a -> a -> Ordering) -> [[a]] -> [[a]]
merge_pairs cmp [] = []
merge_pairs cmp [xs] = [xs]
merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
merge cmp xs [] = xs
merge cmp [] ys = ys
merge cmp (x:xs) (y:ys)
 = case x `cmp` y of
        GT -> y : merge cmp (x:xs)   ys
        LT -> x : merge cmp    xs (y:ys)
        EQ -> x : merge cmp    xs    ys
wrap :: a -> [a]
wrap x = [x]
Then you can use nsort or nsortBy, which benchmark (with -O2) as slightly faster 
than (map head . group . sort)

Cheers,
   Chris

Re: [Haskell-cafe] Time consumption nub

haskell＠list.mightyreason.com