
Arie Groeneveld wrote:
Hi,
Wondering about time space consuming: 'nub' vs 'map head.group.sort'
Consider:
ry = [1..10000] ++ replicate 13 5 ++ replicate 21 34
*Main> length . nub $ ry 10000 (5.18 secs, 1050000 bytes)
*Main> length . map head . group . sort $ ry 10000 (0.03 secs, 6293384 bytes)
Time space nub --- + fnub +++ -
+ is better ;-)
Thanks
@@i=arie
nub is working on unsorted input. If you want (nub.sort) then the best thing to use is a merge sort that discards duplicates as it works. Copying and modifying GHC's Data.List.sort code:
-- stolen from http://darcs.haskell.org/packages/base/Data/List.hs -- with 'merge' changed to discard duplicates.
nsort l = mergesort compare l nsortBy cmp l = mergesort compare l
mergesort :: (a -> a -> Ordering) -> [a] -> [a] mergesort cmp = mergesort' cmp . map wrap
mergesort' :: (a -> a -> Ordering) -> [[a]] -> [a] mergesort' cmp [] = [] mergesort' cmp [xs] = xs mergesort' cmp xss = mergesort' cmp (merge_pairs cmp xss)
merge_pairs :: (a -> a -> Ordering) -> [[a]] -> [[a]] merge_pairs cmp [] = [] merge_pairs cmp [xs] = [xs] merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp xs [] = xs merge cmp [] ys = ys merge cmp (x:xs) (y:ys) = case x `cmp` y of GT -> y : merge cmp (x:xs) ys LT -> x : merge cmp xs (y:ys) EQ -> x : merge cmp xs ys
wrap :: a -> [a] wrap x = [x]
Then you can use nsort or nsortBy, which benchmark (with -O2) as slightly faster than (map head . group . sort) Cheers, Chris