
Hi, Wondering about time space consuming: 'nub' vs 'map head.group.sort' Consider: ry = [1..10000] ++ replicate 13 5 ++ replicate 21 34 *Main> length . nub $ ry 10000 (5.18 secs, 1050000 bytes) *Main> length . map head . group . sort $ ry 10000 (0.03 secs, 6293384 bytes) Time space nub --- + fnub +++ - + is better ;-) Thanks @@i=arie

AG> Wondering about time space consuming: 'nub' vs 'map AG> head.group.sort' Prelude> :t Data.List.nub Data.List.nub :: (Eq a) => [a] -> [a] Prelude> :t Data.List.sort Data.List.sort :: (Ord a) => [a] -> [a] nub uses less information than sort, so it MUST be slower.

Arie Groeneveld wrote:
Hi,
Wondering about time space consuming: 'nub' vs 'map head.group.sort'
Consider:
ry = [1..10000] ++ replicate 13 5 ++ replicate 21 34
*Main> length . nub $ ry 10000 (5.18 secs, 1050000 bytes)
*Main> length . map head . group . sort $ ry 10000 (0.03 secs, 6293384 bytes)
Time space nub --- + fnub +++ -
+ is better ;-)
Thanks
@@i=arie
nub is working on unsorted input. If you want (nub.sort) then the best thing to use is a merge sort that discards duplicates as it works. Copying and modifying GHC's Data.List.sort code:
-- stolen from http://darcs.haskell.org/packages/base/Data/List.hs -- with 'merge' changed to discard duplicates.
nsort l = mergesort compare l nsortBy cmp l = mergesort compare l
mergesort :: (a -> a -> Ordering) -> [a] -> [a] mergesort cmp = mergesort' cmp . map wrap
mergesort' :: (a -> a -> Ordering) -> [[a]] -> [a] mergesort' cmp [] = [] mergesort' cmp [xs] = xs mergesort' cmp xss = mergesort' cmp (merge_pairs cmp xss)
merge_pairs :: (a -> a -> Ordering) -> [[a]] -> [[a]] merge_pairs cmp [] = [] merge_pairs cmp [xs] = [xs] merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp xs [] = xs merge cmp [] ys = ys merge cmp (x:xs) (y:ys) = case x `cmp` y of GT -> y : merge cmp (x:xs) ys LT -> x : merge cmp xs (y:ys) EQ -> x : merge cmp xs ys
wrap :: a -> [a] wrap x = [x]
Then you can use nsort or nsortBy, which benchmark (with -O2) as slightly faster than (map head . group . sort) Cheers, Chris

I tried this version,
however....
nsortBy cmp l = mergesort compare l should be:
nsortBy cmp l = mergesort cmp l
Thanks Sorry, with this version I meant:
nsort l = mergesort compare l nsortBy cmp l = mergesort compare l
mergesort :: (a -> a -> Ordering) -> [a] -> [a] mergesort cmp = mergesort' cmp . map wrap
mergesort' :: (a -> a -> Ordering) -> [[a]] -> [a] mergesort' cmp [] = [] mergesort' cmp [xs] = xs mergesort' cmp xss = mergesort' cmp (merge_pairs cmp xss)
merge_pairs :: (a -> a -> Ordering) -> [[a]] -> [[a]] merge_pairs cmp [] = [] merge_pairs cmp [xs] = [xs] merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp xs [] = xs merge cmp [] ys = ys merge cmp (x:xs) (y:ys) = case x `cmp` y of GT -> y : merge cmp (x:xs) ys LT -> x : merge cmp xs (y:ys) EQ -> x : merge cmp xs ys
wrap :: a -> [a] wrap x = [x]
participants (3)
-
Arie Groeneveld
-
haskell@list.mightyreason.com
-
Miguel Mitrofanov