Create a list without duplicates from a list with duplicates

Hallo! Let's suppose I have a list [a,b,c,d,c,d]. I'd like to write a function that returns a new list without duplicates (in the example [a,b,c,d]). How can I do that? What is the most general way? I'd like to use the same function for a list of Int or String or some other user defined data type. Thank for your attention!

On 8 Feb 2008, news@lyra.net wrote:
Hallo!
Let's suppose I have a list [a,b,c,d,c,d]. I'd like to write a function that returns a new list without duplicates (in the example [a,b,c,d]). How can I do that? What is the most general way? I'd like to use the same function for a list of Int or String or some other user defined data type.
Look at Data.List: nub :: (Eq a) => [a] -> [a] nub = nubBy (==) nubBy :: (a -> a -> Bool) -> [a] -> [a] nubBy eq [] = [] nubBy eq (x:xs) = x : nubBy eq (filter (\ y -> not (eq x y)) xs) Jed

2008/2/8 Jed Brown
Look at Data.List:
nub :: (Eq a) => [a] -> [a] nub = nubBy (==)
nubBy :: (a -> a -> Bool) -> [a] -> [a] nubBy eq [] = [] nubBy eq (x:xs) = x : nubBy eq (filter (\ y -> not (eq x y)) xs)
And then there's also sort :: (Ord a) => [a] -> [a] which should have better performance, O(n log n) against O(n²) I guess, but of course will change the order of the elements. If you really don't mind the order at all, you could also use Data.Set in the first place. Cheers, -- Felipe.

As noted, (Data.Set.toList . Data.Set.fromList) is the best traditional solution if you don't care about order (or Data.Set.toAscList for a sorted result). If order is important, the new bijective Data.Bimap class http://code.haskell.org/~scook0/haddock/bimap/Data-Bimap.html may be your best bet (I haven't yet tried it myself). Meanwhile, here is a hand-rolled solution to order-preserving nubbing:
import Data.List(groupBy,sortBy,sort) import Data.Maybe(listToMaybe)
efficientNub :: (Ord a) => [a] -> [a] efficientNub = flip zip [0..] -- carry along index >>> sort -- sort by value, then index >>> groupBy equalFsts -- group adjacent equal values >>> map head -- keep only primus inter pares >>> sortBy compareSnds -- sort by index >>> map fst -- discard index
where equalFsts (x1,y1) (x2,y2) = x1 == x2 compareSnds (x1,y1) (x2,y2) = compare y1 y2 x >>> y = y . x
There is a hidden proof obligation here: Exercise: Prove that (groupBy equalFsts >>> map head) is a total function, using the defintion of groupBy from Data.List: groupBy :: (a -> a -> Bool) -> [a] -> [[a]] groupBy _ [] = [] groupBy eq (x:xs) = (x:ys) : groupBy eq zs where (ys,zs) = span (eq x) xs Felipe Lessa wrote:
2008/2/8 Jed Brown
: Look at Data.List:
nub :: (Eq a) => [a] -> [a] nub = nubBy (==)
nubBy :: (a -> a -> Bool) -> [a] -> [a] nubBy eq [] = [] nubBy eq (x:xs) = x : nubBy eq (filter (\ y -> not (eq x y)) xs)
And then there's also
sort :: (Ord a) => [a] -> [a]
which should have better performance, O(n log n) against O(n²) I guess, but of course will change the order of the elements. If you really don't mind the order at all, you could also use Data.Set in the first place.
Cheers,
------------------------------------------------------------------------
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Dan Weston wrote:
Meanwhile, here is a hand-rolled solution to order-preserving nubbing:
import Data.List(groupBy,sortBy,sort) import Data.Maybe(listToMaybe)
efficientNub :: (Ord a) => [a] -> [a] efficientNub = flip zip [0..] -- carry along index >>> sort -- sort by value, then index >>> groupBy equalFsts -- group adjacent equal values >>> map head -- keep only primus inter pares >>> sortBy compareSnds -- sort by index >>> map fst -- discard index
where equalFsts (x1,y1) (x2,y2) = x1 == x2 compareSnds (x1,y1) (x2,y2) = compare y1 y2 x >>> y = y . x
I would try something like efficientNub = catMaybes . snd . mapAccumR f empty where f s x | member x s = (s, Nothing) | otherwise = (insert x s, x) that is, threading the Set of already given results through. Tillmann

On Sat, Feb 9, 2008 at 7:36 AM, Dan Weston
If order is important, the new bijective Data.Bimap class http://code.haskell.org/~scook0/haddock/bimap/Data-Bimap.html may be your best bet (I haven't yet tried it myself).
Let me try: nub :: (Ord a) => [a] -> [a] nub = map snd . Data.Bimap.toAscList . Data.Bimap.fromList . reverse . zip [1..]
nub "hello, world!" "helo, wrd!"
Without the call to (reverse), this would still be an order-preserving nub, except that it would preserve the relative order of the *last* occurrence of each element. Actually, this makes me wonder whether fromList's behaviour should be changed, and whether I should add a "non-clobbering" variant of insert. Stuart

For Bimap is there anything like Data.Map.insertWithKey ? Stuart Cook wrote:
On Sat, Feb 9, 2008 at 7:36 AM, Dan Weston
wrote: If order is important, the new bijective Data.Bimap class http://code.haskell.org/~scook0/haddock/bimap/Data-Bimap.html may be your best bet (I haven't yet tried it myself).
Let me try:
nub :: (Ord a) => [a] -> [a] nub = map snd . Data.Bimap.toAscList . Data.Bimap.fromList . reverse . zip [1..]
nub "hello, world!" "helo, wrd!"
Without the call to (reverse), this would still be an order-preserving nub, except that it would preserve the relative order of the *last* occurrence of each element. Actually, this makes me wonder whether fromList's behaviour should be changed, and whether I should add a "non-clobbering" variant of insert.
Stuart

On Sun, Feb 10, 2008 at 12:19 AM, ChrisK
For Bimap is there anything like Data.Map.insertWithKey ?
No. I wanted to implement the insertWith family, but it wasn't clear to me what should happen if the value produced by the user's function already exists, bound to something else. One possibility might be to do nothing (or error) if that happens, making it the user's responsibility to not do such things. Whatever the case may be, it should be possible for client code to define such an operation, with no more than a constant-factor overhead. Perhaps I should wait and see if people find themselves doing this before guessing at an implementation. (Incidentally, almost everything in Map that's missing from Bimap is missing either because it doesn't make sense, or because of corner cases like this one.) Stuart

On Fri, 8 Feb 2008, news@lyra.net wrote:
Hallo!
Let's suppose I have a list [a,b,c,d,c,d]. I'd like to write a function that returns a new list without duplicates (in the example [a,b,c,d]). How can I do that? What is the most general way? I'd like to use the same function for a list of Int or String or some other user defined data type.
List.nub I assume you couldn't find this function due its cryptic name. Don't worry, many others failed before.

Of course the most *general* way requires an Eq constraint:
List.nub :: Eq a => [a] -> [a]
But there are better functions (already mentioned) with the less general Ord constraint. Int and String are instances of Ord. "some other user defined data type" probably is too, but if you mean "any other user defined data type", even Eq may not be satisfied, e.g.
data NotEvenEq a = FuncsNotEq (a->a)
The hammer you use depends on what you're hammering on. Dan Henning Thielemann wrote:
On Fri, 8 Feb 2008, news@lyra.net wrote:
Hallo!
Let's suppose I have a list [a,b,c,d,c,d]. I'd like to write a function that returns a new list without duplicates (in the example [a,b,c,d]). How can I do that? What is the most general way? I'd like to use the same function for a list of Int or String or some other user defined data type.
List.nub
participants (8)
-
ChrisK
-
Dan Weston
-
Felipe Lessa
-
Henning Thielemann
-
Jed Brown
-
news@lyra.net
-
Stuart Cook
-
Tillmann Rendel