Expand the Data.Set API

There are a few rather conspicuously missing features: 1. A way to take the intersection of a list of sets. This shouldn't really be a big deal, and it already exists for unions, but the intersection version should probably sort the sets by size before folding, or otherwise try to do something smart. 2. A way to insert an element if an == one is not already present (insert replaces an existing one with a new one). Currently, as far as I can tell, this can only be done using either if e `member` s then s else insert e s which potentially descends the tree twice for no reason or s `union` singleton e which is documented as being O(|s|+1), although I wouldn't be shocked if the documentation were too pessimistic in this case. 3. A way to delete an element and simultaneously find out whether it was in the set. David Feuer

On Thu, Mar 5, 2015 at 5:59 PM, David Feuer
There are a few rather conspicuously missing features:
1. A way to take the intersection of a list of sets. This shouldn't really be a big deal, and it already exists for unions, but the intersection version should probably sort the sets by size before folding, or otherwise try to do something smart.
Incidentally, having "unions" but not "intersections" is also true of IntSet, Map, and IntMap. Adding it to all would probably be good for consistency if nothing else, unless there's some reason not to.
3. A way to delete an element and simultaneously find out whether it was in the set.
Likewise here, I believe, though I may be forgetting something. I'm pretty sure your second point is only missing-but-useful for Data.Set, though. - C.

Hi, Am Donnerstag, den 05.03.2015, 17:59 -0500 schrieb David Feuer:
1. A way to take the intersection of a list of sets. This shouldn't really be a big deal, and it already exists for unions, but the intersection version should probably sort the sets by size before folding, or otherwise try to do something smart.
+1
2. A way to insert an element if an == one is not already present (insert replaces an existing one with a new one). Currently, as far as I can tell, this can only be done using either
if e `member` s then s else insert e s which potentially descends the tree twice for no reason
or
s `union` singleton e
which is documented as being O(|s|+1), although I wouldn't be shocked if the documentation were too pessimistic in this case.
I’m doubtful about that one. This is only needed if == is not „the right equality“, in which case a Set might be the wrong data type. Do we want to advocate that style? Also, it will make everyone wonder „do I need insert or theOtherInsert“, making the API a bit less concise and quick to grasp. Since "s `union` singleton e" does the right thing with almost the right performance, maybe that’s sufficient?
3. A way to delete an element and simultaneously find out whether it was in the set.
+1, possibly with the type a -> Set a -> Maybe (a, Set a) to get the remaining set as well. Greetings, Joachim -- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org

Regarding (2), I was thinking about situations where sharing is an issue.
Even if == is the right equality, you may want to settle on a single copy.
That would also suggest a version of member that returns a Maybe instead of
a Bool.
On Mar 6, 2015 3:42 AM, "Joachim Breitner"
Hi,
Am Donnerstag, den 05.03.2015, 17:59 -0500 schrieb David Feuer:
1. A way to take the intersection of a list of sets. This shouldn't really be a big deal, and it already exists for unions, but the intersection version should probably sort the sets by size before folding, or otherwise try to do something smart.
+1
2. A way to insert an element if an == one is not already present (insert replaces an existing one with a new one). Currently, as far as I can tell, this can only be done using either
if e `member` s then s else insert e s which potentially descends the tree twice for no reason
or
s `union` singleton e
which is documented as being O(|s|+1), although I wouldn't be shocked if the documentation were too pessimistic in this case.
I’m doubtful about that one.
This is only needed if == is not „the right equality“, in which case a Set might be the wrong data type. Do we want to advocate that style?
Also, it will make everyone wonder „do I need insert or theOtherInsert“, making the API a bit less concise and quick to grasp.
Since "s `union` singleton e" does the right thing with almost the right performance, maybe that’s sufficient?
3. A way to delete an element and simultaneously find out whether it was in the set.
+1, possibly with the type a -> Set a -> Maybe (a, Set a) to get the remaining set as well.
Greetings, Joachim
-- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

On Fri, 6 Mar 2015, Joachim Breitner wrote:
2. A way to insert an element if an == one is not already present (insert replaces an existing one with a new one). Currently, as far as I can tell, this can only be done using either
if e `member` s then s else insert e s which potentially descends the tree twice for no reason
or
s `union` singleton e
which is documented as being O(|s|+1), although I wouldn't be shocked if the documentation were too pessimistic in this case.
I’m doubtful about that one.
This is only needed if == is not „the right equality“, in which case a Set might be the wrong data type. Do we want to advocate that style?
We should use Map for element types that contain data which is irrelevant to the order.
Since "s `union` singleton e" does the right thing with almost the right performance, maybe that’s sufficient?
I guess he prefered something like runtime log |s| + 1.
3. A way to delete an element and simultaneously find out whether it was in the set.
+1, possibly with the type a -> Set a -> Maybe (a, Set a) to get the remaining set as well.
This would be consistent to minView and maxView. Until there is such a delete function, splitMember might be used in a work-around.

On 2015-03-05 at 23:59:23 +0100, David Feuer wrote:
There are a few rather conspicuously missing features:
[...]
2. A way to insert an element if an == one is not already present (insert replaces an existing one with a new one). Currently, as far as I can tell, this can only be done using either
if e `member` s then s else insert e s which potentially descends the tree twice for no reason
or
s `union` singleton e
which is documented as being O(|s|+1), although I wouldn't be shocked if the documentation were too pessimistic in this case.
3. A way to delete an element and simultaneously find out whether it was in the set.
Maybe (2) and (3) can be combined, c.f. https://github.com/gregorycollins/hashtables/issues/6#issuecomment-37523794 there's related tickets in unordered-containers too: https://github.com/tibbe/unordered-containers/issues/93 (even though those are for associative arrays, it'd be good if we could come up with a coherent story for these operations rather than ending up with subtly differently named/typed operations -- assuming it makes sense to find a uniform scheme) Cheers, hvr

On Thu, Mar 05, 2015 at 05:59:23PM -0500, David Feuer wrote:
1. A way to take the intersection of a list of sets. This shouldn't really be a big deal, and it already exists for unions, but the intersection version should probably sort the sets by size before folding, or otherwise try to do something smart.
I would guess the major reason intersection is more awkward than union is that the intersection of the empty list is not defined (there's no identity for the intersection operation), whereas for unions that problem doesn't exist. I don't think that's a dealbreaker -- foldr1 exists, after all -- but it does open a discussion about whether we want to error on the empty list or force the user to provide a first element or something.

What would be use-cases for (2)? As Joachim pointed out, for any reasonable
data type inserting an equal element should have no difference.
For (3) I'd be in favor of
alterF :: (Functor f, Ord a) => a -> (Bool -> f Bool) -> Set a -> f
(Set a)
(with any reasonable name) which'd allow to examine a set and possibly
modify it in one traversal. It should be smart enough not to modify the set
at all if the output of a given function is the same as its input. And it'd
also fit with lens. In particular, query+delete could be then expressed as
memberDelete :: (Ord a) => a -> Set a -> (Bool, Set a)
memberDelete k = alterF k (flip (,) False)
Petr
čt 5. 3. 2015 v 23:59 odesílatel David Feuer
There are a few rather conspicuously missing features:
1. A way to take the intersection of a list of sets. This shouldn't really be a big deal, and it already exists for unions, but the intersection version should probably sort the sets by size before folding, or otherwise try to do something smart.
2. A way to insert an element if an == one is not already present (insert replaces an existing one with a new one). Currently, as far as I can tell, this can only be done using either
if e `member` s then s else insert e s which potentially descends the tree twice for no reason
or
s `union` singleton e
which is documented as being O(|s|+1), although I wouldn't be shocked if the documentation were too pessimistic in this case.
3. A way to delete an element and simultaneously find out whether it was in the set.
David Feuer _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

On Mar 17, 2015 4:57 PM, "Petr Pudlák"
What would be use-cases for (2)? As Joachim pointed out, for any
reasonable data type inserting an equal element should have no difference. I don't feel strongly about it; it was just a thought.
For (3) I'd be in favor of
alterF :: (Functor f, Ord a) => a -> (Bool -> f Bool) -> Set a -> f (Set a)
That seems quite reasonable.
participants (8)
-
Ben Millwood
-
Casey McCann
-
David Feuer
-
Henning Thielemann
-
Herbert Valerio Riedel
-
Joachim Breitner
-
John Wiegley
-
Petr Pudlák