Proposal: add Int indexing functions to Data.Set

newer
Re: Proposal: Tidy up and export...

Luis Casillas

29 Apr 2011 29 Apr '11

4:08 a.m.

Hi everybody. First time proposer here, so bear with me. While I was looking at the Haddock documentation for Data.Map, I noticed that it has a section with "Indexed" operations that address the map with Int indexes: http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map.ht... The relevant functions' signatures are these: lookupIndex :: Ord k => k -> Map k a -> Maybe Int findIndex :: Ord k => k -> Map k a -> Int elemAt :: Int -> Map k a -> (k, a) updateAt :: (k -> a -> Maybe a) -> Int -> Map k a -> Map k a deleteAt :: Int -> Map k a -> Map k a Now, the Data.Set module, which is a near carbon-copy of Data.Map, does not have any counterparts to these functions. My proposal is to add such counterparts to Data.Set (I'm writing some code that could benefit from them). The proposed functions' signatures: lookupIndex :: Ord a => a -> Set a -> Maybe Int findIndex :: Ord a => a -> Set a -> Int elemAt :: Int -> Set a -> a deleteAt :: Int -> Set a -> Set a I attach a patch to add these to Data.Set. Note that the implementation of the Set functions is basically a copy-paste-and-edit of their counterparts in Data.Map. That seems to be the convention in those two modules; Data.Set is really just like Data.Map but with no "value" field in the datatype's constructor. Since I am a noob, I mistakenly created a Trac ticket before figuring out the correct library proposal process (#5162, which promptly got closed; my apologies). There have been a couple of responses there, which I will summarize here: 1. Concern over the efficiency of the code in my patch (Trac comment by tibbe). At my present skill level with Haskell, frankly, the most intelligent response I can give to this is that I'm copying the Data.Map implementation from darcs to the best of my ability, and that the most I hope is to make the Set versions no worse than the Map versions. 2. Concern that Int-based indexing breaks the set abstraction by exposing a specific ordering (Trac comment by Malcolm Wallace). My assumption when I decided to submit this patch was that if it's OK for Data.Map to have these operations, then it should also be ok for Data.Set to have them. But Malcolm's concern makes me wonder if some people will find the original Data.Map functions objectionable and dislike the idea of "spreading the infection" to Data.Set. Discussion period: 2 weeks.

Attachments:

Data-Set.hs.patch (application/octet-stream — 4.6 KB)

Show replies by date

Ivan Lazar Miljenovic

29 Apr 29 Apr

4:16 a.m.

On 29 April 2011 14:08, Luis Casillas wrote:

...

2. Concern that Int-based indexing breaks the set abstraction by exposing a specific ordering (Trac comment by Malcolm Wallace). My assumption when I decided to submit this patch was that if it's OK for Data.Map to have these operations, then it should also be ok for Data.Set to have them. But Malcolm's concern makes me wonder if some people will find the original Data.Map functions objectionable and dislike the idea of "spreading the infection" to Data.Set.

I think I agree with Malcolm here; the internals of Data.Set shouldn't be revealed IMHO. -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

Luis Casillas

5:33 a.m.

El abr 28, 2011, a las 9:16 p.m., Ivan Lazar Miljenovic escribió:

...

On 29 April 2011 14:08, Luis Casillas wrote:

...
2. Concern that Int-based indexing breaks the set abstraction by exposing a specific ordering (Trac comment by Malcolm Wallace). My assumption when I decided to submit this patch was that if it's OK for Data.Map to have these operations, then it should also be ok for Data.Set to have them. But Malcolm's concern makes me wonder if some people will find the original Data.Map functions objectionable and dislike the idea of "spreading the infection" to Data.Set.

I think I agree with Malcolm here; the internals of Data.Set shouldn't be revealed IMHO.

Would either of the following modifications make the proposal more palatable? 1. Stressing in the functions' documentation that the assignment of Ints does not have to reflect the order of the elements. The examples would also be rewritten not to make any ordering assumptions. 2. Modifying the implementation to scramble the order of the indexes relative to the order of the set elements. This would discourage people from implicitly coding to the element order. I'm not at all committed to having the order of the indexes reflect the order of the set members or any specific internal property of the set implementation. I'm happy with any one-to-one, reversible mapping between sets and Ints in the range [0..(size-1)]. I'm also happy if the mapping changes between library versions (or heck, even between different invocations of the same program).

Ivan Lazar Miljenovic

5:42 a.m.

On 29 April 2011 15:33, Luis Casillas wrote:

...

El abr 28, 2011, a las 9:16 p.m., Ivan Lazar Miljenovic escribió:

...
On 29 April 2011 14:08, Luis Casillas wrote:

...
2. Concern that Int-based indexing breaks the set abstraction by exposing a specific ordering (Trac comment by Malcolm Wallace). My assumption when I decided to submit this patch was that if it's OK for Data.Map to have these operations, then it should also be ok for Data.Set to have them. But Malcolm's concern makes me wonder if some people will find the original Data.Map functions objectionable and dislike the idea of "spreading the infection" to Data.Set.

I think I agree with Malcolm here; the internals of Data.Set shouldn't be revealed IMHO.

Would either of the following modifications make the proposal more palatable?

1. Stressing in the functions' documentation that the assignment of Ints does not have to reflect the order of the elements. The examples would also be rewritten not to make any ordering assumptions.

2. Modifying the implementation to scramble the order of the indexes relative to the order of the set elements. This would discourage people from implicitly coding to the element order.

I'm not at all committed to having the order of the indexes reflect the order of the set members or any specific internal property of the set implementation. I'm happy with any one-to-one, reversible mapping between sets and Ints in the range [0..(size-1)]. I'm also happy if the mapping changes between library versions (or heck, even between different invocations of the same program).

Well, before we talk about modifications, etc.: is there a _need_ for this kind of indexing in a Set? -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

Luis Casillas

6:08 a.m.

El abr 28, 2011, a las 10:42 p.m., Ivan Lazar Miljenovic escribió:

...

Well, before we talk about modifications, etc.: is there a _need_ for this kind of indexing in a Set?

Well, to take a step back, the reasoning that led me to make this proposal is really no better than this: 1. Map already has analogues to the Set functions I'm proposing. 2. There is some good reason why Map should have those functions. 3. Whatever reason that is, it also applies to Set. 4. Ergo, Set should have these functions. It looks to me like you disagree with (2) or (3), and you're the second person to do so (out of three who have responded so far). If the majority of responses turn out like this, well, then the case for my proposal is just not as strong as I assumed it would be.

Ivan Lazar Miljenovic

6:50 a.m.

On 29 April 2011 16:08, Luis Casillas wrote:

...

El abr 28, 2011, a las 10:42 p.m., Ivan Lazar Miljenovic escribió:

...
Well, before we talk about modifications, etc.: is there a _need_ for this kind of indexing in a Set?

Well, to take a step back, the reasoning that led me to make this proposal is really no better than this:

1. Map already has analogues to the Set functions I'm proposing. 2. There is some good reason why Map should have those functions. 3. Whatever reason that is, it also applies to Set. 4. Ergo, Set should have these functions.

It looks to me like you disagree with (2) or (3), and you're the second person to do so (out of three who have responded so far). If the majority of responses turn out like this, well, then the case for my proposal is just not as strong as I assumed it would be.

Actually, I primarily disagree with 2) ;-) Maybe we need an alternate proposal: remove index-based functions from Data.Map... -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

Balazs Komuves

9:34 a.m.

Isn't for example "elemAt k someMapOrSet" equivalent (but much faster) to "(toList someMapOrSet) !! k", and similarly for all the other index functions? So how does it break any abstraction? While I have to agree (referring to the old discussion) that the Map/Set API is far from perfect, I'm much more in favour of exposing efficient functionality than removing useful stuff... Regards, Balazs Concern that Int-based indexing breaks the set abstraction by exposing a

...

specific ordering (Trac comment by Malcolm Wallace). My assumption when I decided to submit this patch was that if it's OK for Data.Map to have these operations, then it should also be ok for Data.Set to have them.

On Fri, Apr 29, 2011 at 8:50 AM, Ivan Lazar Miljenovic < ivan.miljenovic@gmail.com> wrote:

...

On 29 April 2011 16:08, Luis Casillas wrote:

...
El abr 28, 2011, a las 10:42 p.m., Ivan Lazar Miljenovic escribió:

...
Well, before we talk about modifications, etc.: is there a _need_ for this kind of indexing in a Set?

Well, to take a step back, the reasoning that led me to make this

proposal is really no better than this:

...
1. Map already has analogues to the Set functions I'm proposing. 2. There is some good reason why Map should have those functions. 3. Whatever reason that is, it also applies to Set. 4. Ergo, Set should have these functions.

It looks to me like you disagree with (2) or (3), and you're the second

person to do so (out of three who have responded so far). If the majority of responses turn out like this, well, then the case for my proposal is just not as strong as I assumed it would be.

Actually, I primarily disagree with 2) ;-)

Maybe we need an alternate proposal: remove index-based functions from Data.Map...

-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Wolfram Kahl

2:26 p.m.

On Fri, Apr 29, 2011 at 11:34:33AM +0200, Balazs Komuves wrote:

...

Isn't for example "elemAt k someMapOrSet" equivalent (but much faster) to "(toList someMapOrSet) !! k",

+1 (If this specification (probably better with toAscList) is not documented, it should be.)

...

and similarly for all the other index functions? So how does it break any abstraction?

While I have to agree (referring to the old discussion) that the Map/Set API is far from perfect, I'm much more in favour of exposing efficient functionality than removing useful stuff...

+1 Wolfram

wren ng thornton

30 Apr 30 Apr

9:34 p.m.

On 4/29/11 2:08 AM, Luis Casillas wrote:

...

El abr 28, 2011, a las 10:42 p.m., Ivan Lazar Miljenovic escribió:

...
Well, before we talk about modifications, etc.: is there a _need_ for this kind of indexing in a Set?

Well, to take a step back, the reasoning that led me to make this proposal is really no better than this:

1. Map already has analogues to the Set functions I'm proposing. 2. There is some good reason why Map should have those functions. 3. Whatever reason that is, it also applies to Set. 4. Ergo, Set should have these functions.

It looks to me like you disagree with (2) or (3), and you're the second person to do so (out of three who have responded so far). If the majority of responses turn out like this, well, then the case for my proposal is just not as strong as I assumed it would be.

I'm on the side of disagreeing with (2)/(3). Perhaps there's a good reason for them to be there, but I've yet to see it. They look to me like the kind of clutter that tends to accrue in the interfaces you see in OO and imperative code; they may have been pragmatic once for somebody, but they don't make sense to the abstraction being provided by the API (and they don't appear to make enough sense to alter the abstraction to include them either). -- Live well, ~wren

Jan-Willem Maessen

29 Apr 29 Apr

3:05 p.m.

On Fri, Apr 29, 2011 at 1:42 AM, Ivan Lazar Miljenovic wrote:

...

On 29 April 2011 15:33, Luis Casillas wrote:

...
El abr 28, 2011, a las 9:16 p.m., Ivan Lazar Miljenovic escribió:

...
On 29 April 2011 14:08, Luis Casillas wrote:

...
2. Concern that Int-based indexing breaks the set abstraction by exposing a specific ordering (Trac comment by Malcolm Wallace). My assumption when I decided to submit this patch was that if it's OK for Data.Map to have these operations, then it should also be ok for Data.Set to have them. But Malcolm's concern makes me wonder if some people will find the original Data.Map functions objectionable and dislike the idea of "spreading the infection" to Data.Set.

I think I agree with Malcolm here; the internals of Data.Set shouldn't be revealed IMHO.

Would either of the following modifications make the proposal more palatable?

1. Stressing in the functions' documentation that the assignment of Ints does not have to reflect the order of the elements. The examples would also be rewritten not to make any ordering assumptions.

2. Modifying the implementation to scramble the order of the indexes relative to the order of the set elements. This would discourage people from implicitly coding to the element order. ...

Well, before we talk about modifications, etc.: is there a _need_ for this kind of indexing in a Set?

I believe there is not. Moreover, in order to realize efficient Int indexing, we need to commit a word of storage to hold subtree size on every interior tree node in Data.Map, Data.Set, Data.IntSet, and Data.IntMap. Without extra interior storage, these tree indexing and list indexing are comparably slow (O(n)). My understanding is these functions were included in Data.Map *because we could*: the implementation uses size-balanced trees, so the tree size had to be kept at each node for balancing purposes, and therefore it was easy to implement indexing. As a happy side effect, this also makes it easy to maintain O(1) size information. However, indexing precludes the use of implementations that *don't* cache subtree sizes on a large fraction of interior nodes (it is already a bad match for IntSet and IntMap). There are other techniques for making size O(1) without imposing per-node overhead, if that is desired. It would be a shame to rule out alternative implementations in order to support an API that nobody needs. I'd support removing these functions absent a compelling use case (ie one that would not be better expressed as indexing by key). Does anyone have one? -Jan-Willem Maessen

Balazs Komuves

3:30 p.m.

However, indexing precludes the use of implementations that *don't*

...

cache subtree sizes on a large fraction of interior node

The Data.Map API also precludes hash-map implementations, should we then remove 90% of the functionality because of that?

...

I'd support removing these functions absent a compelling use case (ie one that would not be better expressed as indexing by key). Does anyone have one?

Does splitting at the middle for parallel processing qualify as a compelling use case? (though for that purpose maybe it would be better to expose splitting the tree structure, since it is balanced anyway, and also unsafely joining two trees). My experience with the Map/Set API is that there are lots of use cases one doesn't think of in advance. Balazs On Fri, Apr 29, 2011 at 5:05 PM, Jan-Willem Maessen wrote:

...

...
On 29 April 2011 15:33, Luis Casillas wrote:

...
El abr 28, 2011, a las 9:16 p.m., Ivan Lazar Miljenovic escribió:

...
On 29 April 2011 14:08, Luis Casillas wrote:

...
2. Concern that Int-based indexing breaks the set abstraction by

exposing a specific ordering (Trac comment by Malcolm Wallace). My assumption when I decided to submit this patch was that if it's OK for Data.Map to have these operations, then it should also be ok for Data.Set to have them. But Malcolm's concern makes me wonder if some people will find

...
...
...
I think I agree with Malcolm here; the internals of Data.Set shouldn't be revealed IMHO.

Would either of the following modifications make the proposal more

On Fri, Apr 29, 2011 at 1:42 AM, Ivan Lazar Miljenovic wrote: the original Data.Map functions objectionable and dislike the idea of "spreading the infection" to Data.Set. palatable?

...
...
1. Stressing in the functions' documentation that the assignment of Ints

does not have to reflect the order of the elements. The examples would also be rewritten not to make any ordering assumptions.

...
2. Modifying the implementation to scramble the order of the indexes

relative to the order of the set elements. This would discourage people from implicitly coding to the element order.

...
...

Well, before we talk about modifications, etc.: is there a _need_ for this kind of indexing in a Set?

I believe there is not. Moreover, in order to realize efficient Int indexing, we need to commit a word of storage to hold subtree size on every interior tree node in Data.Map, Data.Set, Data.IntSet, and Data.IntMap. Without extra interior storage, these tree indexing and list indexing are comparably slow (O(n)).

My understanding is these functions were included in Data.Map *because we could*: the implementation uses size-balanced trees, so the tree size had to be kept at each node for balancing purposes, and therefore it was easy to implement indexing. As a happy side effect, this also makes it easy to maintain O(1) size information.

However, indexing precludes the use of implementations that *don't* cache subtree sizes on a large fraction of interior nodes (it is already a bad match for IntSet and IntMap). There are other techniques for making size O(1) without imposing per-node overhead, if that is desired. It would be a shame to rule out alternative implementations in order to support an API that nobody needs.

I'd support removing these functions absent a compelling use case (ie one that would not be better expressed as indexing by key). Does anyone have one?

-Jan-Willem Maessen

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Luis Casillas

4:30 p.m.

El abr 29, 2011, a las 8:05 a.m., Jan-Willem Maessen escribió:

...

I'd support removing these functions absent a compelling use case (ie one that would not be better expressed as indexing by key). Does anyone have one?

Well, I have a use case for these functions in the case of Set. I'm a bit wary of arguing for the inclusion on nothing but my personal use case, because as I mentioned in another message, I just assumed that desirability of such functions functions in Data.Map would be uncontroversial. I.e., when it comes to modifying the interface of a standard library, I think arguments like "make it uniform with Data.Map" are more persuasive than "I could use it for this." At this point, I'm slowly being convinced that the original Data.Map functions are extraneous. Still, I will mention my use case for the Data.Set functions, and let people make up their own minds. I'm working on a project where the most natural implementation is to use the members of runtime-constructed sets to index the elements of an array. The use of sets to represent the index sets is independently motivated, because I need to operate on these sets using operations like union, intersection, difference, mapping and filtering, while preserving the member uniqueness property. The lookupIndex, findIndex and elemAt functions would give me this set member/array index translation "for free." There are other ways to construct such a mapping, but I think they suffer in comparison: 1. I could construct a Map assigning an index to each element, but then the inverse mapping is linear time. 2. That could be solved by using a pair of maps (one from elem to Int, and an inverse map from Int to elem), but since I fundamentally just want to treat these maps as sets, I need to implement analogues of basic Set operations like union, intersection, difference, map, filter and so on, that would need to reconstruct the element <--> index mappings as needed. I realize that this is not difficult at all, but (a) it's presumably slower, (b) again, I mistakenly assumed that the Data.Map index functions were uncontroversial to start with. 3. An alternative would be to implement and use a data type that guaranteed uniqueness, ordering, Int indexing and the usual set operations. You might call such a type or class OrderedSet or IndexedSet; but to implement it, you might end up just copy-pasting Data.Set + the indexing functions as the implementation, because the implementation in Data.Set sure looks like a natural way of implementing an ordered, indexed set. I think this really distills the issue down to this. There is a family of three potential, related abstractions: plain set / \ / \ ordered set indexed set A plain set just provides uniqueness. An ordered set provides uniqueness and preserves some ordering predicate on the elements. An indexed set assigns a unique Int index in the range [0..size-1] to each element, and provides translation between members and indexes. The Set module provides an implementation of an efficient ordered set, that can easily be extended to also be an indexed set. But many people think that "plain set" should be the only contract that Data.Set is bound to implement. I can't say that I disagree with that--but I also just want indexed sets... A more ambitious proposal would be this: 1. Introduce classes to represent the three abstractions in question. 2. Have Data.Set be an instance of all three. 3. Don't expose the ordered an indexed operations directly from Data.Set; force the users to go through the appropriate class. This allows flexibility for Data.Set to later switch to an implementation that doesn't provide order or indexes; the original implementation would have to be kept as an alternative for people who want the extra features. This is a lot more ambitious of a proposal than I was hoping for my first time around, I will say.

Wolfram Kahl

4:56 p.m.

On Fri, Apr 29, 2011 at 09:30:24AM -0700, Luis Casillas wrote:

...

I'm working on a project where the most natural implementation is to use the members of runtime-constructed sets to index the elements of an array. The use of sets to represent the index sets is independently motivated, because I need to operate on these sets using operations like union, intersection, difference, mapping and filtering, while preserving the member uniqueness property.

The lookupIndex, findIndex and elemAt functions would give me this set member/array index translation "for free." There are other ways to construct such a mapping, but I think they suffer in comparison:

My use case, interfacing with a BDD-based library for fast implementations of relation-algebraic operations, has the same characteristics and motivations. Similar to what has been mentioned previously by others, I will choose the data structure for the API it exports together with the API's specification and performance characteristics. Even with the same specification for a common core API, there is space for different implementations that offer different API extensions and/or different performance characteristics. Module APIs are not a very useful abstraction element in Haskell; if we want the Data.Map (or Data.Set) API to be open for other implementations, we should make it a class instead. Since Data.Set wraps an implementation that is known to support these index operations more efficiently than is possible to implement by using that API, and these operations do have uses, they should be exported. For more discussion of design considerations of implementations versus class interfaces, see for example Edison: @Article{Okasaki-2001, author = {Chris Okasaki}, title = {An Overview of {Edison}}, journal = ENTCS, year = 2001, volume = 41, number = 1, pages = {60--73}, DOIURL = {http://dx.doi.org/10.1016/S1571-0661(05)80546-8}, DOI = {10.1016/S1571-0661(05)80546-8}, note = {(Original version in Haskell Workshop, September 2000, pp.\null{} 34--45)}, abstract = {Edison is a library of functional data structures implemented in Haskell. It supports three main families of abstractions: sequences, collections (e.g., sets and priority queues), and associative collections (e.g., finite maps). This paper summarizes the design of Edison, with particular attention to how that design is influenced by details of Haskell.} } <ShamelessPlug> Somewhat related: http://www.cas.mcmaster.ca/~kahl/Haskell/ModuleTools/ @InProceedings{Kahl-2009_TFP, author = {Wolfram Kahl}, title = {Haskell Module Tools for Liberating Type Class Design}, crossref = {TFP2009}, pages = {129--144}, chapter = {9}, WKsource = {svn/WK/doc/wk/HASKELL/Restricted.lhs}, abstract = {Design of Haskell type class hierarchies for complex purposes, including for standard containers, is a non-trivial exercise, and evolution of such designs is additionally hampered by the large overhead of connecting to existing implementations. We systematically discuss this overhead, and propose a tool solution, implemented using the GHC API, to automate its generation.} } @Book{TFP2009, title = {Trends in Functional Programming, {TFP 2009}}, booktitle = {Trends in Functional Programming, {TFP 2009}}, year = 2010, editor = {Zolt\'an Horv{\'a}th and Vikt\'oia Zs{\'o}k and Peter Achten and Pieter Koopman}, address = {UK}, publisher = {Intellect}, URL = {http://www.intellectbooks.co.uk/books/view-Book,id=4740/} } </ShamelessPlug> Wolfram

wren ng thornton

30 Apr 30 Apr

9:45 p.m.

On 4/29/11 12:30 PM, Luis Casillas wrote:

...

I think this really distills the issue down to this. There is a family of three potential, related abstractions:

plain set / \ / \ ordered set indexed set

A plain set just provides uniqueness. An ordered set provides uniqueness and preserves some ordering predicate on the elements. An indexed set assigns a unique Int index in the range [0..size-1] to each element, and provides translation between members and indexes.

The Set module provides an implementation of an efficient ordered set, that can easily be extended to also be an indexed set. But many people think that "plain set" should be the only contract that Data.Set is bound to implement. I can't say that I disagree with that--but I also just want indexed sets...

So why not just implement your "set" with a map from elements to their indices? Maps maintain uniqueness of keys, and Data.Map maintains ordering too. It wouldn't be as clean as the proposed functions, but from your description of the problem, it's not clear to me that using sets of proxies for array indices is really the cleanest approach in the first place. -- Live well, ~wren

Luis Casillas

1 May 1 May

12:46 a.m.

El abr 30, 2011, a las 2:45 p.m., wren ng thornton escribió:

...

So why not just implement your "set" with a map from elements to their indices? Maps maintain uniqueness of keys, and Data.Map maintains ordering too. It wouldn't be as clean as the proposed functions, but from your description of the problem, it's not clear to me that using sets of proxies for array indices is really the cleanest approach in the first place.

Well, maps from elements to indexes was exactly the solution I had in mind at first, but then I looked at the docs for Data.Map and noticed that it had these Int index functions that did the trick for free, and that they were "missing" from Data.Set. So I thought, hey, maybe can I fix that... :-)

wren ng thornton

30 Apr 30 Apr

9:30 p.m.

On 4/29/11 1:33 AM, Luis Casillas wrote:

...

I'm not at all committed to having the order of the indexes reflect the order of the set members or any specific internal property of the set implementation. I'm happy with any one-to-one, reversible mapping between sets and Ints in the range [0..(size-1)]. I'm also happy if the mapping changes between library versions (or heck, even between different invocations of the same program).

It sounds like what you want is an intern table. That's a pretty common need, though not something I think ought to be pushed onto Data.{Set,Map}. In part because the needs for intern tables can vary by application, so there's no single implementation for everyone. A simple intern table can be done by: module Intern (Intern(), empty, intern, extern) where import qualified Data.Map as M import qualified Data.IntMap as IM data Intern a = Intern { int2elem :: IM.IntMap a , elem2int :: M.Map a Int , nextInt :: !Int } empty = Intern IM.empty M.empty minBound intern a self@(Intern i2a a2i n) = case M.lookup a a2i of Just i -> Just (i, self) Nothing | n == maxBound -> Nothing | otherwise -> Just (n, Intern (IM.insert n a i2a) (M.insert a n a2i) (n+1)) extern i (Intern i2a a2i n) = IM.lookup i i2a Though it should be easy to see the room for variation. E.g., * sometimes you want to store the highest used key, other times the lowest unused key is more useful. * sometimes you want to allow for invalidating old keys. * sometimes you want to allow for recycling old keys ala garbage collection. * generally the key type should be abstract instead of an Int; but how abstract is "abstract"? * sometimes it's helpful to know whether the key returned by 'intern' is freshly allocated or not. * sometimes you'd be better off using Data.HashMap instead of Data.Map; sometimes you'd be better off with a hash-consing trie. ... -- Live well, ~wren

Antoine Latter

29 Apr 29 Apr

2:44 p.m.

On Thu, Apr 28, 2011 at 11:08 PM, Luis Casillas wrote:

...

Hi everybody. First time proposer here, so bear with me.

While I was looking at the Haddock documentation for Data.Map, I noticed that it has a section with "Indexed" operations that address the map with Int indexes:

http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map.ht...

The relevant functions' signatures are these:

lookupIndex :: Ord k => k -> Map k a -> Maybe Int findIndex :: Ord k => k -> Map k a -> Int elemAt :: Int -> Map k a -> (k, a) updateAt :: (k -> a -> Maybe a) -> Int -> Map k a -> Map k a deleteAt :: Int -> Map k a -> Map k a

Now, the Data.Set module, which is a near carbon-copy of Data.Map, does not have any counterparts to these functions. My proposal is to add such counterparts to Data.Set (I'm writing some code that could benefit from them). The proposed functions' signatures:

lookupIndex :: Ord a => a -> Set a -> Maybe Int findIndex :: Ord a => a -> Set a -> Int elemAt :: Int -> Set a -> a deleteAt :: Int -> Set a -> Set a

I attach a patch to add these to Data.Set. Note that the implementation of the Set functions is basically a copy-paste-and-edit of their counterparts in Data.Map. That seems to be the convention in those two modules; Data.Set is really just like Data.Map but with no "value" field in the datatype's constructor.

Since I am a noob, I mistakenly created a Trac ticket before figuring out the correct library proposal process (#5162, which promptly got closed; my apologies). There have been a couple of responses there, which I will summarize here:

1. Concern over the efficiency of the code in my patch (Trac comment by tibbe). At my present skill level with Haskell, frankly, the most intelligent response I can give to this is that I'm copying the Data.Map implementation from darcs to the best of my ability, and that the most I hope is to make the Set versions no worse than the Map versions.

2. Concern that Int-based indexing breaks the set abstraction by exposing a specific ordering (Trac comment by Malcolm Wallace). My assumption when I decided to submit this patch was that if it's OK for Data.Map to have these operations, then it should also be ok for Data.Set to have them. But Malcolm's concern makes me wonder if some people will find the original Data.Map functions objectionable and dislike the idea of "spreading the infection" to Data.Set.

Discussion period: 2 weeks.

These functions are also missing from IntSet and IntMap. Assuming that there is a use case for such a family of indexing functions, it seems like they should be added to these as well. However if we don't have folks that want to use these functions for sets or IntMaps, I don't think we should add them. It's hard for me to imagine the Data.Map indexing functions making any code clearer - are there cases where they can improve the performance of complex updates? Antoine

Edward Kmett

2 May 2 May

9:14 p.m.

On Fri, Apr 29, 2011 at 10:44 AM, Antoine Latter wrote:

...

...
Hi everybody. First time proposer here, so bear with me.

While I was looking at the Haddock documentation for Data.Map, I noticed

On Thu, Apr 28, 2011 at 11:08 PM, Luis Casillas wrote: that it has a section with "Indexed" operations that address the map with Int indexes:

...
http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map.ht...

...
The relevant functions' signatures are these:

lookupIndex :: Ord k => k -> Map k a -> Maybe Int findIndex :: Ord k => k -> Map k a -> Int elemAt :: Int -> Map k a -> (k, a) updateAt :: (k -> a -> Maybe a) -> Int -> Map k a -> Map k a deleteAt :: Int -> Map k a -> Map k a

Now, the Data.Set module, which is a near carbon-copy of Data.Map, does

not have any counterparts to these functions. My proposal is to add such counterparts to Data.Set (I'm writing some code that could benefit from them). The proposed functions' signatures:

...
lookupIndex :: Ord a => a -> Set a -> Maybe Int findIndex :: Ord a => a -> Set a -> Int elemAt :: Int -> Set a -> a deleteAt :: Int -> Set a -> Set a

I attach a patch to add these to Data.Set. Note that the implementation

of the Set functions is basically a copy-paste-and-edit of their counterparts in Data.Map. That seems to be the convention in those two modules; Data.Set is really just like Data.Map but with no "value" field in the datatype's constructor.

...
Since I am a noob, I mistakenly created a Trac ticket before figuring out

the correct library proposal process (#5162, which promptly got closed; my apologies). There have been a couple of responses there, which I will summarize here:

...
1. Concern over the efficiency of the code in my patch (Trac comment by

tibbe). At my present skill level with Haskell, frankly, the most intelligent response I can give to this is that I'm copying the Data.Map implementation from darcs to the best of my ability, and that the most I hope is to make the Set versions no worse than the Map versions.

...
2. Concern that Int-based indexing breaks the set abstraction by exposing

a specific ordering (Trac comment by Malcolm Wallace). My assumption when I decided to submit this patch was that if it's OK for Data.Map to have these operations, then it should also be ok for Data.Set to have them. But Malcolm's concern makes me wonder if some people will find the original Data.Map functions objectionable and dislike the idea of "spreading the infection" to Data.Set.

...
Discussion period: 2 weeks.

These functions are also missing from IntSet and IntMap.

Assuming that there is a use case for such a family of indexing functions, it seems like they should be added to these as well.

However if we don't have folks that want to use these functions for sets or IntMaps, I don't think we should add them.

IntMap and IntSet do not provide efficient size operations -- they have very different asymptotics than Map and Set in this regard, O(n) vs O(1). This makes the requested operations quite uselessly expensive there. While I don't like the existence of the Int indexed Map operations in the first place, I'm hardly swayed by an argument that boils down to claiming that because nobody seems to want to port them to a largely unrelated but unsuitable data structure, we shouldn't implement them on another data structure where they _can_ be efficiently implemented. -Edward

Gwern Branwen

29 Apr 29 Apr

6:53 p.m.

On Fri, Apr 29, 2011 at 12:08 AM, Luis Casillas wrote:

...

updateAt :: (k -> a -> Maybe a) -> Int -> Map k a -> Map k a

I took a look at all my local repos (c.h.o, Patch-tag, misc repos, and GitHub since I recently re-ran http://www.gwern.net/haskell/Archiving%20GitHub.html ). A grep* for updateAt (randomly picked one whose name looked like it'd be fairly unique) turned up what seems to be very few actual uses of updateAt, and mostly similarly named functions and various implementations of updateAt. The results are attached if anyone would like to see what actual uses there are in the wild. If anyone wants to see the output for the other functions, I can re-run the grep. * find ~/bin/ -name "*.hs" -exec grep -l updateAt {} \; -exec grep updateAt {} \; > ~/updateat.txt -- gwern http://www.gwern.net

Luis Casillas

9:26 p.m.

On Apr 29, 2011, at 11:53 AM, Gwern Branwen wrote:

...

On Fri, Apr 29, 2011 at 12:08 AM, Luis Casillas wrote:

...
updateAt :: (k -> a -> Maybe a) -> Int -> Map k a -> Map k a

I took a look at all my local repos (c.h.o, Patch-tag, misc repos, and GitHub since I recently re-ran http://www.gwern.net/haskell/Archiving%20GitHub.html ).

A grep* for updateAt (randomly picked one whose name looked like it'd be fairly unique) turned up what seems to be very few actual uses of updateAt, and mostly similarly named functions and various implementations of updateAt.

Thanks Gwern. For the record, though, my patch doesn't add a Set.updateAt operation, because it just didn't make sense. Map.updateAt is about updating the *value* that a key is mapped to, but the elements of a set correspond to the keys of a map.

Bryan O'Sullivan

30 Apr 30 Apr

5:52 p.m.

On Thu, Apr 28, 2011 at 9:08 PM, Luis Casillas wrote:

...

2. Concern that Int-based indexing breaks the set abstraction by exposing a specific ordering (Trac comment by Malcolm Wallace). My assumption when I decided to submit this patch was that if it's OK for Data.Map to have these operations, then it should also be ok for Data.Set to have them.

I agree with Malcolm on this point. The int-indexed functions on Map shouldn't be there in the first place, so I'd be unhappy to have them spread. Those modules already have disagreeably bloated APIs.

5175

Age (days ago)

5178

Last active (days ago)

List overview

Download

20 comments

10 participants

participants (10)

Antoine Latter
Balazs Komuves
Bryan O'Sullivan
Edward Kmett
Gwern Branwen
Ivan Lazar Miljenovic
Jan-Willem Maessen
Luis Casillas
Wolfram Kahl
wren ng thornton