
Hi,
Sorry for the late reply. I've been on vacation and had some catching
up to do the last few days.
On Mon, Jun 13, 2011 at 2:40 PM, Milan Straka
My point was, that there are many combinations you can think of -- IntMap minus IntSet, IntMap minus Map Int, IntMap minus list of Ints, IntMap union Map Int, IntMap intersection IntSet, IntMap intersection Set Int and so on.
With fromSet and fromDistinctAscList, you can handle all of these easily, although with some time penalty. For example the union case, union map (fromSet (const 1) set) seems fine.
The containers library doesn't separate algorithms from data structures, like e.g. STL does. As I wrote in another email it would be nice if we could separate traversal of elements in containers from the algorithms working on the elements (like difference). Unfortunately I don't know how to do this.
I am quite unhappy about exporting differenceKeysSet and other hybrid functions working on different data structures.
But I do not like depending on rules alone either. If we promise that difference map (fromSet set) works without creating intermediate map, then it - works only for GHC - is nontrivial to use -- which functions do really fuse with fromSet?
My biggest problem is "Does difference with IntSet deserves to have a special function implementing it?" I do not believe so. There are many functions that could fuse with fromSet, you could even implement better implementations of e.g. union with Map Int. But how do we measure which combinations are worth it?
We should measure the time difference, if it's small I don't think it's worth it.
For me, right now no functions working with different data structures are worth it. The API is complicated as it is and I do not feel adding cross-structure functions is a good thing to do. I am happy with being able to convert between the structures efficiently.
Johan, what is your opinion?
I'm off the same mind. On a more general note: The containers library has several problems: * Lack of abstraction You cannot write a function that works on different map implementations. You cannot use a Map in a function that expects a Set even though Map could have a O(1) conversion to a set view (most OO languages allow this) We cannot easily change the implementation. Data.Map requires the implementation to be a weight balanced tree due to the presences of functions that work on indexes. If we manage to write a faster map using AVL trees we cannot use it to improve the performance of Data.Map, even though a tiny minority of the users use the index-based functions. * Code duplication There's lots of code duplication in the implementation. I don't know how many times we write the same traversal just to do something slightly different at the lead position. There are algorithms that should work cross data structure (like difference) but doesn't. Unfortunately I don't know how to address either group of problems well. Perhaps with associated data types we can finally write a type class for Map and Set that works well. If we could get foldr/build fusion to fuse left folds we could use lists to implement e.g. difference efficiently by doing: difference s1 s2 = fromAscList . union (toAscList s1) (toAscList s2) which would fused to a "loop" that traverses both sets at the same time, building a new set as we go. Cheers, Johan