Bringing the IntMap API up to par with the Map API

There are a few functions on Maps that could be implemented on IntMaps but aren't: - deleteAt - elemAt - updateAt - findIndex - lookupIndex The above 5 functions can be efficiently implemented on Maps in O(log n) time, as the size of the subtrees are known, but not not on IntMaps. We could still provide O(n) versions for IntMap. What's the use case for indexed lookup in maps? I've never seen it in other languages. - foldlWithKey - foldrWithKey - insertWith' - insertWithKey' - mapKeys - mapKeysMonotonic - mapKeysWith - toDescList I believe these should these should be straightforward to implement. I suggest we add these functions to IntMap so one doesn't it's a "no brainer" to switch from Maps to IntMaps for int keys. Are there any other functions that should be added to both APIs e.g. strict updateWith functions? Cheers, Johan

On 06/08/10 15:59, Johan Tibell wrote:
There are a few functions on Maps that could be implemented on IntMaps but aren't: [...]
Isn't it time to seriously think about defining maps as a type family to avoid this sort of incompatibility once and for all and to enable GHC to automatically use IntMap wherever appropriate? class Map k where data Map k :: * -> * empty ... insert ... instance Map Int where data Map Int v = Data.IntMap.IntMap v ... instance (Ord k) => Map k where data Map k v = Data.Map.Map k v ... Or something similar? Just my 2 cents... //Stephan

On Fri, Aug 6, 2010 at 4:46 PM, Stephan Friedrichs wrote: On 06/08/10 15:59, Johan Tibell wrote: There are a few functions on Maps that could be implemented on IntMaps
but aren't: [...] Isn't it time to seriously think about defining maps as a type family to
avoid this sort of incompatibility once and for all and to enable GHC to
automatically use IntMap wherever appropriate? class Map k where
data Map k :: * -> *
empty ...
insert ... instance Map Int where
data Map Int v = Data.IntMap.IntMap v
... instance (Ord k) => Map k where
data Map k v = Data.Map.Map k v
... Or something similar? Definitely worth researching. I think we should pursue this as a separate
track and fix what we have in the mean time.
Johan

Johan Tibell
On Fri, Aug 6, 2010 at 4:46 PM, Stephan Friedrichs
wrote:
On 06/08/10 15:59, Johan Tibell wrote:
There are a few functions on Maps that could be implemented on IntMaps but aren't: [...]
Isn't it time to seriously think about defining maps as a type family to avoid this sort of incompatibility once and for all and to enable GHC to automatically use IntMap wherever appropriate?
class Map k where data Map k :: * -> * empty ... insert ...
instance Map Int where data Map Int v = Data.IntMap.IntMap v ...
instance (Ord k) => Map k where data Map k v = Data.Map.Map k v ...
Or something similar?
Definitely worth researching. I think we should pursue this as a separate track and fix what we have in the mean time.
I was going to be doing something like this as part of container-classes (but in such a way that even [(a,b)] would be valid). -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

On 06/08/10 19:08, Johan Tibell wrote:
[...]
Definitely worth researching. I think we should pursue this as a separate track and fix what we have in the mean time.
I had a little time for playing with the idea and came up with this: =============================================================== import qualified Data.Map as DMap import qualified Data.IntMap as IMap import Prelude hiding ( lookup ) -- class with very general functions class MapC m k v where data MapImpl m k v :: * null :: MapImpl m k v -> Bool null = (== 0) . size -- default implementation for finite maps size :: MapImpl m k v -> Int empty :: MapImpl m k v insertLookupWithKey :: (k -> v -> v -> v) -> k -> v -> MapImpl m k v -> (Maybe v, MapImpl m k v) alter :: (Maybe v -> Maybe v) -> k -> MapImpl m k v -> MapImpl m k v toList :: MapImpl m k v -> [(k, v)] -- instance declarations covering Data.Map and Data.IntMap instance MapC (IMap.IntMap v) Int v where newtype MapImpl (IMap.IntMap v) Int v = IMapImpl (IMap.IntMap v) size (IMapImpl mp) = IMap.size mp empty = IMapImpl IMap.empty insertLookupWithKey f k v (IMapImpl mp) = let (found, mp') = IMap.insertLookupWithKey f k v mp in (found, IMapImpl mp') alter f k (IMapImpl mp) = IMapImpl (IMap.alter f k mp) toList (IMapImpl mp) = IMap.toList mp instance (Ord k) => MapC (DMap.Map k v) k v where newtype MapImpl (DMap.Map k v) k v = DMapImpl (DMap.Map k v) size (DMapImpl mp) = DMap.size mp empty = DMapImpl DMap.empty insertLookupWithKey f k v (DMapImpl mp) = let (found, mp') = DMap.insertLookupWithKey f k v mp in (found, DMapImpl mp') alter f k (DMapImpl mp) = DMapImpl (DMap.alter f k mp) toList (DMapImpl mp) = DMap.toList mp instance (MapC m k v, Show k, Show v) => Show (MapImpl m k v) where show = show . toList -- functions implemented on top of the type family singleton :: (MapC m k v) => k -> v -> MapImpl m k v singleton k v = insert k v empty insert :: (MapC m k v) => k -> v -> MapImpl m k v -> MapImpl m k v insert = insertWith const insertWith :: (MapC m k v) => (v -> v -> v) -> k -> v -> MapImpl m k v -> MapImpl m k v insertWith f = insertWithKey (const f) insertWithKey :: (MapC m k v) => (k -> v -> v -> v) -> k -> v -> MapImpl m k v -> MapImpl m k v insertWithKey f k v mp = snd $ insertLookupWithKey f k v mp lookup :: (MapC m k v) => k -> MapImpl m k v -> Maybe v lookup k = fst . insertLookupWithKey undefined k undefined findWithDefault :: (MapC m k v) => v -> k -> MapImpl m k v -> v findWithDefault v k = maybe v id . lookup k delete :: (MapC m k v) => k -> MapImpl m k v -> MapImpl m k v delete = update (const Nothing) adjust :: (MapC m k v) => (v -> v) -> k -> MapImpl m k v -> MapImpl m k v adjust f = alter (fmap f) update :: (MapC m k v) => (v -> Maybe v) -> k -> MapImpl m k v -> MapImpl m k v update f k mp = alter (maybe Nothing f) k mp =============================================================== Sorry for the long mail, but it isn't worth opening a repository yet. Comments? :) //Stephan

Hi Stephan, On Fri, Aug 20, 2010 at 2:57 PM, Stephan Friedrichs < deduktionstheorem@web.de> wrote:
Sorry for the long mail, but it isn't worth opening a repository yet. Comments? :)
While we can unify Data.Map and Data.IntMap using associated data types (ATs) but I'm not sure if that unification alone is worth a new library in my opinion. My biggest problem with ATs is that they don't scale well enough to support e.g. unboxing of keys and values for every data type. See http://www.mail-archive.com/glasgow-haskell-users@haskell.org/msg18545.html In my opinion we need to address the issues I raised in that email before we can create a practical containers library using ATs. Cheers, Johan

Hi,
There are a few functions on Maps that could be implemented on IntMaps but aren't:
- deleteAt - elemAt - updateAt - findIndex - lookupIndex
The above 5 functions can be efficiently implemented on Maps in O(log n) time, as the size of the subtrees are known, but not not on IntMaps. We could still provide O(n) versions for IntMap. What's the use case for indexed lookup in maps? I've never seen it in other languages.
Or there is the other alternative -- to drop these functions from Data.Map. These functions are usually not available in other languages and the implementation is essentially forced to store the size of the subtrees in the tree. That is convenient for Adams' balanced trees, but not for AVL trees, red-black trees, tries (IntMap) etc. Milan

On Fri, Aug 6, 2010 at 5:00 PM, Milan Straka
Hi,
There are a few functions on Maps that could be implemented on IntMaps but aren't:
- deleteAt - elemAt - updateAt - findIndex - lookupIndex
The above 5 functions can be efficiently implemented on Maps in O(log n) time, as the size of the subtrees are known, but not not on IntMaps. We could still provide O(n) versions for IntMap. What's the use case for indexed lookup in maps? I've never seen it in other languages.
Or there is the other alternative -- to drop these functions from Data.Map. These functions are usually not available in other languages and the implementation is essentially forced to store the size of the subtrees in the tree. That is convenient for Adams' balanced trees, but not for AVL trees, red-black trees, tries (IntMap) etc.
If we could show that no one uses them (i.e. by surveying Hackage) we could remove them. Johan

Johan Tibell
On Fri, Aug 6, 2010 at 5:00 PM, Milan Straka
wrote: There are a few functions on Maps that could be implemented on IntMaps but aren't:
- deleteAt - elemAt - updateAt - findIndex - lookupIndex
The above 5 functions can be efficiently implemented on Maps in O(log n) time, as the size of the subtrees are known, but not not on IntMaps. We could still provide O(n) versions for IntMap. What's the use case for indexed lookup in maps? I've never seen it in other languages.
Or there is the other alternative -- to drop these functions from Data.Map. These functions are usually not available in other languages and the implementation is essentially forced to store the size of the subtrees in the tree. That is convenient for Adams' balanced trees, but not for AVL trees, red-black trees, tries (IntMap) etc.
If we could show that no one uses them (i.e. by surveying Hackage) we could remove them.
I would consider it a feature of the current data structures that they allow efficient implementations of the sub-interface constituted by these functions. Together with the efficiency characteristics of their other operations, it makes these trees a useful alternative to lists and/or arrays in certain circumstances. For different data structures, it may make no difference whether they are implemented inside or outside their defining modules, but since Data.Map.Map etc. are exported as abstract datatypes, these functions need to be defined inside and exported. (And I have used them.) Wolfram

Or there is the other alternative -- to drop these functions from Data.Map. These functions are usually not available in other languages and the implementation is essentially forced to store the size of the subtrees in the tree. That is convenient for Adams' balanced trees, but not for AVL trees, red-black trees, tries (IntMap) etc.
I don't Data.Map is supposed to be an interface, it's an implementation. I don't think there's anything wrong with a data structure providing operations you can do efficiently in that data structure, even if other data structures can't. In fact, that's often the point of having different data structures. As to making IntMap a drop-in replacement, should it be? If you want to switch to IntMap for performance but need to do some things are going to be inefficient with it, then you should probably either not switch, or come up with a different way of doing things, right?

On Fri, Aug 6, 2010 at 7:44 PM, Evan Laforge
As to making IntMap a drop-in replacement, should it be? If you want to switch to IntMap for performance but need to do some things are going to be inefficient with it, then you should probably either not switch, or come up with a different way of doing things, right?
Right. That applies to some of the functions I listed. The others can be implemented efficiently for IntMaps. Johan

In my opinion, this is the sort of thing that shouldn't take a libraries post. Just go do it! :-) Cheers, Edward

On Fri, Aug 6, 2010 at 3:59 PM, Johan Tibell
There are a few functions on Maps that could be implemented on IntMaps but aren't:
<snip>
We're also missing a bunch of strict versions of traversals functions, like folds. I needed a strict fold the other day and found that there isn't one. Should we just go ahead and add a strict version for all the functions where that makes sense? -- Johan

On Wed, Aug 11, 2010 at 6:18 PM, Johan Tibell
We're also missing a bunch of strict versions of traversals functions, like folds. I needed a strict fold the other day and found that there isn't one. Should we just go ahead and add a strict version for all the functions where that makes sense?
I've created a proposal to add a strict left fold, see separate email. In the process of doing so I've discovered that the whole foo/fooWithKey duplication is unnecessary, at least from a performance perspective. Assuming that you define your folds like so: -- | /O(n)/. Strict version of 'foldlWithKey'. foldlWithKey' :: (b -> k -> a -> b) -> b -> Map k a -> b foldlWithKey' f z0 m = go z0 m where go z _ | z `seq` False = undefined go z Tip = z go z (Bin _ kx x l r) = go (f (go z l) kx x) r {-# INLINE foldlWithKey' #-} the expression M.foldlWithKey' (\n _ v -> n + v) 0 m will be optimized to code that doesn't touch the key field in the `Bin` constructor, just as if you wrote a foldl' function that doesn't pass the key to the combination function. It's probably not feasible to remove the duplication from e.g. the Data.Map API, but it's worth keeping in mind when designing data structure APIs in the future. Cheers, Johan

Johan Tibell wrote:
I've created a proposal to add a strict left fold, see separate email. In the process of doing so I've discovered that the whole foo/fooWithKey duplication is unnecessary, at least from a performance perspective. [...] It's probably not feasible to remove the duplication from e.g. the Data.Map API, but it's worth keeping in mind when designing data structure APIs in the future.
For API designers, note that this is not true for all "map like" structures. As a specific example, trie structures often have much slower *WithKey variants because they have to reconstruct the keys. I forget how much slower my benchmarks were for Data.Trie, but it was certainly significant. Even with inlining and the like I didn't have any luck getting the overhead to automatically fall away when the key isn't used. For something like Data.Map, the API is definitely redundant (though we may wish to keep the simplified versions around in Data.Map.Convenience or the like). And even though Data.IntMap is technically a trie, it'd probably be fine without them too--- since the keys are of fixed size, and can be combined with bit twiddling instead of rearranging memory. -- Live well, ~wren

On Thu, Aug 19, 2010 at 3:41 AM, wren ng thornton < wren@community.haskell.org> wrote:
Johan Tibell wrote:
I've created a proposal to add a strict left fold, see separate email. In the process of doing so I've discovered that the whole foo/fooWithKey duplication is unnecessary, at least from a performance perspective. [...]
It's probably not feasible to remove the duplication from e.g. the Data.Map API, but it's worth keeping in mind when designing data structure APIs in the future.
For something like Data.Map, the API is definitely redundant (though we may wish to keep the simplified versions around in Data.Map.Convenience or the like). And even though Data.IntMap is technically a trie, it'd probably be fine without them too--- since the keys are of fixed size, and can be combined with bit twiddling instead of rearranging memory.
The keys are actually stored in the leaves in the IntMap keys so what I said about Map applies to IntMap as well. -- Johan

Johan Tibell wrote:
On Thu, Aug 19, 2010 at 3:41 AM, wren ng thornton < wren@community.haskell.org> wrote:
Johan Tibell wrote:
I've created a proposal to add a strict left fold, see separate email. In the process of doing so I've discovered that the whole foo/fooWithKey duplication is unnecessary, at least from a performance perspective. [...]
It's probably not feasible to remove the duplication from e.g. the Data.Map API, but it's worth keeping in mind when designing data structure APIs in the future.
For something like Data.Map, the API is definitely redundant (though we may wish to keep the simplified versions around in Data.Map.Convenience or the like). And even though Data.IntMap is technically a trie, it'd probably be fine without them too--- since the keys are of fixed size, and can be combined with bit twiddling instead of rearranging memory.
The keys are actually stored in the leaves in the IntMap keys so what I said about Map applies to IntMap as well.
Ah yes, that's true. (Though even if it weren't...) -- Live well, ~wren
participants (8)
-
Edward Z. Yang
-
Evan Laforge
-
Ivan Lazar Miljenovic
-
Johan Tibell
-
kahl@cas.mcmaster.ca
-
Milan Straka
-
Stephan Friedrichs
-
wren ng thornton