Library proposal: add a Location interface for element-wise operations on Data.Map (#4887)

This is a variant of a suggestion by apfelmus: http://www.haskell.org/pipermail/libraries/2010-September/014510.html To avoid proliferation of variants of element-wise operations, the idea is to split these operations into two phases mediated by a new Location type, so that users can do whatever they like between these phases. Documentation is here: http://code.haskell.org/~ross/containers_doc/Data-Map.html#3 This adds a type and 9 functions to the interface, but makes possible monadic updates and much more. As an illustration, the file MapOps.hs attached to the ticket gives definitions of 30 of the public functions of Data.Map in terms of the new interface. At least in the case of insert, this definition is slightly faster than the current one. Discussion period: 4 weeks (to 4 February)

On Fri, 7 Jan 2011, Ross Paterson wrote:
This is a variant of a suggestion by apfelmus:
http://www.haskell.org/pipermail/libraries/2010-September/014510.html
To avoid proliferation of variants of element-wise operations, the idea is to split these operations into two phases mediated by a new Location type, so that users can do whatever they like between these phases. Documentation is here:
http://code.haskell.org/~ross/containers_doc/Data-Map.html#3
This adds a type and 9 functions to the interface, but makes possible monadic updates and much more. As an illustration, the file MapOps.hs attached to the ticket gives definitions of 30 of the public functions of Data.Map in terms of the new interface. At least in the case of insert, this definition is slightly faster than the current one.
This looks very very cool! Frequently I wanted a certain combination of lookup/delete/update that was not available in Data.Map, but plenty of other combinations instead. :-) Location is really a hole, where a certain element is already removed? Then I find Location as name a bit misleading. I assumed, that a location also lets me peek the particular element. If the meaning remains, how about the name Hole? Otherwise if using Location for reading an element turns out to be useful, the name should remain. (I find Apfelmus' suggestion of Focus also appropriate.) Before giving a final vote, I would like to scan my code for overly complicated accesses to Map, due to missing Location. I am currently downloading containers' darcs repository.

On Fri, Jan 07, 2011 at 10:10:51PM +0100, Henning Thielemann wrote:
Location is really a hole, where a certain element is already removed?
Either a value has been removed, or a key not present has been added, but yes, it's a hole.
Then I find Location as name a bit misleading. I assumed, that a location also lets me peek the particular element. If the meaning remains, how about the name Hole?
It's certainly mnemonic.

+1 +1 +1 +1 =)
On Fri, Jan 7, 2011 at 12:37 PM, Ross Paterson
This is a variant of a suggestion by apfelmus:
http://www.haskell.org/pipermail/libraries/2010-September/014510.html
To avoid proliferation of variants of element-wise operations, the idea is to split these operations into two phases mediated by a new Location type, so that users can do whatever they like between these phases. Documentation is here:
http://code.haskell.org/~ross/containers_doc/Data-Map.html#3
This adds a type and 9 functions to the interface, but makes possible monadic updates and much more. As an illustration, the file MapOps.hs attached to the ticket gives definitions of 30 of the public functions of Data.Map in terms of the new interface. At least in the case of insert, this definition is slightly faster than the current one.
Discussion period: 4 weeks (to 4 February)
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

+1 Really nice idea!
(The following is just some brainstorming)
Why not store the value in the Location as in:
data Location k a
= Empty !k !(Path k a)
| Full {-# UNPACK #-} !Size !k a !(Path k a) !(Map k a) !(Map k a)
Search doesn't need to return a tuple with a Maybe but simply:
search :: Ord k => k -> Map k a -> Location k a
search k = k `seq` go Root
where
go path Tip = Empty k path
go path (Bin sx kx x l r) = case compare k kx of
LT -> go (LeftBin sx kx x path r) l
GT -> go (RightBin sx kx x l path) r
EQ -> Full sx kx x path l r
We do need a function to retrieve the value:
value :: Localtion k a -> Maybe a
value (Empty _ _) = Nothing
value (Full _ _ v _ _ _) = Just v
min- and maxLocation are also simplified:
minLocation :: Map k a -> Location k a
minLocation = go Root
where
go _path Tip = error "Map.least: empty map"
go path (Bin sx kx x Tip r) = Full sx kx x path Tip r
go path (Bin sx kx x l r) = go (LeftBin sx kx x path r) l
maxLocation :: Map k a -> Location k a
maxLocation = go Root
where
go _path Tip = error "Map.greatest: empty map"
go path (Bin sx kx x l Tip) = Full sx kx x path l Tip
go path (Bin sx kx x l r) = go (RightBin sx kx x l path) r
On Fri, Jan 7, 2011 at 6:37 PM, Ross Paterson
This is a variant of a suggestion by apfelmus:
http://www.haskell.org/pipermail/libraries/2010-September/014510.html
To avoid proliferation of variants of element-wise operations, the idea is to split these operations into two phases mediated by a new Location type, so that users can do whatever they like between these phases. Documentation is here:
http://code.haskell.org/~ross/containers_doc/Data-Map.html#3
This adds a type and 9 functions to the interface, but makes possible monadic updates and much more. As an illustration, the file MapOps.hs attached to the ticket gives definitions of 30 of the public functions of Data.Map in terms of the new interface. At least in the case of insert, this definition is slightly faster than the current one.
Discussion period: 4 weeks (to 4 February)
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On Sat, Jan 08, 2011 at 01:07:48AM +0100, Bas van Dijk wrote:
(The following is just some brainstorming)
Why not store the value in the Location as in:
data Location k a = Empty !k !(Path k a) | Full {-# UNPACK #-} !Size !k a !(Path k a) !(Map k a) !(Map k a)
[...] We do need a function to retrieve the value:
value :: Location k a -> Maybe a value (Empty _ _) = Nothing value (Full _ _ v _ _ _) = Just v
That would work well for search, but then index, minLocation and maxLocation would return Locations that value was always mapped to Just something. Extra invariants like that feel wrong to me.

On Sat, 8 Jan 2011, Ross Paterson wrote:
That would work well for search, but then index, minLocation and maxLocation would return Locations that value was always mapped to Just something. Extra invariants like that feel wrong to me.
'index' can be out of range and then return Nothing. minLocation and maxLocation return Nothing if the Map is empty.

On Sat, Jan 08, 2011 at 04:21:54PM +0100, Henning Thielemann wrote:
On Sat, 8 Jan 2011, Ross Paterson wrote:
That would work well for search, but then index, minLocation and maxLocation would return Locations that value was always mapped to Just something. Extra invariants like that feel wrong to me.
'index' can be out of range and then return Nothing. minLocation and maxLocation return Nothing if the Map is empty.
But what is the key in those cases?

Hi,
I think that Henning probably meant that those functions should return a
value of type Maybe (a, Location k a): Nothing when the Map is empty, and
Just (...) otherwise. This seems like a good idea to me. The API looks
nice otherwise.
-Iavor
On Sat, Jan 8, 2011 at 7:28 AM, Ross Paterson
On Sat, Jan 08, 2011 at 04:21:54PM +0100, Henning Thielemann wrote:
On Sat, 8 Jan 2011, Ross Paterson wrote:
That would work well for search, but then index, minLocation and maxLocation would return Locations that value was always mapped to Just something. Extra invariants like that feel wrong to me.
'index' can be out of range and then return Nothing. minLocation and maxLocation return Nothing if the Map is empty.
But what is the key in those cases?
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On Sat, Jan 8, 2011 at 1:49 AM, Ross Paterson
On Sat, Jan 08, 2011 at 01:07:48AM +0100, Bas van Dijk wrote:
(The following is just some brainstorming)
Why not store the value in the Location as in:
data Location k a = Empty !k !(Path k a) | Full {-# UNPACK #-} !Size !k a !(Path k a) !(Map k a) !(Map k a)
[...] We do need a function to retrieve the value:
value :: Location k a -> Maybe a value (Empty _ _) = Nothing value (Full _ _ v _ _ _) = Just v
That would work well for search, but then index, minLocation and maxLocation would return Locations that value was always mapped to Just something. Extra invariants like that feel wrong to me.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
You could go with something like the following. However I don't think it's worth the trouble: data Location k a = E !(Empty k a) | F !(Full k a) location :: (Empty k a -> b) -> (Full k a -> b) -> Location k a -> b location f _ (E empty) = f empty location _ g (F full) = g full data Empty k a = Empty !k !(Path k a) data Full k a = Full {-# UNPACK #-} !Size !k a !(Path k a) !(Map k a) !(Map k a) data Path k a = Root | LeftBin {-# UNPACK #-} !Size !k a !(Path k a) !(Map k a) | RightBin {-# UNPACK #-} !Size !k a !(Map k a) !(Path k a) search :: Ord k => k -> Map k a -> Location k a search k = k `seq` go Root where go path Tip = E $ Empty k path go path (Bin sx kx x l r) = case compare k kx of LT -> go (LeftBin sx kx x path r) l GT -> go (RightBin sx kx x l path) r EQ -> F $ Full sx kx x path l r index :: Int -> Map k a -> Full k a index = go Root where STRICT_2_OF_3(go) go _path _i Tip = error "Map.index: out of range" go path i (Bin sx kx x l r) = case compare i size_l of LT -> go (LeftBin sx kx x path r) i l GT -> go (RightBin sx kx x l path) (i-size_l-1) r EQ -> Full sx kx x path l r where size_l = size l minLocation :: Map k a -> Full k a minLocation = go Root where go _path Tip = error "Map.least: empty map" go path (Bin sx kx x Tip r) = Full sx kx x path Tip r go path (Bin sx kx x l r) = go (LeftBin sx kx x path r) l maxLocation :: Map k a -> Full k a maxLocation = go Root where go _path Tip = error "Map.greatest: empty map" go path (Bin sx kx x l Tip) = Full sx kx x path l Tip go path (Bin sx kx x l r) = go (RightBin sx kx x l path) r class Key m where key :: m k a -> k instance Key Empty where key (Empty kx _path) = kx instance Key Full where key (Full _sx kx _x _path _l _r) = kx instance Key Location where key = location key key value :: Full k a -> a value (Full _sx _kx x _path _l _r) = x class Before m where before :: Ord k => m k a -> Map k a instance Before Empty where before (Empty _k path) = buildBefore Tip path instance Before Full where before (Full _sx _kx _x path l _r) = buildBefore l path instance Before Location where before = location before before buildBefore :: Ord k => Map k a -> Path k a -> Map k a buildBefore t Root = t buildBefore t (LeftBin _sx _kx _x path _r) = buildBefore t path buildBefore t (RightBin _sx kx x l path) = buildBefore (join kx x l t) path class After m where after :: Ord k => m k a -> Map k a instance After Empty where after (Empty _k path) = buildAfter Tip path instance After Full where after (Full _sx _kx _x path l _r) = buildAfter l path instance After Location where after = location after after buildAfter :: Ord k => Map k a -> Path k a -> Map k a buildAfter t Root = t buildAfter t (LeftBin _sx kx x path r) = buildAfter (join kx x t r) path buildAfter t (RightBin _sx _kx _x _l path) = buildAfter t path class Assign m where assign :: a -> m k a -> Map k a instance Assign Empty where assign x (Empty k path) = rebuildGT (singleton k x) path instance Assign Full where assign x (Full sx kx _x path l r) = rebuildEQ (Bin sx kx x l r) path instance Assign Location where assign x = location (assign x) (assign x) class Clear m where clear :: m k a -> Map k a instance Clear Empty where clear (Empty _k path) = rebuildEQ Tip path instance Clear Full where clear (Full _sx _kx _x path l r) = rebuildLT (glue l r) path instance Clear Location where clear = location clear clear -- Rebuild the tree the same size as it was, so no rebalancing is needed. rebuildEQ :: Map k a -> Path k a -> Map k a rebuildEQ t Root = t rebuildEQ l (LeftBin sx kx x path r) = rebuildEQ (Bin sx kx x l r) path rebuildEQ r (RightBin sx kx x l path) = rebuildEQ (Bin sx kx x l r) path -- Rebuild the tree one entry smaller than it was, rebalancing as we go. rebuildLT :: Map k a -> Path k a -> Map k a rebuildLT t Root = t rebuildLT l (LeftBin _sx kx x path r) = rebuildLT (balanceR kx x l r) path rebuildLT r (RightBin _sx kx x l path) = rebuildLT (balanceL kx x l r) path -- Rebuild the tree one entry larger than it was, rebalancing as we go. rebuildGT :: Map k a -> Path k a -> Map k a rebuildGT t Root = t rebuildGT l (LeftBin _sx kx x path r) = rebuildGT (balanceL kx x l r) path rebuildGT r (RightBin _sx kx x l path) = rebuildGT (balanceR kx x l r) path Regards, Bas

Ross Paterson
http://code.haskell.org/~ross/containers_doc/Data-Map.html#3
Great! Given that no function takes two Locations at once, a Location can be thought of (and implemented) as a record whose fields are "key", "before", "after", etc. To support this thought, it would be nice if all functions that take a Location as argument take it as the first argument. That seems already the case except in "assign". -- Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig <INSERT PARTISAN STATEMENT HERE>

Chung-chieh Shan wrote:
Ross Paterson
wrote: http://code.haskell.org/~ross/containers_doc/Data-Map.html#3
Great!
Given that no function takes two Locations at once, a Location can be thought of (and implemented) as a record whose fields are "key", "before", "after", etc. To support this thought, it would be nice if all functions that take a Location as argument take it as the first argument. That seems already the case except in "assign".
Since the Location comes with invariants (the keys in before must be smaller than the keys in after ), the user may not construct it with record syntax, so it doesn't really matter whether one of the exported functions is composed with flip . However, implementing it as a record could be one way to make it faster, if GHC were to implement some kind of absence analysis. Namely, the idea is that the records fields are independent of each other and usually not used all at once. If GHC could detect this, for instance by using strictness analysis, then only a specialized/fused version of the Path needs to be built. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Fri, Jan 7, 2011 at 6:37 PM, Ross Paterson
This is a variant of a suggestion by apfelmus:
http://www.haskell.org/pipermail/libraries/2010-September/014510.html
To avoid proliferation of variants of element-wise operations, the idea is to split these operations into two phases mediated by a new Location type, so that users can do whatever they like between these phases. Documentation is here:
http://code.haskell.org/~ross/containers_doc/Data-Map.html#3
This adds a type and 9 functions to the interface, but makes possible monadic updates and much more. As an illustration, the file MapOps.hs attached to the ticket gives definitions of 30 of the public functions of Data.Map in terms of the new interface. At least in the case of insert, this definition is slightly faster than the current one.
I ran the current benchmark suite and my results differ from yours. Here's the baseline for the current API: insert,"317.7258 us" "insertWith empty","326.9462 us" "insertWith update","306.1100 us" "insertWith' empty","330.3835 us" "insertWith' update","295.4943 us" "insertWithKey empty","329.2176 us" "insertWithKey update","306.1306 us" "insertWithKey' empty","330.5804 us" "insertWithKey' update","295.7395 us" "insertLookupWithKey empty","800.2740 us" "insertLookupWithKey update","747.5337 us" "insertLookupWithKey' empty","793.6592 us" "insertLookupWithKey' update","683.3591 us" delete,"156.7483 us" update,"426.2789 us" updateLookupWithKey,"492.0369 us" alter,"302.2354 us" And here's the current API, implemented in terms of Location (i.e. using MapOps.hs). insert,"532.1855 us" "insertWith empty","532.5082 us" "insertWith update","472.3454 us" "insertWith' empty","530.2606 us" "insertWith' update","457.9754 us" "insertWithKey empty","526.2428 us" "insertWithKey update","689.5042 us" "insertWithKey' empty","528.4261 us" "insertWithKey' update","456.7405 us" "insertLookupWithKey empty","532.3916 us" "insertLookupWithKey update","683.7498 us" "insertLookupWithKey' empty","526.8173 us" "insertLookupWithKey' update","459.0202 us" delete,"298.9599 us" update,"441.9685 us" updateLookupWithKey,"441.2167 us" alter,"444.8947 us" It's interest to compare the Location-based API to simply using 'lookup' and 'insert', which is another way to get rid of proliferation of element-wise operations. Here's the results for implementing insertWith' in terms of those two functions: "insertWith' empty","606.2098 us" "insertWith' update","492.5084 us" So the baseline insertWith' is 35% faster than the Location-based insertWith' which is 7% faster than composing lookup and insert, on the "insertWith' update" benchmark. Can we get the Location-based implementation to perform closer to the current implementation? Johan

On Sat, Jan 8, 2011 at 10:05 AM, Johan Tibell
So the baseline insertWith' is 35% faster than the Location-based insertWith' which is 7% faster than composing lookup and insert, on the "insertWith' update" benchmark.
Can we get the Location-based implementation to perform closer to the current implementation?
I tried to improve the performance today without much success (but much staring at Core). I wonder if the extra allocation of Path data constructors is hurting performance. Is it worth adding this Interface for all its coolness? Simply using 'lookup' plus 'insert' is simpler, as powerful, and only 7% slower. Johan

On Wed, Jan 12, 2011 at 10:53 PM, Johan Tibell
Is it worth adding this Interface for all its coolness? Simply using 'lookup' plus 'insert' is simpler, as powerful, and only 7% slower.
If the argument is that this interface complicates the existing Data.Map API why not export it from its own module: Data.Map.Location/Zipper/Cursor? If, in the future, it turns out that this interface is used a lot we can always move it to Data.Map. Bas

On Wed, Jan 12, 2011 at 10:53:02PM +0100, Johan Tibell wrote:
I tried to improve the performance today without much success (but much staring at Core). I wonder if the extra allocation of Path data constructors is hurting performance.
Is it worth adding this Interface for all its coolness? Simply using 'lookup' plus 'insert' is simpler, as powerful, and only 7% slower.
I've also had no success in speeding it up. It's disappointing that having the extra information (whether the key was there or not) doesn't help, but I think I agree.

On Fri, Jan 14, 2011 at 11:06 AM, Ross Paterson
I've also had no success in speeding it up. It's disappointing that having the extra information (whether the key was there or not) doesn't help, but I think I agree.
I think the problem is that you get almost 2x allocation. O(log n) allocation to create the Path and O(log n) allocation to rebuild the tree. Perhaps one could use continuations to create the whole instead of reifying the stack as a Path? We might lose the ability to get the smaller/larger elements but at least we might be able to update the "hole" efficiently? Johan

On Fri, Jan 14, 2011 at 11:58:18AM +0100, Johan Tibell wrote:
On Fri, Jan 14, 2011 at 11:06 AM, Ross Paterson
wrote: I've also had no success in speeding it up. ?It's disappointing that having the extra information (whether the key was there or not) doesn't help, but I think I agree.
I think the problem is that you get almost 2x allocation. O(log n) allocation to create the Path and O(log n) allocation to rebuild the tree. Perhaps one could use continuations to create the whole instead of reifying the stack as a Path? We might lose the ability to get the smaller/larger elements but at least we might be able to update the "hole" efficiently?
That's essentially apfelmus's original suggestion. I believe I tried that but creating the closures seems even slower.

On Fri, Jan 14, 2011 at 6:04 AM, Ross Paterson
On Fri, Jan 14, 2011 at 11:58:18AM +0100, Johan Tibell wrote:
...[Ross couldn't speed things up easily.]
I think the problem is that you get almost 2x allocation. O(log n) allocation to create the Path and O(log n) allocation to rebuild the tree. Perhaps one could use continuations to create the whole instead of reifying the stack as a Path? We might lose the ability to get the smaller/larger elements but at least we might be able to update the "hole" efficiently?
That's essentially apfelmus's original suggestion. I believe I tried that but creating the closures seems even slower.
Er, yes, the proposed data structure defunctionalizes these continuations (which in principle also lets us manipulate them more flexibly). Remember that a closure (especially a complex continuation) is *also* a data structure, folks! -Jan

Ross Paterson wrote:
This is a variant of a suggestion by apfelmus:
http://www.haskell.org/pipermail/libraries/2010-September/014510.html
To avoid proliferation of variants of element-wise operations, the idea is to split these operations into two phases mediated by a new Location type, so that users can do whatever they like between these phases. Documentation is here:
http://code.haskell.org/~ross/containers_doc/Data-Map.html#3
This adds a type and 9 functions to the interface, but makes possible monadic updates and much more. As an illustration, the file MapOps.hs attached to the ticket gives definitions of 30 of the public functions of Data.Map in terms of the new interface.
I like it! How could I not? :D I'm not overly happy with the name "Location", though, because the data type not only represents one location but also the whole map around it. Maybe "Vicinity" is slightly more apt? The other names I could think of ("PinnedMap", "FocussedMap", "FingerMap", "MarkedMap", "Neighborhood", "TreasureTrove") don't seem to be an improvement. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Tue, Jan 11, 2011 at 5:20 PM, Heinrich Apfelmus
I'm not overly happy with the name "Location", though, because the data type not only represents one location but also the whole map around it. Maybe "Vicinity" is slightly more apt?
The other names I could think of ("PinnedMap", "FocussedMap", "FingerMap", "MarkedMap", "Neighborhood", "TreasureTrove") don't seem to be an improvement.
I was thinking about "Cursor". Bas

On Tue, Jan 11, 2011 at 8:55 PM, Bas van Dijk
On Tue, Jan 11, 2011 at 5:20 PM, Heinrich Apfelmus
wrote: I'm not overly happy with the name "Location", though, because the data type not only represents one location but also the whole map around it. Maybe "Vicinity" is slightly more apt?
The other names I could think of ("PinnedMap", "FocussedMap", "FingerMap", "MarkedMap", "Neighborhood", "TreasureTrove") don't seem to be an improvement.
I was thinking about "Cursor".
I like it, short and sweet. Johan

On Tue, Jan 11, 2011 at 9:14 PM, Johan Tibell
On Tue, Jan 11, 2011 at 8:55 PM, Bas van Dijk
wrote: I was thinking about "Cursor".
I like it, short and sweet.
Isn't "Zipper" the existing nomenclature for these things? If not
Zipper, then IMO "Cursor" is the best one out the alternatives
proposed.
G
--
Gregory Collins
participants (11)
-
Bas van Dijk
-
Chung-chieh Shan
-
Edward Kmett
-
Felipe Almeida Lessa
-
Gregory Collins
-
Heinrich Apfelmus
-
Henning Thielemann
-
Iavor Diatchki
-
Jan-Willem Maessen
-
Johan Tibell
-
Ross Paterson