Names for general merge tactics in Data.Map

As I described previously, I've come up with general merge functions for Data.Map to replace the problematic mergeWithKey (which I desire to deprecate and remove). I have provisional names for the functions, and also for the "merge tactics" that go with them and their types. I would like to get whatever feedback I can before actually adding these. I'd like to try to make a release including these functions within approximately two weeks, so prompt responses will be appreciated. Note that the names of the "merge tactics" break with the tradition of offering plain functions and WithKey versions. To cut down on the already-large API, only key-passing versions will be provided (unless people object strongly). This has a small potential efficiency cost, but it cuts the API in half. The most general function is generalMergeA :: (Applicative f, Ord k) => WhenMissing f k a c -> WhenMissing f k b c -> WhenMatched f k a b c -> Map k a -> Map k b -> f (Map k c) and the pure version is generalMerge :: Ord k => SimpleWhenMissing k a c -> SimpleWhenMissing k b c -> SimpleWhenMatched k a b c -> Map k a -> Map k b -> Map k c where type SimpleWhenMissing = WhenMissing Identity type SimpleWhenMatched = WhenMatched Identity WhenMissing and WhenMatched are "merge tactics" explaining what to do with elements in various situations. WhenMissing explains what to do with keys present in one map but not the other, while WhenMatched explains what to do with keys present in both maps: WhenMissing f k x z is an abstract representation of a function of type k -> x -> f (Maybe z) WhenMatched f k x y z is an abstract representation of a function of type k -> x -> y -> f (Maybe z) So in principle, we have generalMergeA ~::~ (Applicative f, Ord k) => (k -> a -> f (Maybe c)) -> (k -> b -> f (Maybe c)) -> (k -> a -> b -> f (Maybe c)) -> Map k a -> Map k b -> f (Map k c) By using these abstract merge tactics instead of functions, we gain the ability to perform various optimizations. At present, only the WhenMissing optimizations are performed (they seem to be the most important), but we may choose to add WhenMatched optimizations in the future to enhance sharing. There are various ways to construct the merge tactics. The most general are currently called traverseMaybeMissing :: Applicative f => (k -> x -> f (Maybe y)) -> WhenMissing f k x y and zipWithMaybeAMatched :: Applicative f => (k -> x -> y -> f (Maybe z)) -> WhenMatched f k x y z These directly convert functions into merge tactics with the corresponding types. There are various others that are simpler and/or faster. Generally, the pure tactics have the advantage of being able to process a whole subtree in a single Applicative action. Non-Maybe tactics have the advantage of skipping balance checks. dropMissing and preserveMissing are especially important because they can skip over whole subtrees, either pruning them or leaving them alone. Pure: dropMissing :: Applicative f => WhenMissing f k x y -- dropMissing :: SimpleWhenMissing k x y preserveMissing :: Applicative f => WhenMissing f k x x -- preserveMissing :: SimpleWhenMissing k x x mapMaybeMissing :: Applicative f => (k -> x -> Maybe y) -> WhenMissing f k x y -- mapMaybeMissing :: (k -> x -> Maybe y) -> SimpleWhenMissing k x y mapMissing :: Applicative f => (k -> x -> y) -> WhenMissing f k x y -- mapMissing :: (k -> x -> y) -> SimpleWhenMissing k x y filterMissing :: Applicative f => (k -> x -> Bool) -> WhenMissing f k x x -- filterMissing :: (k -> x -> Bool) -> SimpleWhenMissing k x x zipWithMaybeMatched :: Applicative f => (k -> x -> y -> Maybe z) -> WhenMatched f k x y z -- zipWithMaybeMatched :: (k -> x -> y -> Maybe z) -> SimpleWhenMatched k x y z zipWithMatched :: Applicative f => (k -> x -> y -> z) -> WhenMatched f k x y z -- zipWithMatched :: (k -> x -> y -> z) -> SimpleWhenMatched k x y z Applicative: traverseMissing :: Applicative f => (k -> x -> f y) -> WhenMissing f k x y filterAMissing :: Applicative f => (k -> x -> f Bool) -> WhenMissing f k x x zipWithAMatched :: Applicative f => (k -> x -> y -> f z) -> WhenMatched f k x y z Thanks, David Feuer

My initial reaction to this is that I'd like to see mergeWithKey left
alone, there is existing code using it and it continues to be useful, and
then to have this new API offered in its own module under Data.Map where
the various consumers and providers of WhenMatched and WhenMissing can live
and be documented as a coherent toolkit.
On Thu, Aug 18, 2016 at 1:08 PM David Feuer
As I described previously, I've come up with general merge functions for Data.Map to replace the problematic mergeWithKey (which I desire to deprecate and remove). I have provisional names for the functions, and also for the "merge tactics" that go with them and their types. I would like to get whatever feedback I can before actually adding these. I'd like to try to make a release including these functions within approximately two weeks, so prompt responses will be appreciated.
Note that the names of the "merge tactics" break with the tradition of offering plain functions and WithKey versions. To cut down on the already-large API, only key-passing versions will be provided (unless people object strongly). This has a small potential efficiency cost, but it cuts the API in half.
The most general function is
generalMergeA :: (Applicative f, Ord k) => WhenMissing f k a c -> WhenMissing f k b c -> WhenMatched f k a b c -> Map k a -> Map k b -> f (Map k c)
and the pure version is
generalMerge :: Ord k => SimpleWhenMissing k a c -> SimpleWhenMissing k b c -> SimpleWhenMatched k a b c -> Map k a -> Map k b -> Map k c
where type SimpleWhenMissing = WhenMissing Identity type SimpleWhenMatched = WhenMatched Identity
WhenMissing and WhenMatched are "merge tactics" explaining what to do with elements in various situations. WhenMissing explains what to do with keys present in one map but not the other, while WhenMatched explains what to do with keys present in both maps:
WhenMissing f k x z is an abstract representation of a function of type k -> x -> f (Maybe z)
WhenMatched f k x y z is an abstract representation of a function of type k -> x -> y -> f (Maybe z)
So in principle, we have
generalMergeA ~::~ (Applicative f, Ord k) => (k -> a -> f (Maybe c)) -> (k -> b -> f (Maybe c)) -> (k -> a -> b -> f (Maybe c)) -> Map k a -> Map k b -> f (Map k c)
By using these abstract merge tactics instead of functions, we gain the ability to perform various optimizations. At present, only the WhenMissing optimizations are performed (they seem to be the most important), but we may choose to add WhenMatched optimizations in the future to enhance sharing.
There are various ways to construct the merge tactics. The most general are currently called
traverseMaybeMissing :: Applicative f => (k -> x -> f (Maybe y)) -> WhenMissing f k x y
and
zipWithMaybeAMatched :: Applicative f => (k -> x -> y -> f (Maybe z)) -> WhenMatched f k x y z
These directly convert functions into merge tactics with the corresponding types. There are various others that are simpler and/or faster. Generally, the pure tactics have the advantage of being able to process a whole subtree in a single Applicative action. Non-Maybe tactics have the advantage of skipping balance checks. dropMissing and preserveMissing are especially important because they can skip over whole subtrees, either pruning them or leaving them alone.
Pure:
dropMissing :: Applicative f => WhenMissing f k x y -- dropMissing :: SimpleWhenMissing k x y
preserveMissing :: Applicative f => WhenMissing f k x x -- preserveMissing :: SimpleWhenMissing k x x
mapMaybeMissing :: Applicative f => (k -> x -> Maybe y) -> WhenMissing f k x y -- mapMaybeMissing :: (k -> x -> Maybe y) -> SimpleWhenMissing k x y
mapMissing :: Applicative f => (k -> x -> y) -> WhenMissing f k x y -- mapMissing :: (k -> x -> y) -> SimpleWhenMissing k x y
filterMissing :: Applicative f => (k -> x -> Bool) -> WhenMissing f k x x -- filterMissing :: (k -> x -> Bool) -> SimpleWhenMissing k x x
zipWithMaybeMatched :: Applicative f => (k -> x -> y -> Maybe z) -> WhenMatched f k x y z -- zipWithMaybeMatched :: (k -> x -> y -> Maybe z) -> SimpleWhenMatched k x y z
zipWithMatched :: Applicative f => (k -> x -> y -> z) -> WhenMatched f k x y z -- zipWithMatched :: (k -> x -> y -> z) -> SimpleWhenMatched k x y z
Applicative:
traverseMissing :: Applicative f => (k -> x -> f y) -> WhenMissing f k x y
filterAMissing :: Applicative f => (k -> x -> f Bool) -> WhenMissing f k x x
zipWithAMatched :: Applicative f => (k -> x -> y -> f z) -> WhenMatched f k x y z
Thanks, David Feuer _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

I like the idea of using separate modules. What would you like them to
be called?
I am open to the idea of leaving mergeWithKey in, but I'm not yet
convinced. Can you give an example of a realistic application of
mergeWithKey that cannot easily be rewritten in terms of generalMerge?
I would normally agree with you that backwards compatibility should be
preserved. However, mergeWithKey has several very serious problems.
1. It allows the user to produce invalid maps. Unlike other functions
that do so, it's not clear that it offers any significant efficiency
benefit (once generalMerge is added).
2. It breaks the map abstraction. A user could invoke mergeWithKey
with arguments that cause it to produce semantically different maps
depending on the way the given map is balanced. For example, a user
could pass an only1 function that deletes the median of the tree it is
passed. I don't know how to document the appropriate restrictions
required to preserve the abstraction.
3. It is very hard to figure out much about what mergeWithKey does by
looking at its name and type. Indeed, I did not understand it from its
documentation either, and had to read the source code.
On Thu, Aug 18, 2016 at 4:19 PM, Eric Mertens
My initial reaction to this is that I'd like to see mergeWithKey left alone, there is existing code using it and it continues to be useful, and then to have this new API offered in its own module under Data.Map where the various consumers and providers of WhenMatched and WhenMissing can live and be documented as a coherent toolkit.
On Thu, Aug 18, 2016 at 1:08 PM David Feuer
wrote: As I described previously, I've come up with general merge functions for Data.Map to replace the problematic mergeWithKey (which I desire to deprecate and remove). I have provisional names for the functions, and also for the "merge tactics" that go with them and their types. I would like to get whatever feedback I can before actually adding these. I'd like to try to make a release including these functions within approximately two weeks, so prompt responses will be appreciated.
Note that the names of the "merge tactics" break with the tradition of offering plain functions and WithKey versions. To cut down on the already-large API, only key-passing versions will be provided (unless people object strongly). This has a small potential efficiency cost, but it cuts the API in half.
The most general function is
generalMergeA :: (Applicative f, Ord k) => WhenMissing f k a c -> WhenMissing f k b c -> WhenMatched f k a b c -> Map k a -> Map k b -> f (Map k c)
and the pure version is
generalMerge :: Ord k => SimpleWhenMissing k a c -> SimpleWhenMissing k b c -> SimpleWhenMatched k a b c -> Map k a -> Map k b -> Map k c
where type SimpleWhenMissing = WhenMissing Identity type SimpleWhenMatched = WhenMatched Identity
WhenMissing and WhenMatched are "merge tactics" explaining what to do with elements in various situations. WhenMissing explains what to do with keys present in one map but not the other, while WhenMatched explains what to do with keys present in both maps:
WhenMissing f k x z is an abstract representation of a function of type k -> x -> f (Maybe z)
WhenMatched f k x y z is an abstract representation of a function of type k -> x -> y -> f (Maybe z)
So in principle, we have
generalMergeA ~::~ (Applicative f, Ord k) => (k -> a -> f (Maybe c)) -> (k -> b -> f (Maybe c)) -> (k -> a -> b -> f (Maybe c)) -> Map k a -> Map k b -> f (Map k c)
By using these abstract merge tactics instead of functions, we gain the ability to perform various optimizations. At present, only the WhenMissing optimizations are performed (they seem to be the most important), but we may choose to add WhenMatched optimizations in the future to enhance sharing.
There are various ways to construct the merge tactics. The most general are currently called
traverseMaybeMissing :: Applicative f => (k -> x -> f (Maybe y)) -> WhenMissing f k x y
and
zipWithMaybeAMatched :: Applicative f => (k -> x -> y -> f (Maybe z)) -> WhenMatched f k x y z
These directly convert functions into merge tactics with the corresponding types. There are various others that are simpler and/or faster. Generally, the pure tactics have the advantage of being able to process a whole subtree in a single Applicative action. Non-Maybe tactics have the advantage of skipping balance checks. dropMissing and preserveMissing are especially important because they can skip over whole subtrees, either pruning them or leaving them alone.
Pure:
dropMissing :: Applicative f => WhenMissing f k x y -- dropMissing :: SimpleWhenMissing k x y
preserveMissing :: Applicative f => WhenMissing f k x x -- preserveMissing :: SimpleWhenMissing k x x
mapMaybeMissing :: Applicative f => (k -> x -> Maybe y) -> WhenMissing f k x y -- mapMaybeMissing :: (k -> x -> Maybe y) -> SimpleWhenMissing k x y
mapMissing :: Applicative f => (k -> x -> y) -> WhenMissing f k x y -- mapMissing :: (k -> x -> y) -> SimpleWhenMissing k x y
filterMissing :: Applicative f => (k -> x -> Bool) -> WhenMissing f k x x -- filterMissing :: (k -> x -> Bool) -> SimpleWhenMissing k x x
zipWithMaybeMatched :: Applicative f => (k -> x -> y -> Maybe z) -> WhenMatched f k x y z -- zipWithMaybeMatched :: (k -> x -> y -> Maybe z) -> SimpleWhenMatched k x y z
zipWithMatched :: Applicative f => (k -> x -> y -> z) -> WhenMatched f k x y z -- zipWithMatched :: (k -> x -> y -> z) -> SimpleWhenMatched k x y z
Applicative:
traverseMissing :: Applicative f => (k -> x -> f y) -> WhenMissing f k x y
filterAMissing :: Applicative f => (k -> x -> f Bool) -> WhenMissing f k x x
zipWithAMatched :: Applicative f => (k -> x -> y -> f z) -> WhenMatched f k x y z
Thanks, David Feuer _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

On Thu, 18 Aug 2016, David Feuer wrote:
I am open to the idea of leaving mergeWithKey in, but I'm not yet convinced. Can you give an example of a realistic application of mergeWithKey that cannot easily be rewritten in terms of generalMerge? I would normally agree with you that backwards compatibility should be preserved. However, mergeWithKey has several very serious problems.
In the past, it caused problems to install a new 'containers' on an old GHC. Is this problem solved?
1. It allows the user to produce invalid maps. Unlike other functions that do so, it's not clear that it offers any significant efficiency benefit (once generalMerge is added).
How about adding WARNING or DEPRECATED?

On Thu, Aug 18, 2016 at 4:46 PM, Henning Thielemann < lemming@henning-thielemann.de> wrote:
In the past, it caused problems to install a new 'containers' on an old GHC. Is this problem solved?
I believe you should be able to install it in a sandbox without any trouble. AFAIK, the issue is that the GHC API is fixed to whichever version of containers came with GHC, so the types exposed by a new version of containers will not match the ones in the ghc pseudo-package. As long as you're not trying to use the GHC API with the new containers, I believe you should be fine.
1. It allows the user to produce invalid maps. Unlike other functions that do so, it's not clear that it offers any significant efficiency benefit (once generalMerge is added).
How about adding WARNING or DEPRECATED?
DEPRECATED is only appropriate if it's going to be removed. WARNINGs can only be suppressed with a -Wno-warnings-deprecations flag, which makes me a bit reluctant to use them.

I realized that it would probably be helpful to put up the actual documentation for the proposed functions. This can be found at http://treeowl-containers-general-merge.bitballoon.com/data-map-lazy-merge The almost-identical strict API (missing three fairly unimportant contravariant mapping functions that can't currently be made strict in an efficient manner) can be found at http://treeowl-containers-general-merge.bitballoon.com/data-map-strict-merge David Feuer

On Thu, 18 Aug 2016, David Feuer wrote:
I realized that it would probably be helpful to put up the actual documentation for the proposed functions. This can be found at
http://treeowl-containers-general-merge.bitballoon.com/data-map-lazy-merge
The description of 'generalMerge' mentions diffPreserve, diffDrop, diffMapWithKey, but they are not shown in the Haddock page. Since there are pretty many functions around 'generalMerge' I would keep that stuff in one module and also rename 'generalMerge' to 'merge' because the new module provides enough distinction from Data.Map.mergeWithKey. Your solution looks pretty elegant to me but for a definitive judgement I would like to test it in real applications first. I remember some actual and potential uses of 'mergeWithKey' in my code. [1][2][3] It is unfortunate that depending on a newer version of containers also forces me to depend on new GHC versions. Could you provide the Merge modules in a separate package, say containers-merge or containers-compat? Would it be possible to implement the Merge modules using the existing mergeWithKey function? [1] https://github.com/conal/total-map/pull/3/commits/09109b7a045ffea75e50793a69... [2] https://github.com/conal/total-map/pull/5/commits/6fd1cc7b9bef666a69b2d02de3... [3] http://hub.darcs.net/thielema/comfort-graph/browse/src/Data/Graph/Comfort/Ma...

You don't have to depend on new GHC versions unless your package uses the
ghc API, which most don't.
On Aug 20, 2016 3:40 AM, "Henning Thielemann"
On Thu, 18 Aug 2016, David Feuer wrote:
I realized that it would probably be helpful to put up the actual
documentation for the proposed functions. This can be found at
http://treeowl-containers-general-merge.bitballoon.com/data- map-lazy-merge
The description of 'generalMerge' mentions diffPreserve, diffDrop, diffMapWithKey, but they are not shown in the Haddock page.
Since there are pretty many functions around 'generalMerge' I would keep that stuff in one module and also rename 'generalMerge' to 'merge' because the new module provides enough distinction from Data.Map.mergeWithKey.
Your solution looks pretty elegant to me but for a definitive judgement I would like to test it in real applications first. I remember some actual and potential uses of 'mergeWithKey' in my code. [1][2][3] It is unfortunate that depending on a newer version of containers also forces me to depend on new GHC versions. Could you provide the Merge modules in a separate package, say containers-merge or containers-compat? Would it be possible to implement the Merge modules using the existing mergeWithKey function?
[1] https://github.com/conal/total-map/pull/3/commits/09109b7a04 5ffea75e50793a690bd9db4512d540 [2] https://github.com/conal/total-map/pull/5/commits/6fd1cc7b9b ef666a69b2d02de36ff9e98c55cd4b [3] http://hub.darcs.net/thielema/comfort-graph/browse/src/Data/ Graph/Comfort/Map.hs#28

As for a compatibility package, I believe that should be possible. I don't
think mergeWithKey can implement generalMergeA, but I'm pretty sure it can
be written using splitRoot. I'm not personally very interested in writing
this shim, especially since it would only be really useful for packages
using both generalMerge and the GHC API, but I may get around to it at some
point if no one else does it first.
I think you may be right about the name merge rather than generalMerge; if
others agree I'll use that name.
On Aug 20, 2016 3:40 AM, "Henning Thielemann"
On Thu, 18 Aug 2016, David Feuer wrote:
I realized that it would probably be helpful to put up the actual
documentation for the proposed functions. This can be found at
http://treeowl-containers-general-merge.bitballoon.com/data- map-lazy-merge
The description of 'generalMerge' mentions diffPreserve, diffDrop, diffMapWithKey, but they are not shown in the Haddock page.
Since there are pretty many functions around 'generalMerge' I would keep that stuff in one module and also rename 'generalMerge' to 'merge' because the new module provides enough distinction from Data.Map.mergeWithKey.
Your solution looks pretty elegant to me but for a definitive judgement I would like to test it in real applications first. I remember some actual and potential uses of 'mergeWithKey' in my code. [1][2][3] It is unfortunate that depending on a newer version of containers also forces me to depend on new GHC versions. Could you provide the Merge modules in a separate package, say containers-merge or containers-compat? Would it be possible to implement the Merge modules using the existing mergeWithKey function?
[1] https://github.com/conal/total-map/pull/3/commits/09109b7a04 5ffea75e50793a690bd9db4512d540 [2] https://github.com/conal/total-map/pull/5/commits/6fd1cc7b9b ef666a69b2d02de36ff9e98c55cd4b [3] http://hub.darcs.net/thielema/comfort-graph/browse/src/Data/ Graph/Comfort/Map.hs#28

Argh! It probably actually is possible to emulate generalMergeA using
mergeWithKey, although it won't be pretty. I'll see if I can work up a
compat package in case someone needs one, but that probably won't happen
before the next containers release.
On Aug 20, 2016 3:19 PM, "David Feuer"
As for a compatibility package, I believe that should be possible. I don't think mergeWithKey can implement generalMergeA, but I'm pretty sure it can be written using splitRoot. I'm not personally very interested in writing this shim, especially since it would only be really useful for packages using both generalMerge and the GHC API, but I may get around to it at some point if no one else does it first.
I think you may be right about the name merge rather than generalMerge; if others agree I'll use that name.
On Aug 20, 2016 3:40 AM, "Henning Thielemann" < lemming@henning-thielemann.de> wrote:
On Thu, 18 Aug 2016, David Feuer wrote:
I realized that it would probably be helpful to put up the actual
documentation for the proposed functions. This can be found at
http://treeowl-containers-general-merge.bitballoon.com/data- map-lazy-merge
The description of 'generalMerge' mentions diffPreserve, diffDrop, diffMapWithKey, but they are not shown in the Haddock page.
Since there are pretty many functions around 'generalMerge' I would keep that stuff in one module and also rename 'generalMerge' to 'merge' because the new module provides enough distinction from Data.Map.mergeWithKey.
Your solution looks pretty elegant to me but for a definitive judgement I would like to test it in real applications first. I remember some actual and potential uses of 'mergeWithKey' in my code. [1][2][3] It is unfortunate that depending on a newer version of containers also forces me to depend on new GHC versions. Could you provide the Merge modules in a separate package, say containers-merge or containers-compat? Would it be possible to implement the Merge modules using the existing mergeWithKey function?
[1] https://github.com/conal/total-map/pull/3/commits/09109b7a04 5ffea75e50793a690bd9db4512d540 [2] https://github.com/conal/total-map/pull/5/commits/6fd1cc7b9b ef666a69b2d02de36ff9e98c55cd4b [3] http://hub.darcs.net/thielema/comfort-graph/browse/src/Data/ Graph/Comfort/Map.hs#28

On Sat, 20 Aug 2016, David Feuer wrote:
Argh! It probably actually is possible to emulate generalMergeA using mergeWithKey, although it won't be pretty. I'll see if I can work up a compat package in case someone needs one, but that probably won't happen before the next containers release.
I do not use the GHC API, so far. If this is the only problem I would try to install a new containers additionally to the global one.

On 2016-08-18 22:40, David Feuer wrote:
I like the idea of using separate modules. What would you like them to be called?
I am open to the idea of leaving mergeWithKey in, but I'm not yet convinced. Can you give an example of a realistic application of mergeWithKey that cannot easily be rewritten in terms of generalMerge? I would normally agree with you that backwards compatibility should be preserved. However, mergeWithKey has several very serious problems.
1. It allows the user to produce invalid maps. Unlike other functions that do so, it's not clear that it offers any significant efficiency benefit (once generalMerge is added).
2. It breaks the map abstraction. A user could invoke mergeWithKey with arguments that cause it to produce semantically different maps depending on the way the given map is balanced. For example, a user could pass an only1 function that deletes the median of the tree it is passed. I don't know how to document the appropriate restrictions required to preserve the abstraction.
3. It is very hard to figure out much about what mergeWithKey does by looking at its name and type. Indeed, I did not understand it from its documentation either, and had to read the source code.
Maybe it would be better to de-copule this into two proposals, one for adding generalMerge (or whatever it ends up being called #bikeshed), and one for depracting/removing mergeWithKey?
participants (4)
-
Bardur Arantsson
-
David Feuer
-
Eric Mertens
-
Henning Thielemann