Map unionWith generalization

Hans Aberg

27 Jan 2010 27 Jan '10

10:53 a.m.

I need ideally some generalizations of unionWith and unionWithKey, for efficiency matters (i.e. avoiding conversions and traversing the maps more than once). I could use a modification of the code in Map.hs, but then the problem is that the module Map interface does not export the constructors of data Map. So suggestions are welcome. For example, in Map String Integer (sparse representation of monomials) compute the minimum value of all associative pairs with the same key (the gcd); if only one key is present, the absent should be treated as having value 0. So unionWith min xs ys will not work, because unionWith will always apply the identity to the remaining value when one key is missing, whereas it should be sent to 0. So here, one would want: (a -> c) -> (b -> c) -> (a -> b -> c) -> Map k a -> Map k b -> Map k c where the two first functions are applied when the first or second key is missing. Hans

Show replies by date

Jan-Willem Maessen

27 Jan 27 Jan

1:56 p.m.

On Jan 27, 2010, at 5:53 AM, Hans Aberg wrote:

...

I need ideally some generalizations of unionWith and unionWithKey, for efficiency matters (i.e. avoiding conversions and traversing the maps more than once). I could use a modification of the code in Map.hs, but then the problem is that the module Map interface does not export the constructors of data Map. So suggestions are welcome.

For example, in Map String Integer (sparse representation of monomials) compute the minimum value of all associative pairs with the same key (the gcd); if only one key is present, the absent should be treated as having value 0. So unionWith min xs ys will not work, because unionWith will always apply the identity to the remaining value when one key is missing, whereas it should be sent to 0.

So here, one would want: (a -> c) -> (b -> c) -> (a -> b -> c) -> Map k a -> Map k b -> Map k c where the two first functions are applied when the first or second key is missing.

Ah, the swiss army knife function on maps. This particular formulation works well for the application you describe above, where you're completely traversing both maps. The following really grubby variant can be used to implement asymptotically efficient variations of union, intersection, difference, etc., etc: swissArmy :: (Map k a -> Map k c) -> -- Used to process submaps unique to the left map (Map k b -> Map k c) -> -- Used to process submaps unique to the right map (a -> b -> Maybe c) -> -- Used to process a single common entry Map k a -> Map k b -> Map k c Then your function appears to be: -- helper just2 :: (a -> b -> c) -> a -> b -> Maybe c just2 f a b = Just (f a b) swissArmy (fmap (const 0)) (fmap (const 0)) (just2 gcd) Here are unionWith and intersectionWith: unionWith f = swissArmy id id (just2 f) intersectionWith = swissArmy (const empty) (const empty) (just2 f) differenceWith = swissArmy id (const empty) (\a b -> Nothing) When throwing together tree-like data structures, I often start by writing this function; it handles most of the bulk operations I might want to do as one-liners. It's got a messy, ugly type signature, but it does everything you want as efficiently as you want.* -Jan-Willem Maessen * Actually, this is only true if you add the key as an argument to the third function, so that you can also encode unionWithKey etc! I've skipped that here.

Hans Aberg

2:42 p.m.

On 27 Jan 2010, at 14:56, Jan-Willem Maessen wrote:

...

...
So here, one would want: (a -> c) -> (b -> c) -> (a -> b -> c) -> Map k a -> Map k b -> Map k c where the two first functions are applied when the first or second key is missing.

Ah, the swiss army knife function on maps. This particular formulation works well for the application you describe above, where you're completely traversing both maps. The following really grubby variant can be used to implement asymptotically efficient variations of union, intersection, difference, etc., etc:

swissArmy :: (Map k a -> Map k c) -> -- Used to process submaps unique to the left map (Map k b -> Map k c) -> -- Used to process submaps unique to the right map (a -> b -> Maybe c) -> -- Used to process a single common entry Map k a -> Map k b -> Map k c

I'm not sure why you want to throw in functions between maps in the two first arguments. Then there is no requirement that single keys are preserved. But it is a good idea to have a Maybe so that one can remove keys on the fly.

...

Then your function appears to be:

-- helper just2 :: (a -> b -> c) -> a -> b -> Maybe c just2 f a b = Just (f a b)

swissArmy (fmap (const 0)) (fmap (const 0)) (just2 gcd)

Here are unionWith and intersectionWith:

unionWith f = swissArmy id id (just2 f) intersectionWith = swissArmy (const empty) (const empty) (just2 f) differenceWith = swissArmy id (const empty) (\a b -> Nothing)

When throwing together tree-like data structures, I often start by writing this function; it handles most of the bulk operations I might want to do as one-liners. It's got a messy, ugly type signature, but it does everything you want as efficiently as you want.*

My guess is that is when you write things from scratch. I wanted to add these function on top of the module Data.Map, but then that seems not to be possible, as the constructors are not exported. I could make a copy of this file, and then make my own variation. Hans

Jan-Willem Maessen

3:33 p.m.

On Jan 27, 2010, at 9:42 AM, Hans Aberg wrote:

...

On 27 Jan 2010, at 14:56, Jan-Willem Maessen wrote:

...
...
So here, one would want: (a -> c) -> (b -> c) -> (a -> b -> c) -> Map k a -> Map k b -> Map k c where the two first functions are applied when the first or second key is missing.

Ah, the swiss army knife function on maps. This particular formulation works well for the application you describe above, where you're completely traversing both maps. The following really grubby variant can be used to implement asymptotically efficient variations of union, intersection, difference, etc., etc:

swissArmy :: (Map k a -> Map k c) -> -- Used to process submaps unique to the left map (Map k b -> Map k c) -> -- Used to process submaps unique to the right map (a -> b -> Maybe c) -> -- Used to process a single common entry Map k a -> Map k b -> Map k c

I'm not sure why you want to throw in functions between maps in the two first arguments. Then there is no requirement that single keys are preserved.

But it is a good idea to have a Maybe so that one can remove keys on the fly.

A good point. Technically, one should delimit the scope of the first two arguments: data KeyPreservingMapOperation k a b = AlwaysEmpty | Identity | MapMaybeWithKey (k -> a -> Maybe b) perform :: KeyPreservingMapOperation a b -> Map k a -> Map k b perform AlwaysEmpty = const empty perform Identity = id perform (MapMaybeWithKey f) = mapMaybeWithKey f

...

...
When throwing together tree-like data structures, I often start by writing this function; it handles most of the bulk operations I might want to do as one-liners. It's got a messy, ugly type signature, but it does everything you want as efficiently as you want.*

My guess is that is when you write things from scratch.

Yes. On the other hand, I've repeatedly run into the same problem you're describing: a api that doesn't give me an efficient way to perform an operation I know to be very simple. If every map provided an operation like this one, then I can avoid writing my own library "from scratch" when I need it --- especially when "from scratch" means "fork the code and add what I need". So, library implementors: think hard about your "swiss army knife" combinators. You end up with messy functions with gigantic signatures. On the other hand, you can often add a couple of judicious INLINE annotations and remove tons of code from the rest of your library. Then expose them, and mark them as the functions of last resort that they truly are. I bet there's even a fusion framework in here somewhere. -Jan

Hans Aberg

3:54 p.m.

On 27 Jan 2010, at 16:33, Jan-Willem Maessen wrote:

...

...
I'm not sure why you want to throw in functions between maps in the two first arguments. Then there is no requirement that single keys are preserved.

But it is a good idea to have a Maybe so that one can remove keys on the fly.

A good point. Technically, one should delimit the scope of the first two arguments:

data KeyPreservingMapOperation k a b = AlwaysEmpty | Identity | MapMaybeWithKey (k -> a -> Maybe b)

perform :: KeyPreservingMapOperation a b -> Map k a -> Map k b perform AlwaysEmpty = const empty perform Identity = id perform (MapMaybeWithKey f) = mapMaybeWithKey f

I'm thinking about (k -> Maybe a -> Maybe b -> Maybe c) -> Map k a -> Map k b -> Map k c The first two Maybe's tell if the keys are present, the last if one wants it in the resulting map.

...

...
...
When throwing together tree-like data structures, I often start by writing this function; it handles most of the bulk operations I might want to do as one-liners. It's got a messy, ugly type signature, but it does everything you want as efficiently as you want.*

My guess is that is when you write things from scratch.

Yes. On the other hand, I've repeatedly run into the same problem you're describing: a api that doesn't give me an efficient way to perform an operation I know to be very simple. If every map provided an operation like this one, then I can avoid writing my own library "from scratch" when I need it --- especially when "from scratch" means "fork the code and add what I need".

So, library implementors: think hard about your "swiss army knife" combinators. You end up with messy functions with gigantic signatures. On the other hand, you can often add a couple of judicious INLINE annotations and remove tons of code from the rest of your library. Then expose them, and mark them as the functions of last resort that they truly are.

One can transverse the product of keys. Then I'm thinking about (k1 -> k2 -> a -> b -> Maybe c -> Maybe(k, c)) -> Map k1 a -> Map k2 b -> Map k c The first Maybe tells if the key is already present; the second if one wants it inserted. The idea in both cases is to combine the modifying functions into one. This simplifies the interface. Hans

Jan-Willem Maessen

8:29 p.m.

On Jan 27, 2010, at 10:54 AM, Hans Aberg wrote:

...

On 27 Jan 2010, at 16:33, Jan-Willem Maessen wrote:

...
...
I'm not sure why you want to throw in functions between maps in the two first arguments. Then there is no requirement that single keys are preserved.

But it is a good idea to have a Maybe so that one can remove keys on the fly.

A good point. Technically, one should delimit the scope of the first two arguments:

data KeyPreservingMapOperation k a b = AlwaysEmpty | Identity | MapMaybeWithKey (k -> a -> Maybe b)

perform :: KeyPreservingMapOperation a b -> Map k a -> Map k b perform AlwaysEmpty = const empty perform Identity = id perform (MapMaybeWithKey f) = mapMaybeWithKey f

I'm thinking about (k -> Maybe a -> Maybe b -> Maybe c) -> Map k a -> Map k b -> Map k c The first two Maybe's tell if the keys are present, the last if one wants it in the resulting map.

That has the same behavior semantically, but it's no more efficient than performing a unionWith followed by a filter. For example, consider implementing: xs `intersection` singleton k v in this way. With the function given, the complexity is necessarily O(size xs): we must traverse every key/value pair in xs. By contrast, by aggregating the operations on keys that exist only in a single map, we can write functions like intersection with the desired complexity (which is O(log (size xs)) in this case).

...

...
Yes. On the other hand, I've repeatedly run into the same problem you're describing: a api that doesn't give me an efficient way to perform an operation I know to be very simple. If every map provided an operation like this one, then I can avoid writing my own library "from scratch" when I need it --- especially when "from scratch" means "fork the code and add what I need".

So, library implementors: think hard about your "swiss army knife" combinators. You end up with messy functions with gigantic signatures. On the other hand, you can often add a couple of judicious INLINE annotations and remove tons of code from the rest of your library. Then expose them, and mark them as the functions of last resort that they truly are.

One can transverse the product of keys. Then I'm thinking about (k1 -> k2 -> a -> b -> Maybe c -> Maybe(k, c)) -> Map k1 a -> Map k2 b -> Map k c The first Maybe tells if the key is already present; the second if one wants it inserted.

Traversing cross products is a very different operation from zipping in the key space. Again I wouldn't want to mistakenly substitute one for the other! But in this case I think you'll find that you're already well served by the functions that already exist---adding this function doesn't enable you to do anything more efficiently (at least in a big-O sense).

...

The idea in both cases is to combine the modifying functions into one. This simplifies the interface.

Understood, and with a sufficiently smart compiler we might analyze the behavior of the function and do the right thing with the single-function interface in both cases. I have yet to encounter a compiler that is both smart and reliable on this count. That is why I've found it necessary to expose these kinds of functions. -Jan

...

Hans

Hans Aberg

9:17 p.m.

On 27 Jan 2010, at 21:29, Jan-Willem Maessen wrote:

...

...
I'm thinking about (k -> Maybe a -> Maybe b -> Maybe c) -> Map k a -> Map k b -> Map k c The first two Maybe's tell if the keys are present, the last if one wants it in the resulting map.

That has the same behavior semantically, but it's no more efficient than performing a unionWith followed by a filter.

It would not be the complexity, but the constant factor, by not having to transverse it twice. I'm not sure how it works in a functional language, and the trees must be rebalanced, too.

...

For example, consider implementing:

xs `intersection` singleton k v

in this way. With the function given, the complexity is necessarily O(size xs): we must traverse every key/value pair in xs. By contrast, by aggregating the operations on keys that exist only in a single map, we can write functions like intersection with the desired complexity (which is O(log (size xs)) in this case).

That is a good point.

...

...
One can transverse the product of keys. Then I'm thinking about (k1 -> k2 -> a -> b -> Maybe c -> Maybe(k, c)) -> Map k1 a -> Map k2 b -> Map k c The first Maybe tells if the key is already present; the second if one wants it inserted.

Traversing cross products is a very different operation from zipping in the key space. Again I wouldn't want to mistakenly substitute one for the other!

For the first one, think of sums of sparse matrices or polynomials, and the second, products.

...

But in this case I think you'll find that you're already well served by the functions that already exist---adding this function doesn't enable you to do anything more efficiently (at least in a big-O sense).

Yes, I can't implements (-) directly; it must be a combination of (+) and negate, and the 0 must be filtered out in an extra pass. And for gcd of monomials, it is now a combination of lcm, (*) and (/), instead of a single pass. Unfortunately, these operations are used a lot, so getting a smaller constant factor is important.

...

...
The idea in both cases is to combine the modifying functions into one. This simplifies the interface.

Understood, and with a sufficiently smart compiler we might analyze the behavior of the function and do the right thing with the single- function interface in both cases. I have yet to encounter a compiler that is both smart and reliable on this count. That is why I've found it necessary to expose these kinds of functions.

By your example above, there may be a conflict between a general, simple interface, and implementing things like intersections. Hans

Twan van Laarhoven

28 Jan 28 Jan

2:09 a.m.

Hans Aberg wrote:

...

For example, in Map String Integer (sparse representation of monomials) compute the minimum value of all associative pairs with the same key (the gcd); if only one key is present, the absent should be treated as having value 0. So unionWith min xs ys will not work, because unionWith will always apply the identity to the remaining value when one key is missing, whereas it should be sent to 0.

If missing keys represent 0, then wouldn't this work? intersectionWith min xs ys Twan

Hans Aberg

12:25 p.m.

On 28 Jan 2010, at 03:09, Twan van Laarhoven wrote:

...

...
For example, in Map String Integer (sparse representation of monomials) compute the minimum value of all associative pairs with the same key (the gcd); if only one key is present, the absent should be treated as having value 0. So unionWith min xs ys will not work, because unionWith will always apply the identity to the remaining value when one key is missing, whereas it should be sent to 0.

If missing keys represent 0, then wouldn't this work?

intersectionWith min xs ys

Yes, that works for the gcd function using min. Thank you - this function is used a lot, so it is good to have it simple. For the general problem, one still needs a better interface. For (-), though missing keys represent 0, if the first is missing, the second should be negated. And if both are present, one should filter out keys with 0 value. Right now, this is a combination of negate, (+), and filter (/= 0). Hans

5645

Age (days ago)

5646

Last active (days ago)

List overview

Download

8 comments

3 participants

participants (3)

Hans Aberg
Jan-Willem Maessen
Twan van Laarhoven

Map unionWith generalization

tags

participants (3)