Proposal: Remove Semigroup and Monoid instances for Data.Map, Data.IntMap, Data.HashMap

newer
Re: [core libraries] Re: Proposal:...

David Feuer

13 Feb 2018 13 Feb '18

7:33 p.m.

Many people have recognized for years that the Semigroup and Monoid instances for Data.Map, Data.IntMap, and Data.HashMap are not so great. In particular, when the same key is present in both maps, they simply use the value from the first argument, ignoring the other one. This somewhat counter-intuitive behavior can lead to bugs. See, for example, the discussion in Tim Humphries's blog post[*]. I would like do do the following: 1. Deprecate the Semigroup and Monoid instances for Data.Map.Map, Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of containers and unordered-containers. 2. Remove the deprecated instances. 3. After another several years (four or five, perhaps?), make a major release of each package in which the instances are replaced with the following: instance (Ord k, Semigroup v) => Semigroup (Map k v) where (<>) = Data.Map.Strict.unionWith (<>) instance (Ord k, Semigroup v) => Monoid (Map k v) where mempty = Data.Map.Strict.empty instance Semigroup v => Semigroup (IntMap v) where (<>) = Data.IntMap.Strict.unionWith (<>) instance Semigroup v => Monoid (IntMap v) where mempty = Data.IntMap.Strict.empty instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where (<>) = Data.HashMap.Strict.unionWith (<>) instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where mempty = Data.HashMap.Strict.empty Why do I want the strict versions? That choice may seem a bit surprising, since the data structures are lazy. But the lazy versions really like to leak memory, making them unsuitable for most practical purposes. The big risk: Someone using old code or old tutorial documentation could get subtly wrong behavior without noticing. That is why I have specified an extended period between removing the current instances and adding the desired ones. Alternatives: 1. Remove the instances but don't add the new ones. I fear this may lead others to write their own orphan instances, which may not even all do the same thing. 2. Write separate modules with newtype-wrapped versions of the data structures implementing the desired instances. Unfortunately, this would be annoying to maintain, and also annoying to use--packages using the unwrapped and wrapped versions will use different types. Manually wrapping and unwrapping to make the different types work with each other will introduce lots of potential for mistakes and confusion. Discussion period: three weeks. [*] http://teh.id.au/posts/2017/03/03/map-merge/index.html

Show replies by date

Andrew Martin

13 Feb 13 Feb

7:43 p.m.

I am strongly in favor of this proposal. I've actually proposal the same thing (with additional motivations) on the GHC Trac: https://ghc.haskell.org/trac/ghc/ticket/1460 I also agree the Monoid instance should be strict in the values. I'm not sure that the Monoidless Map phase needs to last four or five years (I was thinking more like two or three years), but that discussion could happen later. On Tue, Feb 13, 2018 at 2:33 PM, David Feuer wrote:

...

Many people have recognized for years that the Semigroup and Monoid instances for Data.Map, Data.IntMap, and Data.HashMap are not so great. In particular, when the same key is present in both maps, they simply use the value from the first argument, ignoring the other one. This somewhat counter-intuitive behavior can lead to bugs. See, for example, the discussion in Tim Humphries's blog post[*]. I would like do do the following:

1. Deprecate the Semigroup and Monoid instances for Data.Map.Map, Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major release of each package in which the instances are replaced with the following:

instance (Ord k, Semigroup v) => Semigroup (Map k v) where (<>) = Data.Map.Strict.unionWith (<>) instance (Ord k, Semigroup v) => Monoid (Map k v) where mempty = Data.Map.Strict.empty

instance Semigroup v => Semigroup (IntMap v) where (<>) = Data.IntMap.Strict.unionWith (<>) instance Semigroup v => Monoid (IntMap v) where mempty = Data.IntMap.Strict.empty

instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where (<>) = Data.HashMap.Strict.unionWith (<>) instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where mempty = Data.HashMap.Strict.empty

Why do I want the strict versions? That choice may seem a bit surprising, since the data structures are lazy. But the lazy versions really like to leak memory, making them unsuitable for most practical purposes.

The big risk:

Someone using old code or old tutorial documentation could get subtly wrong behavior without noticing. That is why I have specified an extended period between removing the current instances and adding the desired ones.

Alternatives:

1. Remove the instances but don't add the new ones. I fear this may lead others to write their own orphan instances, which may not even all do the same thing.

2. Write separate modules with newtype-wrapped versions of the data structures implementing the desired instances. Unfortunately, this would be annoying to maintain, and also annoying to use--packages using the unwrapped and wrapped versions will use different types. Manually wrapping and unwrapping to make the different types work with each other will introduce lots of potential for mistakes and confusion.

Discussion period: three weeks.

[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

-- -Andrew Thaddeus Martin

John Wiegley

8:23 p.m.

...

...
...
...
...
"AM" == Andrew Martin writes:

AM> I am strongly in favor of this proposal. I've actually proposal the same AM> thing (with additional motivations) on the GHC Trac: https:// AM> ghc.haskell.org/trac/ghc/ticket/1460 +1 here too -- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2

Chris Wong

7:51 p.m.

Hi David, On Feb 14, 2018 08:34, "David Feuer" wrote: Alternatives: 1. Remove the instances but don't add the new ones. I fear this may lead others to write their own orphan instances, which may not even all do the same thing. 2. Write separate modules with newtype-wrapped versions of the data structures implementing the desired instances. Unfortunately, this would be annoying to maintain, and also annoying to use--packages using the unwrapped and wrapped versions will use different types. Manually wrapping and unwrapping to make the different types work with each other will introduce lots of potential for mistakes and confusion. A third alternative is to "poison" the instances by placing TypeError in their constraints. This would prevent users from writing orphan instances, as they would overlap with the poisoned ones. On other Haskell implementations (if we still support them), we can use a private class with no instances for this purpose. See https://hackage.haskell.org/package/base-4.10.1.0/docs/GHC-TypeLits.html#t:T... Chris Discussion period: three weeks. [*] http://teh.id.au/posts/2017/03/03/map-merge/index.html _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Mario Blažević

7:54 p.m.

+1. But whatever happened to your proposal from last May? I don't think there were any objections to it. Would the two proposals be combined, or have you decided to drop the previous one? https://mail.haskell.org/pipermail/libraries/2017-May/028036.html On 2017-05-25 12:55 PM, David Feuer wrote:

...

A lot of people have wrappers around Data.Map and Data.IntMap to give them more useful (Semigroup and) Monoid instances. I'd like to add such wrappers to containers. What we need to be able to do that are *names* for the new modules. I can't think of any, so I'm reaching out to the list. Please suggest names! Another question is whether we should take the opportunity of new modules to modernize and streamline the API a bit. I'd like, at least, to separate "safe" from "unsafe" functions, putting the unsafe ones in .Unsafe modules.

On 2018-02-13 02:33 PM, David Feuer wrote:

...

Many people have recognized for years that the Semigroup and Monoid instances for Data.Map, Data.IntMap, and Data.HashMap are not so great. In particular, when the same key is present in both maps, they simply use the value from the first argument, ignoring the other one. This somewhat counter-intuitive behavior can lead to bugs. See, for example, the discussion in Tim Humphries's blog post[*]. I would like do do the following:

1. Deprecate the Semigroup and Monoid instances for Data.Map.Map, Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major release of each package in which the instances are replaced with the following:

instance (Ord k, Semigroup v) => Semigroup (Map k v) where (<>) = Data.Map.Strict.unionWith (<>) instance (Ord k, Semigroup v) => Monoid (Map k v) where mempty = Data.Map.Strict.empty

instance Semigroup v => Semigroup (IntMap v) where (<>) = Data.IntMap.Strict.unionWith (<>) instance Semigroup v => Monoid (IntMap v) where mempty = Data.IntMap.Strict.empty

instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where (<>) = Data.HashMap.Strict.unionWith (<>) instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where mempty = Data.HashMap.Strict.empty

Why do I want the strict versions? That choice may seem a bit surprising, since the data structures are lazy. But the lazy versions really like to leak memory, making them unsuitable for most practical purposes.

The big risk:

Someone using old code or old tutorial documentation could get subtly wrong behavior without noticing. That is why I have specified an extended period between removing the current instances and adding the desired ones.

Alternatives:

1. Remove the instances but don't add the new ones. I fear this may lead others to write their own orphan instances, which may not even all do the same thing.

2. Write separate modules with newtype-wrapped versions of the data structures implementing the desired instances. Unfortunately, this would be annoying to maintain, and also annoying to use--packages using the unwrapped and wrapped versions will use different types. Manually wrapping and unwrapping to make the different types work with each other will introduce lots of potential for mistakes and confusion.

Discussion period: three weeks.

[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

David Feuer

7:59 p.m.

I don't think the old proposal was really the right way. On Tue, Feb 13, 2018 at 2:54 PM, Mario Blažević wrote:

...

+1. But whatever happened to your proposal from last May? I don't think there were any objections to it. Would the two proposals be combined, or have you decided to drop the previous one?

https://mail.haskell.org/pipermail/libraries/2017-May/028036.html

On 2017-05-25 12:55 PM, David Feuer wrote:

...
A lot of people have wrappers around Data.Map and Data.IntMap to give them more useful (Semigroup and) Monoid instances. I'd like to add such wrappers to containers. What we need to be able to do that are *names* for the new modules. I can't think of any, so I'm reaching out to the list. Please suggest names! Another question is whether we should take the opportunity of new modules to modernize and streamline the API a bit. I'd like, at least, to separate "safe" from "unsafe" functions, putting the unsafe ones in .Unsafe modules.

On 2018-02-13 02:33 PM, David Feuer wrote:

...
Many people have recognized for years that the Semigroup and Monoid instances for Data.Map, Data.IntMap, and Data.HashMap are not so great. In particular, when the same key is present in both maps, they simply use the value from the first argument, ignoring the other one. This somewhat counter-intuitive behavior can lead to bugs. See, for example, the discussion in Tim Humphries's blog post[*]. I would like do do the following:

1. Deprecate the Semigroup and Monoid instances for Data.Map.Map, Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major release of each package in which the instances are replaced with the following:

instance (Ord k, Semigroup v) => Semigroup (Map k v) where (<>) = Data.Map.Strict.unionWith (<>) instance (Ord k, Semigroup v) => Monoid (Map k v) where mempty = Data.Map.Strict.empty

instance Semigroup v => Semigroup (IntMap v) where (<>) = Data.IntMap.Strict.unionWith (<>) instance Semigroup v => Monoid (IntMap v) where mempty = Data.IntMap.Strict.empty

instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where (<>) = Data.HashMap.Strict.unionWith (<>) instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where mempty = Data.HashMap.Strict.empty

Why do I want the strict versions? That choice may seem a bit surprising, since the data structures are lazy. But the lazy versions really like to leak memory, making them unsuitable for most practical purposes.

The big risk:

Someone using old code or old tutorial documentation could get subtly wrong behavior without noticing. That is why I have specified an extended period between removing the current instances and adding the desired ones.

Alternatives:

1. Remove the instances but don't add the new ones. I fear this may lead others to write their own orphan instances, which may not even all do the same thing.

2. Write separate modules with newtype-wrapped versions of the data structures implementing the desired instances. Unfortunately, this would be annoying to maintain, and also annoying to use--packages using the unwrapped and wrapped versions will use different types. Manually wrapping and unwrapping to make the different types work with each other will introduce lots of potential for mistakes and confusion.

Discussion period: three weeks.

[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Kris Nuttycombe

7:56 p.m.

I actually had this very issue bite me *in an interview*. I would love these instances to be removed, and probably not replaced directly - instead, I think we should take the `Sum`/`Product` approach, and provide instances for newtypes around these type constructors so that there can be no backwards-incompatible-semantics issues. On Tue, Feb 13, 2018 at 12:33 PM, David Feuer wrote:

...

Many people have recognized for years that the Semigroup and Monoid instances for Data.Map, Data.IntMap, and Data.HashMap are not so great. In particular, when the same key is present in both maps, they simply use the value from the first argument, ignoring the other one. This somewhat counter-intuitive behavior can lead to bugs. See, for example, the discussion in Tim Humphries's blog post[*]. I would like do do the following:

1. Deprecate the Semigroup and Monoid instances for Data.Map.Map, Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major release of each package in which the instances are replaced with the following:

instance (Ord k, Semigroup v) => Semigroup (Map k v) where (<>) = Data.Map.Strict.unionWith (<>) instance (Ord k, Semigroup v) => Monoid (Map k v) where mempty = Data.Map.Strict.empty

instance Semigroup v => Semigroup (IntMap v) where (<>) = Data.IntMap.Strict.unionWith (<>) instance Semigroup v => Monoid (IntMap v) where mempty = Data.IntMap.Strict.empty

instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where (<>) = Data.HashMap.Strict.unionWith (<>) instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where mempty = Data.HashMap.Strict.empty

Why do I want the strict versions? That choice may seem a bit surprising, since the data structures are lazy. But the lazy versions really like to leak memory, making them unsuitable for most practical purposes.

The big risk:

Someone using old code or old tutorial documentation could get subtly wrong behavior without noticing. That is why I have specified an extended period between removing the current instances and adding the desired ones.

Alternatives:

1. Remove the instances but don't add the new ones. I fear this may lead others to write their own orphan instances, which may not even all do the same thing.

2. Write separate modules with newtype-wrapped versions of the data structures implementing the desired instances. Unfortunately, this would be annoying to maintain, and also annoying to use--packages using the unwrapped and wrapped versions will use different types. Manually wrapping and unwrapping to make the different types work with each other will introduce lots of potential for mistakes and confusion.

Discussion period: three weeks.

[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Herbert Valerio Riedel

10:28 p.m.

David, On 2018-02-13 at 14:33:45 -0500, David Feuer wrote: [...]

...

1. Deprecate the Semigroup and Monoid instances for Data.Map.Map, Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major release of each package in which the instances are replaced with the following:

Why does this need to be a such a dreadfully long-winded process? All we need is a way to somehow signal a semantic change in the exposed API -- and it turns out that we actually already have the technology for this! This is exactly one of the core ideas of "semantic versioning" (as a dep-solver-computation-friendly proxy for machine-checkable formal API contracts...) and it's got even "semantic" in its name! In other words, the proposal can be greatly simplified to 1. Make a new major release (or maybe even make it a super-major release, i.e. to `containers-1.0.0.0` for extra saliency) replacing the instances with the more desirable ones; describe the change prominently in the changelog. Profit! Life's short; do we really need a multi-generational journey where the original supporters may not even be around anymore to see the proposal reach its final destination... ;-) Cheers, HVR

Matt Renaud

10:39 p.m.

+1 to HVR's suggestion, this could be a great opportunity to clean out all the old cruft from the containers and unordered-containers APIs in one go. That combined with all the documentation improvements would warrant a 1.0.0.0 for both packages IMHO :) I'd be happy to help with figuring out what that 1.0.0.0 API could/should look like, since it should almost certainly include a lot of community input/feedback. On Tue, Feb 13, 2018 at 2:28 PM, Herbert Valerio Riedel wrote:

...

David,

On 2018-02-13 at 14:33:45 -0500, David Feuer wrote:

[...]

...
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map, Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major release of each package in which the instances are replaced with the following:

Why does this need to be a such a dreadfully long-winded process? All we need is a way to somehow signal a semantic change in the exposed API -- and it turns out that we actually already have the technology for this!

This is exactly one of the core ideas of "semantic versioning" (as a dep-solver-computation-friendly proxy for machine-checkable formal API contracts...) and it's got even "semantic" in its name!

In other words, the proposal can be greatly simplified to

1. Make a new major release (or maybe even make it a super-major release, i.e. to `containers-1.0.0.0` for extra saliency) replacing the instances with the more desirable ones; describe the change prominently in the changelog.

Profit!

Life's short; do we really need a multi-generational journey where the original supporters may not even be around anymore to see the proposal reach its final destination... ;-)

Cheers, HVR _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

David Feuer

10:41 p.m.

We may not have to be *quite* as long-winded as I initially suggested, but I don't think your way is sufficiently long-winded. The advantage of removing the instances first and then adding them back is that the version(s) without the instances will make compilation of every single package that uses them fail. The way you suggest, if someone has a package using containers or unordered-containers, they don't even have a simple way to find out whether they need to make changes. On the opposite side, Chris Wong's suggestion would let us be very explicit, with a type error that warns users that the instance is missing *because it is changing* and that no code using the instance will work for both 0.* and 1.* versions. But I do think we should also consider Kris's more conservative approach. On Tue, Feb 13, 2018 at 5:28 PM, Herbert Valerio Riedel wrote:

...

David,

On 2018-02-13 at 14:33:45 -0500, David Feuer wrote:

[...]

...
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map, Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major release of each package in which the instances are replaced with the following:

Why does this need to be a such a dreadfully long-winded process? All we need is a way to somehow signal a semantic change in the exposed API -- and it turns out that we actually already have the technology for this!

This is exactly one of the core ideas of "semantic versioning" (as a dep-solver-computation-friendly proxy for machine-checkable formal API contracts...) and it's got even "semantic" in its name!

In other words, the proposal can be greatly simplified to

1. Make a new major release (or maybe even make it a super-major release, i.e. to `containers-1.0.0.0` for extra saliency) replacing the instances with the more desirable ones; describe the change prominently in the changelog.

Profit!

Life's short; do we really need a multi-generational journey where the original supporters may not even be around anymore to see the proposal reach its final destination... ;-)

Cheers, HVR

Herbert Valerio Riedel

10:48 p.m.

Hi, On 2018-02-13 at 17:41:25 -0500, David Feuer wrote:

...

We may not have to be *quite* as long-winded as I initially suggested, but I don't think your way is sufficiently long-winded. The advantage of removing the instances first and then adding them back is that the version(s) without the instances will make compilation of every single package that uses them fail. The way you suggest, if someone has a package using containers or unordered-containers, they don't even have a simple way to find out whether they need to make changes. On the opposite side, Chris Wong's suggestion would let us be very explicit, with a type error that warns users that the instance is missing *because it is changing* and that no code using the instance will work for both 0.* and 1.* versions.

Alright, then let's do a little Gedankenexperiment; what if you release a containers-0.9.0.0 (which drops the instances) and a containers-1.0.0.0 (which 'adds back' the desired new instances) on the same day! ...wouldn't this allow us to have the cake and eat it too? ;-)

David Feuer

10:52 p.m.

It would let us have some cake. Users would be able to test against 0.9, in theory. But they'd have to do it intentionally. And Stack-based projects would probably need some shenanigans to deal with a version of containers that doesn't come with GHC. So I really think that avoiding these subtle bugs requires at least a full GHC release cycle. On Feb 13, 2018 5:48 PM, "Herbert Valerio Riedel" wrote: Hi, On 2018-02-13 at 17:41:25 -0500, David Feuer wrote:

...

We may not have to be *quite* as long-winded as I initially suggested, but I don't think your way is sufficiently long-winded. The advantage of removing the instances first and then adding them back is that the version(s) without the instances will make compilation of every single package that uses them fail. The way you suggest, if someone has a package using containers or unordered-containers, they don't even have a simple way to find out whether they need to make changes. On the opposite side, Chris Wong's suggestion would let us be very explicit, with a type error that warns users that the instance is missing *because it is changing* and that no code using the instance will work for both 0.* and 1.* versions.

Matt Renaud

11:14 p.m.

...

avoiding these subtle bugs requires at least a full GHC release cycle

Now that GHC is on a 6 month release cycle this seems preferable to waiting 3-5 years. On Tue, Feb 13, 2018 at 2:52 PM, David Feuer wrote:

...

It would let us have some cake. Users would be able to test against 0.9, in theory. But they'd have to do it intentionally. And Stack-based projects would probably need some shenanigans to deal with a version of containers that doesn't come with GHC. So I really think that avoiding these subtle bugs requires at least a full GHC release cycle.

On Feb 13, 2018 5:48 PM, "Herbert Valerio Riedel" wrote:

Hi,

On 2018-02-13 at 17:41:25 -0500, David Feuer wrote:

...
We may not have to be *quite* as long-winded as I initially suggested, but I don't think your way is sufficiently long-winded. The advantage of removing the instances first and then adding them back is that the version(s) without the instances will make compilation of every single package that uses them fail. The way you suggest, if someone has a package using containers or unordered-containers, they don't even have a simple way to find out whether they need to make changes. On the opposite side, Chris Wong's suggestion would let us be very explicit, with a type error that warns users that the instance is missing *because it is changing* and that no code using the instance will work for both 0.* and 1.* versions.

Alright, then let's do a little Gedankenexperiment; what if you release a containers-0.9.0.0 (which drops the instances) and a containers-1.0.0.0 (which 'adds back' the desired new instances) on the same day!

...wouldn't this allow us to have the cake and eat it too? ;-)

_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Herbert Valerio Riedel

11:50 p.m.

David,

...

...
Alright, then let's do a little Gedankenexperiment; what if you release a containers-0.9.0.0 (which drops the instances) and a containers-1.0.0.0 (which 'adds back' the desired new instances) on the same day!

...wouldn't this allow us to have the cake and eat it too? ;-)

...

It would let us have some cake. Users would be able to test against 0.9, in theory. But they'd have to do it intentionally.

This is what I was trying to get at: that's always the case. It doesn't matter whether you release it same day, a week later, or a year later. I have to intentionally *not* skip the 0.9 release. In fact, I'd want to skip the 0.9 release myself if possible, as supporting both, containers-0.9 and containers-1.0 in the same consumer code is going to be awkward enough for me not wanting to do that (or I'd have to basically not use the instances at all); If I care about the new instances, I'd rather specify `containers ^>= 1.0.0.0` and thus not support containers-0.9. I would have proceeded to point out in my Gedankenexperiment, that in order to use such a container-0.9 release, you'd have to force every package that depends on `containers` to go through such a 0.9 phase in lockstep, which in itself poses an ordeal in terms of package update logistic; and then if you spread the time between the 0.9 and the 1.0 release apart, repeat this dance a 2nd time, with yet another new major version bump, which requires yet another round of consumer-code reviews (*because* a major version increment was signalled; and we do that as API consumers are supposed to pay attention to the major version increment signal...) So consequently, if anything, a container-0.9 release would be a kind of an artificial pseudo-release anyway IMO. You could even just condense it into a cabal package flag of a containers-1.0 release, which disables the Monoid/Semigroup instances; then you don't even need a distinct container-0.9 release at all, nor do you have to contaminate the API progression with an artificial container-0.9 discontinuity.

...

And Stack-based projects would probably need some shenanigans to deal with a version of containers that doesn't come with GHC.

I'm not convinced we should let the weakest link hold us back. If there's limitations in the tooling, we should rather address them, rather than contort ourselves; c.f. the Genius Tailor ( http://www.wealthculture.cn/infoShow.asp?nid=880&sort=Article%20List )

...

So I really think that avoiding these subtle bugs requires at least a full GHC release cycle.

I don't think waiting a full GHC release would be either necessary nor effective at addressing your concerns, which I consider to be disputable arguments to begin with. -- hvr

Oleg Grenrus

14 Feb 14 Feb

3:48 p.m.

I don't like a "no-instances" window. We can change the instances to correct ones directly. That's what semantic versioning is about, to be able to communicate semantic changes. We should go directly to Semigroup v => Semigroup (Map k v) Semigroup v => Monoid (Map k v) If `containers-0.6` or `containers-1.0` included some other (compile-time) breaking change: even better. I want to point out that a lot of code *will* break at compile-time with the change. In polymorphic setting `v` doesn't have `Semigroup v` attached. Many concrete types arent `Semigroup` (e.g. `Int`; here and later when I say isn't `Semigroup`, I mean that it isn't `Monoid` either). I tried to compile our `futurice-prelude` [1] and one small size internal app (40 modules) with patched containers: things break at compile time This change *won't get thru unnoticed*. See below for details. Also people (we and others) use `mempty` for `Map.empty`, but for `mempty` semantics won't change. We might need to use `Map.empty` if `v` isn't `Semigroup`, but that's catched by compiler. Note: that would mean to use `Map.empty` everywhere if there is no-instances window. I also conjecture, based on the results, that `Ord k => Monoid (Map k v)` for `union` is currently avoided in libs. Maybe because it does *the wrong thing*, so people don't use it. In our own 50kLOC codebase we have three `Map.unionWith` or `Map.unionsWith`, one of which is groupThings = Map.unionsWith (<>) . map f). I think `Map k String` or `Map k Text` things might be common in private codebases (not ours). Maybe not the most popular opinion, but I think they deserve to break. Something else: If we replace the instance, we have to leave it out of `containers` for older than `base-4.9`. - Either we provide `Semigroup` (and `Monoid` !) in `semigroups` (See +) or - just have `base >= 4.9` in that release of `containers` (that will be three GHCs soon). To conclude, The brekage on Hackage would be driven by things not having `Semigroup` instances, not semantics. There might be subtle breakages, but not something we as community cannot handle. So let's just make a change. Best regards, Oleg + At this point we want `Semigroup v` for `Monoid (Map k v)`, and because `semigroups` depend on `containers` (and not other way), we'd need to have `Monoid (Map k v)` also in `semigroups`. --- I did an experiment: compile our `futurice-prelude` [1] with amended `containers`, I changed only instances for `Map`. Removal of the instances breaks Cabal, so I could test far enough what happens: Distribution/Simple/Program/GHC.hs:546:12: error: • No instance for (Semigroup (Map Extension String)) arising from a use of ‘gmempty’ • In the expression: gmempty In an equation for ‘mempty’: mempty = gmempty In the instance declaration for ‘Monoid GhcOptions’ | 546 | mempty = gmempty | ^^^^^^^ This is one place where we want `union` semantics. OTOH in current development version of Cabal the map is of type Map Extension (Maybe Compiler.Flag), as we moving avoid from stringly-typed programming. And `Flag` doesn't have `Semigroup` instance. Second step: Let's see where Monoid (Map k v) is used. We cannot deprecate the instance, but we can se where it's used by a -fdefer-type-error trick (thanks Bartosz Nitka aka niteria [2]) Note that this step actually has changes from third step already, so we don't see failures where `Semigroup v` context would have caused compilation failure. To my happy surprise the instance were used only once in Cabal-2.0.1.1 (above), and then in `futurice-prelude` itself where it would also break with `Semigroup v`. So 231 packages at the bottom of dependency graph of lens+servant apps don't use `Monoid (Map k v)`. Interesting fact, I'd say. Third step: If I change instance to use `unionWith (<>)`, then the first failure in our production code is `reducers-3.12.2` src/Data/Semigroup/Reducer.hs:233:10: error: • Could not deduce (Semigroup v) arising from the superclasses of an instance declaration from the context: Ord k bound by the instance declaration at src/Data/Semigroup/Reducer.hs:233:10-42 Possible fix: add (Semigroup v) to the context of the instance declaration • In the instance declaration for ‘Reducer (k, v) (Map k v)’ | 233 | instance Ord k => Reducer (k, v) (Map k v) where and `servant-0.13` src/Servant/Server/Internal.hs:691:34: error: • No instance for (Data.Semigroup.Semigroup (Router' env RoutingApplication)) arising from a use of ‘mempty’ • In the first argument of ‘StaticRouter’, namely ‘mempty’ In the expression: StaticRouter mempty mempty In an equation for ‘route’: route Proxy _ _ = StaticRouter mempty mempty | 691 | route Proxy _ _ = StaticRouter mempty mempty and `swagger2-2.2`. `SwaggerMonoid` differes for `Maybe`, but has default implementation in terms of `Monoid`. src/Data/Swagger/Internal/Utils.hs:116:10: error: • Could not deduce (Data.Semigroup.Semigroup v) arising from a use of ‘Data.Swagger.Internal.Utils.$dmswaggerMappend’ from the context: Ord k bound by the instance declaration at src/Data/Swagger/Internal/Utils.hs:116:10-41 Possible fix: add (Data.Semigroup.Semigroup v) to the context of the instance declaration • In the expression: Data.Swagger.Internal.Utils.$dmswaggerMappend @Map k v In an equation for ‘swaggerMappend’: swaggerMappend = Data.Swagger.Internal.Utils.$dmswaggerMappend @Map k v In the instance declaration for ‘SwaggerMonoid (Map k v)’ | 116 | instance Ord k => SwaggerMonoid (Map k v) And then we get to `futurice-prelude` which also fails: src/Futurice/Prelude.hs:177:41: error: • Could not deduce (Semigroup v) arising from a use of ‘ifolded’ Funny thing about that code, is that there we work with `Map k (Map l v)` and fold them. For other one we want `UnionWith` and for other any semantic should be ok (we have tests, they *will* break if I get things wrong). We also have a `Map` wrapper which also breaks loudly. Fourth step: fix above and contunue investigation where `Monoid (Map k v)` is used. I built one internal application which pulls in even more dependencies in addition to what `futurice-prelude` already depends upon (465 deps, e.g. `amazonka`). - Two of our internal libraries use `mempty` (second one a lot). Not breaking. - One package uses a `Map` wrapper used above (for the sake of experiment I passed the `Warning` constraint down) - The app itself uses `mempty` of `folded` on things which aren't `Semigroup`. - `yaml-0.8.28` breaks: Data/Yaml/Parser.hs:154:13: error: • Could not deduce (Data.Map.Internal.Warning YamlValue) arising from a use of ‘await’ `YamlValue` doesn't have a `Semigroup` (or `Monoid`) instance. - `amazonka-core-1.5.0`: `mempty`: non silent breakage. src/Network/AWS/Data/XML.hs:252:49: error: • No instance for (containers-0.5.11.0:Data.Map.Internal.Warning Text) arising from a use of ‘mempty’ • In the second argument of ‘Element’, namely ‘mempty’ In the first argument of ‘NodeElement’, namely ‘(Element (name k) mempty [NodeContent v])’ In the expression: NodeElement (Element (name k) mempty [NodeContent v]) | 252 | node (k, v) = NodeElement (Element (name k) mempty [NodeContent v]) [1]: https://github.com/futurice/haskell-futurice-prelude [2]: https://ghc.haskell.org/trac/ghc/ticket/12014#comment:6 On 14.02.2018 01:50, Herbert Valerio Riedel wrote:

...

David,

...
...
Alright, then let's do a little Gedankenexperiment; what if you release a containers-0.9.0.0 (which drops the instances) and a containers-1.0.0.0 (which 'adds back' the desired new instances) on the same day!

...wouldn't this allow us to have the cake and eat it too? ;-) It would let us have some cake. Users would be able to test against 0.9, in theory. But they'd have to do it intentionally. This is what I was trying to get at: that's always the case. It doesn't matter whether you release it same day, a week later, or a year later. I have to intentionally *not* skip the 0.9 release. In fact, I'd want to skip the 0.9 release myself if possible, as supporting both, containers-0.9 and containers-1.0 in the same consumer code is going to be awkward enough for me not wanting to do that (or I'd have to basically not use the instances at all); If I care about the new instances, I'd rather specify `containers ^>= 1.0.0.0` and thus not support containers-0.9.

I would have proceeded to point out in my Gedankenexperiment, that in order to use such a container-0.9 release, you'd have to force every package that depends on `containers` to go through such a 0.9 phase in lockstep, which in itself poses an ordeal in terms of package update logistic; and then if you spread the time between the 0.9 and the 1.0 release apart, repeat this dance a 2nd time, with yet another new major version bump, which requires yet another round of consumer-code reviews (*because* a major version increment was signalled; and we do that as API consumers are supposed to pay attention to the major version increment signal...)

So consequently, if anything, a container-0.9 release would be a kind of an artificial pseudo-release anyway IMO. You could even just condense it into a cabal package flag of a containers-1.0 release, which disables the Monoid/Semigroup instances; then you don't even need a distinct container-0.9 release at all, nor do you have to contaminate the API progression with an artificial container-0.9 discontinuity.

...
And Stack-based projects would probably need some shenanigans to deal with a version of containers that doesn't come with GHC. I'm not convinced we should let the weakest link hold us back. If there's limitations in the tooling, we should rather address them, rather than contort ourselves; c.f. the Genius Tailor ( http://www.wealthculture.cn/infoShow.asp?nid=880&sort=Article%20List )

...
So I really think that avoiding these subtle bugs requires at least a full GHC release cycle. I don't think waiting a full GHC release would be either necessary nor effective at addressing your concerns, which I consider to be disputable arguments to begin with.

-- hvr _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Carter Schonwald

17 Feb 17 Feb

4:41 p.m.

Plus one to Herbert’s idea. Let’s do a major version bump and tear the bandaid. The current left biased default ones are a recurrent source of errors and bugs in code, in my experiences thereof. On Tue, Feb 13, 2018 at 5:29 PM Herbert Valerio Riedel wrote:

...

David,

On 2018-02-13 at 14:33:45 -0500, David Feuer wrote:

[...]

...
1. Deprecate the Semigroup and Monoid instances for Data.Map.Map, Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major release of each package in which the instances are replaced with the following:

Why does this need to be a such a dreadfully long-winded process? All we need is a way to somehow signal a semantic change in the exposed API -- and it turns out that we actually already have the technology for this!

This is exactly one of the core ideas of "semantic versioning" (as a dep-solver-computation-friendly proxy for machine-checkable formal API contracts...) and it's got even "semantic" in its name!

In other words, the proposal can be greatly simplified to

1. Make a new major release (or maybe even make it a super-major release, i.e. to `containers-1.0.0.0` for extra saliency) replacing the instances with the more desirable ones; describe the change prominently in the changelog.

Profit!

Life's short; do we really need a multi-generational journey where the original supporters may not even be around anymore to see the proposal reach its final destination... ;-)

Cheers, HVR _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

M Farkas-Dyck

13 Feb 13 Feb

11:44 p.m.

I am strongly in favor of parts 1 and 2, and HVR's proposal for immediately defining new instances with a major version bump — this is what major versions are for, and if we leave them undefined, it will encourage users to define orphan instances.

Henning Thielemann

14 Feb 14 Feb

1:35 a.m.

On Tue, 13 Feb 2018, David Feuer wrote:

...

1. Deprecate the Semigroup and Monoid instances for Data.Map.Map, Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major release of each package in which the instances are replaced with the following:

I like the plan. Re-adding with Semigroup superclass would prevent silent re-interpretation of code if it is generic.

Akio Takano

8:49 a.m.

Hi David, On 13 February 2018 at 19:33, David Feuer wrote:

...

Many people have recognized for years that the Semigroup and Monoid instances for Data.Map, Data.IntMap, and Data.HashMap are not so great. In particular, when the same key is present in both maps, they simply use the value from the first argument, ignoring the other one. This somewhat counter-intuitive behavior can lead to bugs. See, for example, the discussion in Tim Humphries's blog post[*]. I would like do do the following:

I'm against this proposal because I think the improvement is, if not negative, too small to justify breaking existing code. I can think of 3 reasonable definitions for (<>) for lazy maps: 1. (<>) = union 2. (<>) = unionWith (<>) 3. (<>) = Strict.unionWith (<>) Of these, (1) is by far the most useful operation in my experience. (2) has the advantage that it seems like the most obvious choice, but it's not very useful in practice. (3) is slightly more useful than (2), but feels less canonical. Also (3) seems inconsistent with how fmap is defined. So there doesn't seem to be a clear winner. Perhaps there shouldn't have been a Monoid instance for maps in the first place. Given that we already have one, however, it seems that the marginal gain of removing it doesn't justify the cost of breaking code. (If the removal is going to happen anyway, I vote for not adding back any instance.) Regards, Takano Akio

Evan Laforge

9:08 a.m.

On Wed, Feb 14, 2018 at 12:49 AM, Akio Takano wrote:

...

I'm against this proposal because I think the improvement is, if not negative, too small to justify breaking existing code.

I agree for the same reasons. I also find that unions is the most common. In my code, I have many uses of (<>) with the union meaning... it's hard to grep for, but I guess in the 40-80 range. Meanwhile, I have a Util.Map.mappend that also mappends the values, and it's used in 2 places (out of around 140k lines). It wouldn't be so bad to have to convert them all to Map.union, but doesn't seem worth the bother either.

Mario Blažević

2:47 p.m.

On 2018-02-14 04:08 AM, Evan Laforge wrote:

...

On Wed, Feb 14, 2018 at 12:49 AM, Akio Takano wrote:

...
I'm against this proposal because I think the improvement is, if not negative, too small to justify breaking existing code.

I agree for the same reasons.

I also find that unions is the most common. In my code, I have many uses of (<>) with the union meaning... it's hard to grep for, but I guess in the 40-80 range. Meanwhile, I have a Util.Map.mappend that also mappends the values, and it's used in 2 places (out of around 140k lines). It wouldn't be so bad to have to convert them all to Map.union, but doesn't seem worth the bother either.

I understand it's hard to grep, but can you guess how many of those 40-80 uses of (<>) with the union meaning actually depend on the left-biased semantics? I've looked at my code, and it turns out that wherever this operation appears, the two maps are guaranteed to have distinct keys. There is actually one use I'm not sure about, I'll have to make sure there's no bug hiding there. That is one reason I'm supporting the proposal, by the way: the existing behaviour that silently drops values is much more likely to cause bugs.

Joachim Breitner

3:03 p.m.

Hi, Am Mittwoch, den 14.02.2018, 09:47 -0500 schrieb Mario Blažević:

...

I understand it's hard to grep, but can you guess how many of those 40-80 uses of (<>) with the union meaning actually depend on the left-biased semantics? I've looked at my code, and it turns out that wherever this operation appears, the two maps are guaranteed to have distinct keys.

does this indicate the need for disjointUnion :: IntMap a -> IntMap a -> IntMap a that throws an error when the maps are not disjoint? (run-time error only, because we are not doing theorem proving here…) One can get the effect with unionWith (error "not disjoint) of course (and I have done so), but having `disjointUnion` in the API might remind people that they have to think about which variant they want. Cheers, Joachim -- Joachim Breitner mail@joachim-breitner.de http://www.joachim-breitner.de/

David Feuer

3:07 p.m.

This, of course, is another argument for Kris's idea of just offering a pile of newtypes, each with their own Semigroup (and sometimes Monoid) instance. On Wed, Feb 14, 2018 at 10:03 AM, Joachim Breitner wrote:

...

Hi,

Am Mittwoch, den 14.02.2018, 09:47 -0500 schrieb Mario Blažević:

...
I understand it's hard to grep, but can you guess how many of those 40-80 uses of (<>) with the union meaning actually depend on the left-biased semantics? I've looked at my code, and it turns out that wherever this operation appears, the two maps are guaranteed to have distinct keys.

does this indicate the need for

disjointUnion :: IntMap a -> IntMap a -> IntMap a

that throws an error when the maps are not disjoint? (run-time error only, because we are not doing theorem proving here…)

One can get the effect with

unionWith (error "not disjoint)

of course (and I have done so), but having `disjointUnion` in the API might remind people that they have to think about which variant they want.

Cheers, Joachim

-- Joachim Breitner mail@joachim-breitner.de http://www.joachim-breitner.de/ _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Mario Blažević

4:13 p.m.

On 2018-02-14 10:07 AM, David Feuer wrote:

...

This, of course, is another argument for Kris's idea of just offering a pile of newtypes, each with their own Semigroup (and sometimes Monoid) instance.

I would gladly use disjointUnion where appropriate, but I don't think I'd ever use a newtype providing the same effect. Why? There are two situations when I depend on union* or (<>) on maps: 1. Directly at the top level, where I can use either union or (<>) on two known maps. In this situation I often use (<>) only because (a <> b) is shorter and easier to read then (union a b), but that criterion points the other way if I have to say (fromDisjoint (Disjoint a <> Disjoint b)) instead of (disjointUnion a b). 2. Indirectly when folding a Foldable collection of maps. In this case I have no choice but to use (<>), but I really can't think of any occasion when all maps in the collection are guaranteed to have distinct keys.

...

On Wed, Feb 14, 2018 at 10:03 AM, Joachim Breitner wrote:

...
Hi,

Am Mittwoch, den 14.02.2018, 09:47 -0500 schrieb Mario Blažević:

...
I understand it's hard to grep, but can you guess how many of those 40-80 uses of (<>) with the union meaning actually depend on the left-biased semantics? I've looked at my code, and it turns out that wherever this operation appears, the two maps are guaranteed to have distinct keys.

does this indicate the need for

disjointUnion :: IntMap a -> IntMap a -> IntMap a

that throws an error when the maps are not disjoint? (run-time error only, because we are not doing theorem proving here…)

One can get the effect with

unionWith (error "not disjoint)

of course (and I have done so), but having `disjointUnion` in the API might remind people that they have to think about which variant they want.

Matt Renaud

4:17 p.m.

Minor nomenclature note: disjoint union has a specific meaning https://en.wikipedia.org/wiki/Disjoint_union in set theory which is different than how its being used here. On Wed, Feb 14, 2018 at 8:13 AM, Mario Blažević wrote:

...

On 2018-02-14 10:07 AM, David Feuer wrote:

...
This, of course, is another argument for Kris's idea of just offering a pile of newtypes, each with their own Semigroup (and sometimes Monoid) instance.

I would gladly use disjointUnion where appropriate, but I don't think I'd ever use a newtype providing the same effect. Why? There are two situations when I depend on union* or (<>) on maps:

1. Directly at the top level, where I can use either union or (<>) on two known maps. In this situation I often use (<>) only because (a <> b) is shorter and easier to read then (union a b), but that criterion points the other way if I have to say (fromDisjoint (Disjoint a <> Disjoint b)) instead of (disjointUnion a b).

2. Indirectly when folding a Foldable collection of maps. In this case I have no choice but to use (<>), but I really can't think of any occasion when all maps in the collection are guaranteed to have distinct keys.

On Wed, Feb 14, 2018 at 10:03 AM, Joachim Breitner

...
wrote:

...
Hi,

Am Mittwoch, den 14.02.2018, 09:47 -0500 schrieb Mario Blažević:

...
I understand it's hard to grep, but can you guess how many of those 40-80 uses of (<>) with the union meaning actually depend on the left-biased semantics? I've looked at my code, and it turns out that wherever this operation appears, the two maps are guaranteed to have distinct keys.

does this indicate the need for

disjointUnion :: IntMap a -> IntMap a -> IntMap a

that throws an error when the maps are not disjoint? (run-time error only, because we are not doing theorem proving here…)

One can get the effect with

unionWith (error "not disjoint)

of course (and I have done so), but having `disjointUnion` in the API might remind people that they have to think about which variant they want.

_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Evan Laforge

15 Feb 15 Feb

9:25 p.m.

On Wed, Feb 14, 2018 at 6:47 AM, Mario Blažević wrote:

...

I understand it's hard to grep, but can you guess how many of those 40-80 uses of (<>) with the union meaning actually depend on the left-biased semantics? I've looked at my code, and it turns out that wherever this operation appears, the two maps are guaranteed to have distinct keys.

I know that in a few cases I do explicitly want "left wins", and I do frequently use (x<>) as a way to accumulate new data. Sometimes I expect that new data to override old stuff, sometimes I don't expect them to overlap, but it's not a problem if they do. In the cases where it is a problem (e.g. warn about shadowed names or something), I have Util.Map.unique_union :: Ord k => Map.Map k a -> Map.Map k a -> (Map.Map k a, Map.Map k a) -- ^ (union, rejected) I can grep for that: 3 calls, plus 11 more for the version that takes [(k, a)]. However, in the absence of actual numbers for (<>), don't take me too seriously. These kinds of feelings are often shown wrong by the real numbers. But even if I could count them, it's not like it proves anything. It comes down to style and problem domain. That said though, is there a way to grep for methods overloaded at a certain type? One thing I thought of was recompile Data.Map with a TypeError instance, but it seems it might be a hassle for a base package like containers. Otherwise I imagine you could parse core, look for resolved methods, and map back to file and line. Is there a general purpose core analysis suite out there?

...

There is actually one use I'm not sure about, I'll have to make sure there's no bug hiding there. That is one reason I'm supporting the proposal, by the way: the existing behaviour that silently drops values is much more likely to cause bugs.

It's true it does, but since I think of (<>) as union, and union does that too, it doesn't surprise me. Actually the surprising thing was that Map.union is left-biased, while it seems like most languages go the other way. But that was just a general haskell oddity that I accustomed myself to many years ago.

Oliver Charles

14 Feb 14 Feb

9:25 a.m.

On Wed, Feb 14, 2018 at 8:49 AM, Akio Takano wrote:

...

I can think of 3 reasonable definitions for (<>) for lazy maps:

1. (<>) = union 2. (<>) = unionWith (<>) 3. (<>) = Strict.unionWith (<>)

Of these, (1) is by far the most useful operation in my experience. (2) has the advantage that it seems like the most obvious choice, but it's not very useful in practice. (3) is slightly more useful than (2), but feels less canonical. Also (3) seems inconsistent with how fmap is defined.

I think you are very much in the minority here. (1) which we have now is almost never what I want. (2) is much more likely to be what I need. And of course, you can recover (1) with (2) by simply mapping First into every element. So there doesn't seem to be a clear winner. Perhaps there shouldn't

...

have been a Monoid instance for maps in the first place. Given that we already have one, however, it seems that the marginal gain of removing it doesn't justify the cost of breaking code.

There should be a Monoid instance, and http://conal.net/papers/type-class-morphisms/ argues that it should be (2).

Akio Takano

9:38 a.m.

Hi Oliver, On 14 February 2018 at 09:25, Oliver Charles wrote:

...

On Wed, Feb 14, 2018 at 8:49 AM, Akio Takano wrote:

...
I can think of 3 reasonable definitions for (<>) for lazy maps:

1. (<>) = union 2. (<>) = unionWith (<>) 3. (<>) = Strict.unionWith (<>)

Of these, (1) is by far the most useful operation in my experience. (2) has the advantage that it seems like the most obvious choice, but it's not very useful in practice. (3) is slightly more useful than (2), but feels less canonical. Also (3) seems inconsistent with how fmap is defined.

I think you are very much in the minority here. (1) which we have now is almost never what I want. (2) is much more likely to be what I need. And of course, you can recover (1) with (2) by simply mapping First into every element.

This is not true if you consider efficiency. Mapping `First` takes linear time. Even if you use `coerce` instead of `map First`, the resulting map will contain thunks that were not in either of the original maps.

...

...
So there doesn't seem to be a clear winner. Perhaps there shouldn't have been a Monoid instance for maps in the first place. Given that we already have one, however, it seems that the marginal gain of removing it doesn't justify the cost of breaking code.

There should be a Monoid instance, and http://conal.net/papers/type-class-morphisms/ argues that it should be (2).

That is an interesting paper, thanks. However, I don't think this is going to change the fact that I almost never want to use (2) in practice. - Akio

Oleg Grenrus

9:48 a.m.

I agree with Oliver, was just writing the very same. `Map.map First = Map.map coerce = coerce` i.e. mapping First is _zero-cost_, thanks to the fact `v` in `Map k v` has represenational role. (If there aren't RULES in place, it's a bug). See Zero-Cost coercions in Haskell https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/coercibl... Cheers, Oleg. On 14.02.2018 11:38, Akio Takano wrote:

...

Hi Oliver,

...
On Wed, Feb 14, 2018 at 8:49 AM, Akio Takano wrote:

...
I can think of 3 reasonable definitions for (<>) for lazy maps:

1. (<>) = union 2. (<>) = unionWith (<>) 3. (<>) = Strict.unionWith (<>)

Of these, (1) is by far the most useful operation in my experience. (2) has the advantage that it seems like the most obvious choice, but it's not very useful in practice. (3) is slightly more useful than (2), but feels less canonical. Also (3) seems inconsistent with how fmap is defined.

I think you are very much in the minority here. (1) which we have now is almost never what I want. (2) is much more likely to be what I need. And of course, you can recover (1) with (2) by simply mapping First into every element. This is not true if you consider efficiency. Mapping `First` takes

On 14 February 2018 at 09:25, Oliver Charles wrote: linear time. Even if you use `coerce` instead of `map First`, the resulting map will contain thunks that were not in either of the original maps.

...
...
So there doesn't seem to be a clear winner. Perhaps there shouldn't have been a Monoid instance for maps in the first place. Given that we already have one, however, it seems that the marginal gain of removing it doesn't justify the cost of breaking code.

There should be a Monoid instance, and http://conal.net/papers/type-class-morphisms/ argues that it should be (2). That is an interesting paper, thanks. However, I don't think this is going to change the fact that I almost never want to use (2) in practice.

- Akio _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Akio Takano

10:16 a.m.

Hi Oleg, On 14 February 2018 at 09:48, Oleg Grenrus wrote:

...

I agree with Oliver, was just writing the very same.

`Map.map First = Map.map coerce = coerce`

i.e. mapping First is _zero-cost_, thanks to the fact `v` in `Map k v` has represenational role.

Yes, I agree that mapping First is zero-cost. I'm saying that `unionWith (<>)` over the First monoid costs more than `union`, due to allocation of extra thunks. - Akio

David Feuer

2:47 p.m.

We can use rewrite rules to recognize and optimize unionWith in one or more special cases: unionWith f = plainUnionWith f {-# NOINLINE [1] unionWith #-} bottom1, bottom2 :: a bottom1 = undefined {-# NOINLINE bottom1 #-} bottom2 = undefined {-# NOINLINE bottom2 #-} unionWithMagic :: a -> (a -> a -> a) -> Map k a -> Map k a -> Map k a unionWithMagic _ f = plainUnionWith f {-# INLINE [1] unionWithMagic #-} RULES: unionWith f ===> unionWithMagic (f bottom1 bottom2) f unionWithMagic bottom1 f ===> union On Wed, Feb 14, 2018 at 5:16 AM, Akio Takano wrote:

...

Hi Oleg,

On 14 February 2018 at 09:48, Oleg Grenrus wrote:

...
I agree with Oliver, was just writing the very same.

`Map.map First = Map.map coerce = coerce`

i.e. mapping First is _zero-cost_, thanks to the fact `v` in `Map k v` has represenational role.

Yes, I agree that mapping First is zero-cost. I'm saying that `unionWith (<>)` over the First monoid costs more than `union`, due to allocation of extra thunks.

- Akio _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Bardur Arantsson

15 Feb 15 Feb

9:13 p.m.

[ blahblah :) ] +1 for the original suggestion. +2 for Kris' newtype-only suggestion -- especially if we can actually prevent orphan instances effectively. (I think I got that right, it's a big thread at this point.). IMO, it's fine to force people to at least *consider* which instance they want absent a "best" answer. I do think it's important that we are actually able to effectively prevent people from just declaring orphans willy-nilly. +0 for _hvr's dual-incompatible releases. It would be a +1, but my worry here is (as usual) the worry about *forcing* the whole ecosystem to upgrade all at once. Usually this is not actually a big deal, but "containers" is used by almost *everything*, so it's effectively a "split" until everything upgrades. (Obviously, things are not quite are dramatic as that since, presumably, relatively few packages actually use the instance in question, but...) Cheers,

Petr Pudlák

16 Feb 16 Feb

3:53 p.m.

+1 for the proposal! Dne út 13. 2. 2018 20:34 uživatel David Feuer napsal:

...

Many people have recognized for years that the Semigroup and Monoid instances for Data.Map, Data.IntMap, and Data.HashMap are not so great. In particular, when the same key is present in both maps, they simply use the value from the first argument, ignoring the other one. This somewhat counter-intuitive behavior can lead to bugs. See, for example, the discussion in Tim Humphries's blog post[*]. I would like do do the following:

1. Deprecate the Semigroup and Monoid instances for Data.Map.Map, Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major release of each package in which the instances are replaced with the following:

instance (Ord k, Semigroup v) => Semigroup (Map k v) where (<>) = Data.Map.Strict.unionWith (<>) instance (Ord k, Semigroup v) => Monoid (Map k v) where mempty = Data.Map.Strict.empty

instance Semigroup v => Semigroup (IntMap v) where (<>) = Data.IntMap.Strict.unionWith (<>) instance Semigroup v => Monoid (IntMap v) where mempty = Data.IntMap.Strict.empty

instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where (<>) = Data.HashMap.Strict.unionWith (<>) instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where mempty = Data.HashMap.Strict.empty

Why do I want the strict versions? That choice may seem a bit surprising, since the data structures are lazy. But the lazy versions really like to leak memory, making them unsuitable for most practical purposes.

The big risk:

Someone using old code or old tutorial documentation could get subtly wrong behavior without noticing. That is why I have specified an extended period between removing the current instances and adding the desired ones.

Alternatives:

1. Remove the instances but don't add the new ones. I fear this may lead others to write their own orphan instances, which may not even all do the same thing.

2. Write separate modules with newtype-wrapped versions of the data structures implementing the desired instances. Unfortunately, this would be annoying to maintain, and also annoying to use--packages using the unwrapped and wrapped versions will use different types. Manually wrapping and unwrapping to make the different types work with each other will introduce lots of potential for mistakes and confusion.

Discussion period: three weeks.

[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Tony Morris

11:31 p.m.

I agree with this proposal, except for the deprecation period, then waiting for years. Fix it immediately, in my opinion. On Wed, Feb 14, 2018 at 5:33 AM, David Feuer wrote:

...

Many people have recognized for years that the Semigroup and Monoid instances for Data.Map, Data.IntMap, and Data.HashMap are not so great. In particular, when the same key is present in both maps, they simply use the value from the first argument, ignoring the other one. This somewhat counter-intuitive behavior can lead to bugs. See, for example, the discussion in Tim Humphries's blog post[*]. I would like do do the following:

1. Deprecate the Semigroup and Monoid instances for Data.Map.Map, Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major release of each package in which the instances are replaced with the following:

instance (Ord k, Semigroup v) => Semigroup (Map k v) where (<>) = Data.Map.Strict.unionWith (<>) instance (Ord k, Semigroup v) => Monoid (Map k v) where mempty = Data.Map.Strict.empty

instance Semigroup v => Semigroup (IntMap v) where (<>) = Data.IntMap.Strict.unionWith (<>) instance Semigroup v => Monoid (IntMap v) where mempty = Data.IntMap.Strict.empty

instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where (<>) = Data.HashMap.Strict.unionWith (<>) instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where mempty = Data.HashMap.Strict.empty

Why do I want the strict versions? That choice may seem a bit surprising, since the data structures are lazy. But the lazy versions really like to leak memory, making them unsuitable for most practical purposes.

The big risk:

Someone using old code or old tutorial documentation could get subtly wrong behavior without noticing. That is why I have specified an extended period between removing the current instances and adding the desired ones.

Alternatives:

1. Remove the instances but don't add the new ones. I fear this may lead others to write their own orphan instances, which may not even all do the same thing.

2. Write separate modules with newtype-wrapped versions of the data structures implementing the desired instances. Unfortunately, this would be annoying to maintain, and also annoying to use--packages using the unwrapped and wrapped versions will use different types. Manually wrapping and unwrapping to make the different types work with each other will introduce lots of potential for mistakes and confusion.

Discussion period: three weeks.

[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

2703

Age (days ago)

2707

Last active (days ago)

List overview

Download

33 comments

19 participants

participants (19)

Akio Takano
Andrew Martin
Bardur Arantsson
Carter Schonwald
Chris Wong
David Feuer
Evan Laforge
Henning Thielemann
Herbert Valerio Riedel
Joachim Breitner
John Wiegley
Kris Nuttycombe
M Farkas-Dyck
Mario Blažević
Matt Renaud
Oleg Grenrus
Oliver Charles
Petr Pudlák
Tony Morris