Re: [core libraries] Rename NonEmpty.groupWith/sortWith to *On*

I think I follow.
Pedantic point: when you say semantics, it might be less confusing and
more precise to say performance tradeoffs.
My English-language and naive preference is to assume the *With functions
are better grammar. Or *By. I think the *On idiom is relatively younger.
Can I challenge you to reflect on what challenge you are trying to address
with this proposal and if there’s a course of action that you feel improves
everything but breaks no code? And what would make the change an
improvement for software using these modules already?
On Sat, Feb 15, 2020 at 7:36 AM Philip Hazelden
Hi,
clc gets / reads libraries anyways, no need to double email : )
Fair enough. I tried to follow the process suggested at https://wiki.haskell.org/Core_Libraries_Committee by emailing both lists. I was surprised that the mail to libraries@ got rejected. No big deal, but should that page be updated?
are we talking about the specifics of the implementations while still having the same observable pure results, OR something different?
For sortWith/sortOn, yeah, my proposal #3 would change performance characteristics but not the pure semantics.
The existing functions that we have are
    Data.List.sortOn f = map snd . sortBy (comparing fst) . map (\x -> let y = f x in y `seq` (y, x))
    GHC.Exts.sortWith f = sortBy (\x y -> compare (f x) (f y))
    Data.List.NonEmpty.sortWith = sortBy . comparing
Where the first two have type signature `Ord b => (a -> b) -> [a] -> [a]` and the last is `Ord b => (a -> b) -> NonEmpty a -> NonEmpty a`.
Data.List.sortOn is implemented in a way that calls its projection function just once per list element, but it allocates an intermediate list. If the projection function is expensive, that'll be worth it. The style of the functions named sortWith avoids the intermediate list, but at the cost of calling the projection twice for each comparison; so that'll be better if the projection is cheap.
I bring this up because issue 12044 suggested removing `GHC.Exts.sortWith` for being semantically the same as `Data.List.sortOn`, and the performance question was brought up there. To be clear, I don't think it's a big deal.
please spell this out more concretely, i'm not able to decode what you mean unambiguously. Just changing the names of some functions in base in isolation breaks code for no reason,
My proposals #1 and #2 are just to change the names of functions, yeah. But it's not for no reason.
There are two problems with the current names. The smaller one, applying to all of these functions, is that I believe the "On" pattern is more widely used. And so if a user tries to guess what these functions will be called, they'll most likely go for that.
(I don't have any hard data here, admittedly. We see "On" in Data.List.sortOn, but also many functions in Data.List.Extra like `nubOn`, `groupOn`, `maximumOn`. Admittedly that's not in base. But I'll also note that NonEmpty.sortWith was originally called sortOn, and the groupWith functions were originally requested with the name groupOn; see https://github.com/ekmett/semigroups/pull/52. So yeah, my sense is that "On" is what most people will expect.)
The more significant one, applying to the group functions but not to sortWith, is that the only other function in base named `groupWith` has a comparable type signature but different semantics. Data.List.NonEmpty.groupWith is "group according to a projection function" while GHC.Exts.groupWith is "sort and group according to a projection function". If someone gets these behaviours confused, they're going to sort when they don't intend to, or to not sort when they do intend to. Using more consistent naming may help avoid that.
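A minimal sketch of that difference, run on one input. The sample list and the `even` projection are illustrative assumptions, not taken from the thread, and the results in the comments assume the base implementations described in this discussion.

```haskell
import qualified Data.List.NonEmpty as NE
import qualified GHC.Exts as Exts

-- Illustrative input (an assumption for this example, not from the thread).
sample :: [Int]
sample = [1, 3, 2, 4, 5]

-- Groups only adjacent elements with equal keys; input order is kept.
-- Expected: [[1,3],[2,4],[5]]
adjacentGroups :: [[Int]]
adjacentGroups = map NE.toList (NE.groupWith even sample)

-- Sorts by the key first, so all elements with equal keys end up together.
-- Expected: [[1,3,5],[2,4]]
sortedGroups :: [[Int]]
sortedGroups = Exts.groupWith even sample

-- The NonEmpty counterpart of the sort-then-group behaviour.
-- Expected: [[1,3,5],[2,4]]
sortedGroupsNE :: [[Int]]
sortedGroupsNE = map NE.toList (NE.groupAllWith even sample)
```

Same name and a compatible type, but `adjacentGroups` has three groups while `sortedGroups` has two, which is the kind of silent mix-up described above.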
It may be that this isn't enough reason to break code. If the committee decides that it's not, then fair enough.
Though, it now occurs to me that renaming `GHC.Exts.groupWith` to `groupAllOn` or `groupAllWith` or something (in Data.List.Extra it's named "groupSortOn"), could also help clear up the larger problem while breaking less code. So I'll offer that possibility for consideration too.
On Thu, Feb 13, 2020 at 10:17 PM Carter Schonwald <carter.schonwald@gmail.com> wrote:
Hello Philip, clc gets / reads libraries anyways, no need to double email : )
perhaps i'm missing something, are we talking about the specifics of the implementations while still having the same observable pure results, OR something different?
When does each have a clear performance win?
at least from my native english speaker perspective, sortWith is better grammar. Are you describing a choice of performance characteristics related to *On vs *With functions?
also, could you spell out which reference implementations of which convention enshrine which meanings you have in mind?
I don't see a good reason to change the names that are provided from one to the other. Are you merely making the case that both styles should be exported from certain places, and that a particular choice of evaluation tradeoffs be enshrined by the respective naming convention?
please spell this out more concretely, i'm not able to decode what you mean unambiguously. Just changing the names of some functions in base in isolation breaks code for no reason, so if you are arguing for that, i'm gonna say nope :)
On Wed, Feb 12, 2020 at 12:58 PM Philip Hazelden <philip.hazelden@gmail.com> wrote:
Hi,
I'm referring to these functions: `sortWith` `groupWith` `groupAllWith` `groupWith1` `groupAllWith1`.
The `On` suffix is more common for such things, and I anticipate it's what most users will normally expect them to be named.
Additionally, the name `groupWith` is potentially misleading. That name is also used in GHC.Exts[1], for a function with the same type signature but different semantics. `GHC.Exts.groupWith` sorts its input list, unlike `NonEmpty.groupWith` or `Data.List.Extra.groupOn`. (In NonEmpty, we have `groupAllWith` for group-with-sort. So we have three names, two semantics, and no consistency.)
According to https://github.com/ekmett/semigroups/pull/52, `With` was chosen because:
The preferred vocabulary for On here is With to match the combinators in GHC.Exts from the "comprehensive comprehensions" paper. That'd make it so that if you use these with comprehensive comprehensions turned on and RebindableSyntax turned on then you'd get these combinators.
But I don't see anything in the docs which suggests that the TransformListComp extension[2] uses these functions implicitly, or interacts with RebindableSyntax[3]. So perhaps I'm missing something, but my guess is this is no longer a concern.
The case for `sortWith` is weaker, since it might trip people up but it won't introduce unexpected semantics. There's also a counter argument in https://gitlab.haskell.org/ghc/ghc/issues/12044. `Data.List.sortOn` only computes the mapping function once for each element, while `GHC.Exts.sortWith` computes it every time it does a comparison but uses fewer allocations. `Data.List.NonEmpty.sortWith` acts like the latter. My suggestion would be to replace it with a `sortOn` implemented like `Data.List.sortOn`, but I also don't think it would be terrible to have both, or even to just rename and leave this small inconsistency. If there's no agreement on this function, I think it would be worth renaming the group functions anyway.
And so I propose, in descending order of how much I care:
1. Rename the group functions to be named "On" instead of "With".
2. Rename `sortWith` to `sortOn`.
3. Reimplement `sortOn` as
    sortOn :: Ord b => (a -> b) -> NonEmpty a -> NonEmpty a
    sortOn f = fmap snd . sortBy (comparing fst) . fmap (\x -> let y = f x in y `seq` (y, x))
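As a self-contained sketch of proposal 3, the definition can be tried out as follows. The name `sortOnNE` and the sample data are hypothetical, chosen for illustration and to avoid clashing with any existing export. The imports are what I assume the snippet needs.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE
import Data.Ord (comparing)

-- Hypothetical name; decorate each element with its key, sort on the
-- key, then drop the key again. The projection runs once per element.
sortOnNE :: Ord b => (a -> b) -> NonEmpty a -> NonEmpty a
sortOnNE f =
  fmap snd
    . NE.sortBy (comparing fst)
    . fmap (\x -> let y = f x in y `seq` (y, x))

-- Illustrative data: sort words by length. The underlying sort is
-- stable, so "pear" stays ahead of "kiwi".
example :: NonEmpty String
example = sortOnNE length ("pear" :| ["fig", "banana", "kiwi"])
```

The result should agree with `NE.sortWith length` on the same input; only the number of calls to the projection and the intermediate allocation differ.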
I assume the process here would be to leave the existing names for a time with deprecation warnings, and then remove them a few versions down the line.
[1] https://hackage.haskell.org/package/base-4.12.0.0/docs/GHC-Exts.html#v:group...
[2] https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts...
[3] https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts...
Best, Phil

Hi,
On Sat, Feb 15, 2020 at 1:00 PM Carter Schonwald
Pedantic point: when you say semantics, it might be less confusing and more precise to say performance tradeoffs.
Apologies, but I'm not sure what you mean here. Could you rephrase?
My English-language and naive preference is to assume the *With functions are better grammar. Or *By. I think the *On idiom is relatively younger.
I don't personally have a particular preference between *With and *On, just a preference for consistency. My proposal isn't motivated by the English meanings of the words "with" and "on", just by (my perception of) their current usage in the ecosystem. If *With was used in most places, and *On was used in Data.List.NonEmpty, then I'd recommend switching NonEmpty to *With.
(*By is already taken: `sortBy`, `nubBy`, `groupBy` and so on. That does seem to be already consistent.)
An advantage of *On is that it reflects how the functions can be defined using the `on` function:
    sortOn f = sortBy (compare `on` f)
    groupOn f = groupBy ((==) `on` f)
Historically, it does look like *With has been in use longer. (It looks like `sortOn` has been in base since 4.8 (2015), and in the "extra" package since the very first version 0.1 in 2014. `sortWith` has been in GHC.Exts since at least base 4.0 in 2009.)
But again, neither of those points factors into my motivation here.
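The `on`-based definitions can be tried directly with a sketch like the following. The names `sortOnViaOn` and `groupOnViaOn` are hypothetical local names (base does not export a `groupOn`), and the sample inputs are made up for illustration.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortBy)

-- (compare `on` f) x y is the same as compare (f x) (f y).
sortOnViaOn :: Ord b => (a -> b) -> [a] -> [a]
sortOnViaOn f = sortBy (compare `on` f)

-- Groups adjacent elements whose projections are equal; no sorting.
groupOnViaOn :: Eq b => (a -> b) -> [a] -> [[a]]
groupOnViaOn f = groupBy ((==) `on` f)

-- Illustrative inputs.
sortedByLength :: [String]
sortedByLength = sortOnViaOn length ["ccc", "a", "bb"]

groupedByParity :: [[Int]]
groupedByParity = groupOnViaOn even [2, 4, 1, 3, 6]
```

Defined this way, `sortOnViaOn` recomputes the projection on every comparison, so it matches the `sortWith` style rather than the `Data.List.sortOn` implementation.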
Can I challenge you to reflect on what challenge you are trying to address with this proposal and if there’s a course of action that you feel improves everything but breaks no code?
I suppose that simply adding the new names, marking the old ones as deprecated, and then never actually removing the old ones, would work for that.
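A rough sketch of what that could look like, using GHC's `DEPRECATED` pragma. The definitions here are stand-ins written for illustration, not the actual base source, and the warning text is an assumption.

```haskell
import Data.Function (on)
import Data.List.NonEmpty (NonEmpty)
import qualified Data.List.NonEmpty as NE

-- New name (stand-in definition in terms of NE.groupBy).
groupOn :: Eq b => (a -> b) -> [a] -> [NonEmpty a]
groupOn f = NE.groupBy ((==) `on` f)

-- Old name kept as an alias. The pragma asks GHC to warn where other
-- modules use it, while existing code keeps compiling.
{-# DEPRECATED groupWith "Use groupOn instead" #-}
groupWith :: Eq b => (a -> b) -> [a] -> [NonEmpty a]
groupWith = groupOn
```

If the old name is never removed, no code breaks; users only see a warning nudging them toward the new name.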
And what would make the change an improvement for software using these modules already?
Well, a small one is that it would make that software slightly easier to read for people who recognize the *On convention but not the *With one. There's also a chance it would make a small number of people notice that they were using a function that had different semantics from what they thought, catching bugs that hadn't previously surfaced.
But I don't think there's any version of this proposal that would accomplish this in any significant way. Improving existing software isn't really the point.

Philip,
Here is the documentation for Generalized List Comprehensions that mentions
sortWith:
https://downloads.haskell.org/ghc/latest/docs/html/users_guide/glasgow_exts....
The asymmetry between Data.List and Data.List.NonEmpty is unfortunate. I
think my preference would be a non-breaking change where groupOn is added
to Data.List.NonEmpty effectively as an alias for groupWith and they would
both stay forever. I've not looked at the semantic difference between
sortWith and sortOn, so I'm not sure which of those should be implemented
with the other, but ideally, I would like to see both of them there.
On Sat, Feb 15, 2020 at 1:59 PM Philip Hazelden
_______________________________________________
Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
-- -Andrew Thaddeus Martin

It seems like the *On family in general has a nontrivial increase in space
usage and the only differences have to do with application specific time vs
memory tradeoffs that need to be measured at the application level.
I’m totally for improved consistency. But I’m not sure what consistency
there is to aim for here.
On Mon, Feb 17, 2020 at 8:00 AM Andrew Martin

On Mon, Feb 17, 2020 at 3:59 PM Carter Schonwald
It seems like the *On family in general has a nontrivial increase in space usage and the only differences have to do with application specific time vs memory tradeoffs that need to be measured at the application level.
To be clear, this particular tradeoff isn't relevant for every *On function. Of the ones under discussion, it applies to sortWith, groupAllWith and groupAllWith1, but not groupWith or groupWith1.
Some things I think are worth noting:
* There must be lots of common functions that could choose different implementation tradeoffs, but we don't keep multiple versions of them around.
* The existing performance differences aren't documented. (Data.List.sortOn tells us that it only evaluates `f` once per element, but it doesn't tell us about the costs; the others don't mention performance at all.) I predict very few people currently choose between GHC.Exts.sortWith and Data.List.sortOn according to their performance needs.
* The Data.List.sortOn implementation is somewhat subtle and complicated. The NonEmpty.sortWith implementation is literally `sortBy . comparing` and GHC.Exts.sortWith isn't much more complicated. (`sortWith f = sortBy (\x y -> compare (f x) (f y))`; I don't know if there are reasons to prefer or disprefer this compared to `sortBy . comparing`.) If someone wants this behavior, it's not hard for them to get.
Based on these, I don't think there's much value in deliberately[1] making sure both implementations have a name, which is why I proposed to replace `sortWith` with `sortOn` instead of keeping them both around.
[1] Of course, there's value in not changing things once they're there. By "deliberately", I mean that if we weren't already in this situation, and someone proposed adding a separate function that was semantically equivalent to `sortOn` but implemented as `sortBy . comparing`, I don't think that would see much support.
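On the two spellings of `sortWith` mentioned in the third bullet: since `comparing f x y` is defined in Data.Ord as `compare (f x) (f y)`, they should produce the same results. A small sketch for lists, with hypothetical local names and made-up sample data; it says nothing about how GHC optimizes either form.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Spelled like GHC.Exts.sortWith (local name to avoid a clash).
sortWithLambda :: Ord b => (a -> b) -> [a] -> [a]
sortWithLambda f = sortBy (\x y -> compare (f x) (f y))

-- Spelled like Data.List.NonEmpty.sortWith, but for plain lists.
sortWithComparing :: Ord b => (a -> b) -> [a] -> [a]
sortWithComparing = sortBy . comparing

-- Illustrative input with duplicate keys, to show that ties keep
-- their original order in both versions.
pairs :: [(Int, Char)]
pairs = [(2, 'a'), (1, 'b'), (2, 'c'), (1, 'd')]
```

Both `sortWithLambda fst pairs` and `sortWithComparing fst pairs` should give `[(1,'b'),(1,'d'),(2,'a'),(2,'c')]`, so getting this behaviour without a dedicated name is a one-liner.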

Adding to Philip's "things worth noting": I think anyone sophisticated
enough (and lucky enough) to have their most important performance
bottleneck localized to a call to sortOn instead of sortWith (or vice
versa) is also sophisticated enough to implement their own version when
necessary.
Consistent names, on the other hand, are useful to everyone, all the time.
I don't think what I just said is a surprising insight for anyone, but I
thought it would be useful to state it explicitly. :)
-Bryan
P.S. +1 to documenting performance differences, for those rare moments they
matter.
On Tue, 18 Feb 2020, 1.27 Philip Hazelden,

I second Andrew's suggestion: add function alias(es) and then keep the old ones for a long period of time. They're not related to typeclasses or their methods, so I don't see the need to keep them forever; anyone with an issue during updating could just resort to `sed`. The compatibility window could be large enough to not disrupt library authors so much, though. Since this is base, that window should be fairly large.
On Mon, Feb 17, 2020 at 11:45 PM Bryan Richter wrote:

(Sending again with reply all...)
On Mon, Feb 17, 2020 at 1:00 PM Andrew Martin
Here is the documentation for Generalized List Comprehensions that mentions sortWith: https://downloads.haskell.org/ghc/latest/docs/html/users_guide/glasgow_exts....
Yes, it mentions sortWith and groupWith. But these functions only seem to be used explicitly, by name. The rationale given for the *With names in NonEmpty was that using that extension and RebindableSyntax, the names could be used implicitly. In that case it would be more awkward to use functions named sortOn or groupOn. But as far as I can tell, that doesn't happen. (I assume it either did at the time, or was expected to in the near future.)
The asymmetry between Data.List and Data.List.NonEmpty is unfortunate. I think my preference would be a non-breaking change where groupOn is added to Data.List.NonEmpty effectively as an alias for groupWith and they would both stay forever. I've not looked at the semantic difference between sortWith and sortOn, so I'm not sure which of those should be implemented with the other, but ideally, I would like to see both of them there.
I do think adding the new names and not removing the others would be an improvement on the status quo, though it's not my preference.
participants (5)
- Andrew Martin
- Bryan Richter
- Carter Schonwald
- chessai .
- Philip Hazelden