Proposal #2659: Add sortOn and friends to Data.List

Hello list, Almost all uses of sortBy in user code use 'comparing', 'on' or a similar construction [1]. I think we should add a function that makes this common behavior more convenient: sortOn :: Ord b => (a -> b) -> [a] -> [a] For consistency we should also add *On for the other *By functions in Data.List: nubOn :: Eq b => (a -> b) -> [a] -> [a] deleteOn :: Eq b => (a -> b) -> a -> [a] -> [a] deleteFirstsOn :: Eq b => (a -> b) -> [a] -> [a] -> [a] unionOn :: Eq b => (a -> b) -> [a] -> [a] -> [a] intersectOn :: Eq b => (a -> b) -> [a] -> [a] -> [a] groupOn :: Eq b => (a -> b) -> [a] -> [[a]] sortOn :: Ord b => (a -> b) -> [a] -> [a] insertOn :: Ord b => (a -> b) -> a -> [a] -> [a] maximumOn :: Ord b => (a -> b) -> [a] -> a minimumOn :: Ord b => (a -> b) -> [a] -> a (nubSortOn :: Ord b => (a -> b) -> [a] -> [a]) -- see #2629 -- Naming -- "On": The reason for the "on" suffix is that it relates to the "on" function from Data.Function: "sortOn f = sort (compare `on` f)". In code, "sortOn fst" is reasonably natural, "sortBy fst" would be better but that is already taken. "With": Another possible choice is the "with" suffix. There is some precedence for this choice [2]. A big disadvantage is that the "with" suffix is commonly used for combining functions, as in Data.Map.unionWith. Making a Data.List.unionWith that does something completely different sounds like a bad idea. "ByComparing": Defining sortByComparing f = sortBy (comparing f) makes sense, naming wise. A disadvantage of this name is that it is too long. Also, we would get a distinction between "groupByEquating" and "sortByComparing". -- Variations -- When f is a slow function the call "sortOn f xs" recomputes f more than necessary. A version that caches the result of f is easy to define: sortOn' f = map fst . sortOn snd . map (\x -> (x,f x)) This is perhaps not the best name, since ' usually means strictness. -- Descending sort-- To be able to sort lists in reverse order, a simple newtype should be added to Data.Ord: newtype Down a = Down { getDown :: a } instance Ord a => Ord (Down a) where Down x < Down y = y < x Now "sortOn Down xs == reverse (sort xs)". The name Down comes from [2], maybe Desc is a better name? -- Proposal-- Ticket: #2659 (for sortOn and friends) Ticket: #2660 (for Down newtype) Deadline: now+2 weeks = 2008-10-20 Questions: 1. should sortOn be added to Data.List? 2. should all other *On functions be added as well? 3. what name should these functions get? 4. should the sortOn' variations be added? What about the naming? 5. should Down be added to Data.Ord? Note: The addition of sortOn was previously proposed as #2406 and rejected because it did not follow the library submission guidelines. Twan [1] http://www.google.com/codesearch?q=lang%3Ahaskell+sortBy [2] http://research.microsoft.com/~simonpj/papers/list-comp/index.htm

On Sun, 5 Oct 2008, Twan van Laarhoven wrote:
Almost all uses of sortBy in user code use 'comparing', 'on' or a similar construction [1]. I think we should add a function that makes this common behavior more convenient:
sortOn :: Ord b => (a -> b) -> [a] -> [a]
For consistency we should also add *On for the other *By functions in Data.List:
nubOn :: Eq b => (a -> b) -> [a] -> [a] deleteOn :: Eq b => (a -> b) -> a -> [a] -> [a] deleteFirstsOn :: Eq b => (a -> b) -> [a] -> [a] -> [a] unionOn :: Eq b => (a -> b) -> [a] -> [a] -> [a] intersectOn :: Eq b => (a -> b) -> [a] -> [a] -> [a] groupOn :: Eq b => (a -> b) -> [a] -> [[a]] sortOn :: Ord b => (a -> b) -> [a] -> [a] insertOn :: Ord b => (a -> b) -> a -> [a] -> [a] maximumOn :: Ord b => (a -> b) -> [a] -> a minimumOn :: Ord b => (a -> b) -> [a] -> a (nubSortOn :: Ord b => (a -> b) -> [a] -> [a]) -- see #2629
I also prefer these functions and have called them '*Key', because they work on a key: http://darcs.haskell.org/htam/src/Useful.hs
Questions: 1. should sortOn be added to Data.List?
yes
2. should all other *On functions be added as well?
yes
3. what name should these functions get?
'On' is ok for me
4. should the sortOn' variations be added? What about the naming?
Both variants differ only in efficiency, where none is superior over the other. I don't think that the prime is a good way to indicate the difference. Maybe you can use 'sortKey' for the variant for selector functions and 'sortOn' for the caching variant.
5. should Down be added to Data.Ord?
I find it useful.

On Sun, 2008-10-05 at 17:39 +0200, Twan van Laarhoven wrote:
Hello list,
Almost all uses of sortBy in user code use 'comparing', 'on' or a similar construction [1]. I think we should add a function that makes this common behavior more convenient:
sortOn :: Ord b => (a -> b) -> [a] -> [a]
I think fewer names and more combinations is usually best unless there is a really compelling reason. Just because things are often used in combination doesn't mean we have to make a new name to represent that composition. As functional programmers we are very used to using composition, especially simple function application. What is wrong with sortBy (comparing fieldFoo) ? It's not very long, it's pretty descriptive and easily generalises using the 'on' function. The only thing missing here is equating. I still have to define that myself to write: groupBy (equating fieldBar) I know I can write groupBy ((==) `on` fieldBar) but that just does not read so nicely. So perhaps I should file a counter-proposal suggesting that all we need to do is to add equating to Data.Eq. :-) Duncan

Am Sonntag, 5. Oktober 2008 20:34 schrieb Duncan Coutts:
I think fewer names and more combinations is usually best unless there is a really compelling reason. Just because things are often used in combination doesn't mean we have to make a new name to represent that composition. As functional programmers we are very used to using composition, especially simple function application.
I fully agree! The identifiers of the proposed new functions are rather systematic. So why not use the means of the language (combining higher-order functions) to express the underlying structure instead of relying on naming conventions and boilerplate code? Best wishes, Wolfgang

On Tue, 7 Oct 2008, Wolfgang Jeltsch wrote:
Am Sonntag, 5. Oktober 2008 20:34 schrieb Duncan Coutts:
I think fewer names and more combinations is usually best unless there is a really compelling reason. Just because things are often used in combination doesn't mean we have to make a new name to represent that composition. As functional programmers we are very used to using composition, especially simple function application.
I fully agree!
The identifiers of the proposed new functions are rather systematic. So why not use the means of the language (combining higher-order functions) to express the underlying structure instead of relying on naming conventions and boilerplate code?
The problem is, that combinations of functions get larger and larger this way. I consider it good style to write short function definitions, which in turn use other functions, that make sense of their own. If every function definition is short, your argument would mean, that it is better not to define such functions at all, why not inlining all those definitions in the main program? Why having 'map' and 'filter', they are just 'foldr' with simple function arguments. I have once proposed to add 'toMaybe' to Data.Maybe [1]. It was rejected, because it can easily be implemented with 'guard'. However, that's not the point. It's often the case that 'toMaybe' is simpler to apply and it denotes a frequent pattern. So I continue to define it in the Utility modules of many of my packages. The same discussion arose about 'concatMap' and 'intercalate'. I see two extremes: 1. Minimal number of functions in standard modules and many re-implementations of common patterns in Utility modules of various packages or worse: large function definitions in application code. 2. Maximum number of functions in standard moudles for every combination or at least for every already used combination of functions, where application code only consists of calling one function from a standard module. I think none of the extremes is what we want. In my opinion if people become aware that a certain combination of functions is widely spread, or even more a certain combination is the main application of one of the invoked functions, this is a good sign to make this use pattern explicit by a new function. I share the impression with the original poster, that 'sortBy' and 'groupBy' are most oftenly used with 'on'. I remember that 'on' was precisely introduced to support 'sortBy' and 'groupBy'. However 'sortBy' with 'on' must recompute the sorting key. Such recomputation can only be avoided with a new function - or by optimizer rules, which replace 'sortBy (compare `on` f)' by 'sortOn f' and so on. [1] http://www.haskell.org/pipermail/haskell-cafe/2007-November/034259.html

Hi Twan,
Almost all uses of sortBy in user code use 'comparing', 'on' or a similar construction [1]. I think we should add a function that makes this common behavior more convenient:
sortOn :: Ord b => (a -> b) -> [a] -> [a]
For consistency we should also add *On for the other *By functions in Data.List:
nubOn :: Eq b => (a -> b) -> [a] -> [a] deleteOn :: Eq b => (a -> b) -> a -> [a] -> [a] deleteFirstsOn :: Eq b => (a -> b) -> [a] -> [a] -> [a] unionOn :: Eq b => (a -> b) -> [a] -> [a] -> [a] intersectOn :: Eq b => (a -> b) -> [a] -> [a] -> [a] groupOn :: Eq b => (a -> b) -> [a] -> [[a]] sortOn :: Ord b => (a -> b) -> [a] -> [a] insertOn :: Ord b => (a -> b) -> a -> [a] -> [a] maximumOn :: Ord b => (a -> b) -> [a] -> a minimumOn :: Ord b => (a -> b) -> [a] -> a (nubSortOn :: Ord b => (a -> b) -> [a] -> [a]) -- see #2629
All good, apart from deleteFirstsOn. I honestly don't know what that function does, I was thinking possibly something like the first component of tuples? But no real idea.
When f is a slow function the call "sortOn f xs" recomputes f more than necessary. A version that caches the result of f is easy to define:
sortOn' f = map fst . sortOn snd . map (\x -> (x,f x))
In Hoogle I use sortOn to be the non-cacheing version, and sortWith to be the cacheing version. However, I think we can just ignore the non-cacheing version - if your extraction function is expensive you probably want to think a bit anyway.
To be able to sort lists in reverse order, a simple newtype should be added to Data.Ord:
newtype Down a = Down { getDown :: a }
instance Ord a => Ord (Down a) where Down x < Down y = y < x
Now "sortOn Down xs == reverse (sort xs)". The name Down comes from [2], maybe Desc is a better name?
I prefer Down, and strongly support this proposal as well.
1. should sortOn be added to Data.List?
Yes.
2. should all other *On functions be added as well?
Yes (apart from deleteFirstsOn)
3. what name should these functions get?
*On, as you have given them.
4. should the sortOn' variations be added? What about the naming?
No.
5. should Down be added to Data.Ord?
Yes. Thanks Neil ============================================================================== Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html ==============================================================================

Mitchell, Neil wrote:
Hi Twan,
Almost all uses of sortBy in user code use 'comparing', 'on' or a similar construction [1]. I think we should add a function that makes this common behavior more convenient:
sortOn :: Ord b => (a -> b) -> [a] -> [a]
For consistency we should also add *On for the other *By functions in Data.List:
nubOn :: Eq b => (a -> b) -> [a] -> [a] deleteOn :: Eq b => (a -> b) -> a -> [a] -> [a] deleteFirstsOn :: Eq b => (a -> b) -> [a] -> [a] -> [a] unionOn :: Eq b => (a -> b) -> [a] -> [a] -> [a] intersectOn :: Eq b => (a -> b) -> [a] -> [a] -> [a] groupOn :: Eq b => (a -> b) -> [a] -> [[a]] sortOn :: Ord b => (a -> b) -> [a] -> [a] insertOn :: Ord b => (a -> b) -> a -> [a] -> [a] maximumOn :: Ord b => (a -> b) -> [a] -> a minimumOn :: Ord b => (a -> b) -> [a] -> a (nubSortOn :: Ord b => (a -> b) -> [a] -> [a]) -- see #2629
All good, apart from deleteFirstsOn. I honestly don't know what that function does, I was thinking possibly something like the first component of tuples? But no real idea.
deleteFirstBy is "\\By", so it should perhaps be called differenceBy. But this is just the name that the List module uses. The reason for adding deleteFirstOn is simply consistency, since we also have a By version.
In Hoogle I use sortOn to be the non-cacheing version, and sortWith to be the cacheing version. However, I think we can just ignore the non-cacheing version - if your extraction function is expensive you probably want to think a bit anyway.
If the extraction function is slow, like say "sortOn length", caching is a clear win. On the other hand, there is some constant cost involved. Here are some benchmarks of sorting words in a dictionary: no cache cache sortOn f 3.203125 0.625000 sortOn id 0.156250 0.187500 where f = length . nub . sort, i.e. an expensive function. I'm not sure if the factor 1.2 overhead of the caching version is important enough to warrant exposing two different implementations. Twan

How about capturing the pattern in higher order functions? if we define:
import Data.Function import Data.List
data Cache b a = Cache b a fromCache (Cache _ a) = a toCache f a = Cache (f a) a cache (Cache b _) = b
caching f g = map fromCache . g . map (toCache f)
then sortOn becomes:
sortOn2 f = caching f (sortBy (compare `on` cache))
and with the instances
instance Ord b => Ord (Cache b a) where compare x y = compare (cache x) (cache y)
instance Eq b => Eq (Cache b a) where x == y = cache x == cache y
it gets even simpler:
sortOn3 f = caching f sort
It could be that I missed the point, but I'd much rather not see so many functions added to base without a strong case for it. Cheers, JP.

On Mon, Oct 6, 2008 at 5:32 PM, Jean-Philippe Bernardy wrote: How about capturing the pattern in higher order functions? I absolutely agree with the spirit of this. We're in a language with higher
order functions and polymorphism -- the best there is for that! We should
take advantage of this and do away with specialized implementations
altogether, including By. The caching overhead is relatively minimal for
simple cases, and for complex cases its a big win.
The problem with the "caching" solution as presented, however, is that it
works for functions with the signature of sort. But most functions, such as
maximum, have no such signature. But wait... polymorphism comes to the
rescue!
First the preliminaries: {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances,
FunctionalDependencies #-}
module Schwartz where
import Data.List
import Control.Applicative
import Data.Function Then a type for our Schwartzian transform: data Schwartz a b = Schwartz {sw_out :: !a, sw_trans :: !b}
instance Eq b => Eq (Schwartz a b) where (==) = (==) `on` sw_trans
instance Ord b => Ord (Schwartz a b) where compare = compare `on` sw_trans Injection: sw_in f = Schwartz <*> f And a higher order function for the [a] -> [a] case: schwartzAWith :: (a -> b) -> ([Schwartz a b] -> [Schwartz a b]) -> [a] ->
[a]
schwartzAWith f g = map sw_out . g . map (sw_in f) And then for the [a] -> a case: schwartzBWith :: (a -> b) -> ([Schwartz a b] -> Schwartz a b) -> [a] -> a
schwartzBWith f g = sw_out . g . map (sw_in f) A bit clunky, but a typeclass that embodies the pattern: class ToSchwartz a b c d | c -> d where
using :: (a -> b) -> ([Schwartz a b] -> c) -> [a] -> d Instances: instance ToSchwartz a b [Schwartz a b] [a] where
using u = schwartzAWith u instance ToSchwartz a b (Schwartz a b) a where
using u = schwartzBWith u And now....
GHCi> using negate sort [1..5]
[5,4,3,2,1]
GHCi> using negate maximum [1..5]
1
A solution that would please the most discerning Haskell programmer (or Mel
Brooks).
--Sterl.

On Mon, 6 Oct 2008, Jean-Philippe Bernardy wrote:
How about capturing the pattern in higher order functions?
if we define:
import Data.Function import Data.List
data Cache b a = Cache b a fromCache (Cache _ a) = a toCache f a = Cache (f a) a cache (Cache b _) = b
caching f g = map fromCache . g . map (toCache f)
I've already done this with plain pairs - a distinct type like Cache is of course better. See attachKey: http://darcs.haskell.org/htam/src/Useful.hs

Twan van Laarhoven
Almost all uses of sortBy in user code use 'comparing', 'on' or a similar construction [1].
I certainly agree that there is room for improvement here. But I hate to see the proliferation of functions, almost all alike. I guess I would support the proposal if that's the way folks would like to go, though. I would prefer: 1) Get rid of any need for Data.Function by exporting 'fix' and 'on' into the Prelude. The rest of it is already in the Prelude anyway, and the implementations of 'fix' and 'on' are trivial. (I'd say Data.Function is kind of a failed experiment at this point, but what do I know?) 2) Move the sort functions into Data.Ord. Obviously, they would have to be re-exported by Data.List. It seems to me that 'sort' should have been there in the first place, and that 'sortBy" should be in the same place as 'sort'. 3) Incorporate the 'Down' (Is 'Descending' too long?) type in Data.Ord as suggested (but see below). Now Data.Ord would be the one-stop shopping place for all our sorting needs. If all of this were done, I really wouldn't mind importing Data.Ord to get the sorting functions and continuing to write 'sortBy (comparing f)' or 'sortBy (p `on` f)' when I wanted to do such a thing. Even 'sortBy (comparing Down `on` f)' isn't so bad. Minor nits in the proposal: * You need to implement either (<=) or `comparing` for Down (preferably `comparing`), not just (<). Otherwise sort nonterminates with a stack overflow. This one took me a while to figure out. As far as I can tell you'll need to make Down an instance of Eq as well as Ord to get the latter instance to typecheck. These problems suggest that you haven't yet tried your proposal at all. :-) :-) * getDown might better be called fromDown? Bart Massey bart <at> cs.pdx.edu

Twan van Laarhoven wrote:
-- Proposal--
Ticket: #2659 (for sortOn and friends) Ticket: #2660 (for Down newtype) Deadline: now+2 weeks = 2008-10-20
Questions: 1. should sortOn be added to Data.List? 2. should all other *On functions be added as well? 3. what name should these functions get? 4. should the sortOn' variations be added? What about the naming? 5. should Down be added to Data.Ord?
-1 concerning sortOn and friends I think that sortBy (comparing f) is a very good and modular name for the functionality intended, no need to create another one. The only advantage of a special sortOn function would be that you can document it, i.e. there's no haddock blurb for sortBy (comparing f) . Hm, though sortOn could be useful if it caches the values of f by default, like the sortOn' proposed. But Jean-Philippe's code is a beautiful solution for this situation. +1 concerning Down, but I don't like the name. In particular I don't like the "get" prefix in "getDown". I'd simply use newtype Desc a = Desc a without label or fromDesc if really necessary. Regards, apfelmus

On Tue, 2008-10-07 at 11:44 +0200, apfelmus wrote:
I think that
sortBy (comparing f)
is a very good and modular name for the functionality intended, no need to create another one. The only advantage of a special sortOn function would be that you can document it, i.e. there's no haddock blurb for sortBy (comparing f) .
Lets fix the sortBy documentation to mention the comparing idiom.
Hm, though sortOn could be useful if it caches the values of f by default, like the sortOn' proposed. But Jean-Philippe's code is a beautiful solution for this situation.
+1 concerning Down,
but I don't like the name. In particular I don't like the "get" prefix in "getDown". I'd simply use
Can't we do it with just some compare flip function: sortBy (thing compare) or sortBy (thing $ comparing fieldFoo) Would that work and what would be a good name? Duncan

Duncan Coutts wrote:
+1 concerning Down,
but I don't like the name. In particular I don't like the "get" prefix in "getDown". I'd simply use
Can't we do it with just some compare flip function:
sortBy (thing compare)
or
sortBy (thing $ comparing fieldFoo)
Would that work and what would be a good name?
thing = flip works :) But Down could be useful on its own, for instance for converting a heap that stores elements in maximum order into one that stores them in minimum order, i.e. when there is an Ord class constraint instead of a simple higher-order function. Some exploration of the design space here http://thread.gmane.org/gmane.comp.lang.haskell.cafe/35988 Hm, I think that the Cache b a thing is more general, though, it may be worth extending this to a full proposal. Regards, apfelmus

apfelmus wrote:
but I don't like the name. In particular I don't like the "get" prefix in "getDown". I'd simply use
newtype Desc a = Desc a
what's up with "Down", "Descending" etc.? "Ord" is a symmetric, mathematical concept, no more "ascending" than "descending"! It's just a convention (created/used by functions like "sort") to sort ascending, and it annoys me a little that they're not called "sortAsc" (sortAscending) to say what they do. I'd suggest something like "ReverseOrd" or "ReverseOrder", (or "Opposite..."?) (or some abbreviated version?) -Isaac
participants (11)
-
apfelmus
-
Bart Massey
-
Duncan Coutts
-
Henning Thielemann
-
Isaac Dupree
-
Jean-Philippe Bernardy
-
Malcolm Wallace
-
Mitchell, Neil
-
Sterling Clover
-
Twan van Laarhoven
-
Wolfgang Jeltsch