Proposal: add ordNub somewhere in containers

Gershom B

16 Oct 2017 16 Oct '17

10:17 p.m.

There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it. Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it: https://github.com/haskell/containers/issues/439 I was promptly directed here to make a proper proposal. So, here: 1) I propose two new functions, `ordNub` and `intNub` with the standard implementation (from https://github.com/nh2/haskell-ordnub): import qualified Data.Set as Set ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs and the same implementation, but specialized to `Int` and using `IntSet`s. The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it. 2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link. The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently... Discussion period: 2 weeks. --Gershom

Show replies by date

David Feuer

16 Oct 16 Oct

10:44 p.m.

I would imagine ordNub :: (Foldable t, Ord a) => t a -> [a] ordNub xs = foldr go (const []) xs Set.empty where go x r s | x `Set.member` s = r s | otherwise = x : r (Set.insert x s) which would suggest also ordNubR :: (Foldable t, Ord a) => t a -> [a] ordNubR xs = foldl go (const []) xs Set.empty where go r x s | x `Set.member` s = r s | otherwise = x : r (Set.insert x s) For containers biased the other way. Another question: do you have anything else you think should be stuck in the new module? On Oct 16, 2017 6:18 PM, "Gershom B" wrote: There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it. Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it: https://github.com/haskell/containers/issues/439 I was promptly directed here to make a proper proposal. So, here: 1) I propose two new functions, `ordNub` and `intNub` with the standard implementation (from https://github.com/nh2/haskell-ordnub ): import qualified Data.Set as Set ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs and the same implementation, but specialized to `Int` and using `IntSet`s. The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it. 2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link. The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently... Discussion period: 2 weeks. --Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Alex Rozenshteyn

17 Oct 17 Oct

5:44 a.m.

(Forgive me if I shouldn't be posting on libraries@. I did a little searching and couldn't determine if this list is supposed to be a public forum or not)

...

[D]o you have anything else you think should be stuck in the new module?

As a user, I would expect to find the `...By` variants in the same location, but that precludes reusing `Set` without relying on complicated machinery like `reflection`, doesn't it? On Mon, Oct 16, 2017 at 3:45 PM David Feuer wrote:

...

I would imagine

ordNub :: (Foldable t, Ord a) => t a -> [a] ordNub xs = foldr go (const []) xs Set.empty where go x r s | x `Set.member` s = r s | otherwise = x : r (Set.insert x s)

which would suggest also

ordNubR :: (Foldable t, Ord a) => t a -> [a] ordNubR xs = foldl go (const []) xs Set.empty where go r x s | x `Set.member` s = r s | otherwise = x : r (Set.insert x s)

For containers biased the other way.

Another question: do you have anything else you think should be stuck in the new module?

On Oct 16, 2017 6:18 PM, "Gershom B" wrote:

There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it.

Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it:

https://github.com/haskell/containers/issues/439

I was promptly directed here to make a proper proposal.

So, here:

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using `IntSet`s.

The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it.

2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link.

The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently...

Discussion period: 2 weeks.

--Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

David Feuer

5:54 a.m.

Not really. We have access to the internals, so we *can* do that sort of thing. Should we? On Oct 17, 2017 1:44 AM, "Alex Rozenshteyn" wrote:

...

(Forgive me if I shouldn't be posting on libraries@. I did a little searching and couldn't determine if this list is supposed to be a public forum or not)

...
[D]o you have anything else you think should be stuck in the new module?

As a user, I would expect to find the `...By` variants in the same location, but that precludes reusing `Set` without relying on complicated machinery like `reflection`, doesn't it?

On Mon, Oct 16, 2017 at 3:45 PM David Feuer wrote:

...
I would imagine

ordNub :: (Foldable t, Ord a) => t a -> [a] ordNub xs = foldr go (const []) xs Set.empty where go x r s | x `Set.member` s = r s | otherwise = x : r (Set.insert x s)

which would suggest also

ordNubR :: (Foldable t, Ord a) => t a -> [a] ordNubR xs = foldl go (const []) xs Set.empty where go r x s | x `Set.member` s = r s | otherwise = x : r (Set.insert x s)

For containers biased the other way.

Another question: do you have anything else you think should be stuck in the new module?

On Oct 16, 2017 6:18 PM, "Gershom B" wrote:

There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it.

Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it:

https://github.com/haskell/containers/issues/439

I was promptly directed here to make a proper proposal.

So, here:

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/ haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using `IntSet`s.

The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it.

2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link.

The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently...

Discussion period: 2 weeks.

--Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Gershom B

6:09 a.m.

Good point on the by variants. (And indeed this is a public forum). Those can be written without any fancy machinery, just keeping the `by` data in the set, in the straightforward way. I agree it makes sense to add them. -g On October 17, 2017 at 1:44:30 AM, Alex Rozenshteyn (rpglover64@gmail.com) wrote:

...

(Forgive me if I shouldn't be posting on libraries@. I did a little searching and couldn't determine if this list is supposed to be a public forum or not)

...
[D]o you have anything else you think should be stuck in the new module?

As a user, I would expect to find the `...By` variants in the same location, but that precludes reusing `Set` without relying on complicated machinery like `reflection`, doesn't it?

On Mon, Oct 16, 2017 at 3:45 PM David Feuer wrote:

...
I would imagine

ordNub :: (Foldable t, Ord a) => t a -> [a] ordNub xs = foldr go (const []) xs Set.empty where go x r s | x `Set.member` s = r s | otherwise = x : r (Set.insert x s)

which would suggest also

ordNubR :: (Foldable t, Ord a) => t a -> [a] ordNubR xs = foldl go (const []) xs Set.empty where go r x s | x `Set.member` s = r s | otherwise = x : r (Set.insert x s)

For containers biased the other way.

Another question: do you have anything else you think should be stuck in the new module?

On Oct 16, 2017 6:18 PM, "Gershom B" wrote:

There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it.

Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it:

https://github.com/haskell/containers/issues/439

I was promptly directed here to make a proper proposal.

So, here:

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using `IntSet`s.

The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it.

2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link.

The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently...

Discussion period: 2 weeks.

--Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

David Feuer

6:20 a.m.

nubOn :: Ord b => (a -> b) -> [a] -> [a], or the Foldable version, seem just a bit safer, but less convenient. On Oct 17, 2017 2:09 AM, "Gershom B" wrote:

...

Good point on the by variants. (And indeed this is a public forum).

Those can be written without any fancy machinery, just keeping the `by` data in the set, in the straightforward way. I agree it makes sense to add them.

-g

On October 17, 2017 at 1:44:30 AM, Alex Rozenshteyn (rpglover64@gmail.com) wrote:

...
(Forgive me if I shouldn't be posting on libraries@. I did a little searching and couldn't determine if this list is supposed to be a public forum or not)

...
[D]o you have anything else you think should be stuck in the new module?

As a user, I would expect to find the `...By` variants in the same location, but that precludes reusing `Set` without relying on complicated machinery like `reflection`, doesn't it?

On Mon, Oct 16, 2017 at 3:45 PM David Feuer wrote:

...
I would imagine

ordNub :: (Foldable t, Ord a) => t a -> [a] ordNub xs = foldr go (const []) xs Set.empty where go x r s | x `Set.member` s = r s | otherwise = x : r (Set.insert x s)

which would suggest also

ordNubR :: (Foldable t, Ord a) => t a -> [a] ordNubR xs = foldl go (const []) xs Set.empty where go r x s | x `Set.member` s = r s | otherwise = x : r (Set.insert x s)

For containers biased the other way.

Another question: do you have anything else you think should be stuck in the new module?

On Oct 16, 2017 6:18 PM, "Gershom B" wrote:

There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it.

Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it:

https://github.com/haskell/containers/issues/439

I was promptly directed here to make a proper proposal.

So, here:

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using `IntSet`s.

The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it.

2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link.

The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently...

Discussion period: 2 weeks.

--Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Alex Rozenshteyn

3:21 p.m.

...

nubOn :: Ord b => (a -> b) -> [a] -> [a], or the Foldable version, seem just a bit safer, but less convenient.

I actually find the `...On` variants more convenient. I will often want to `nubBy ((==) `on` fst` (or `groupBy`), but rarely `nubBy (<=)`.

...

On Oct 17, 2017 2:09 AM, "Gershom B" wrote:

...
Good point on the by variants. (And indeed this is a public forum).

Those can be written without any fancy machinery, just keeping the `by` data in the set, in the straightforward way. I agree it makes sense to add them.

-g

On October 17, 2017 at 1:44:30 AM, Alex Rozenshteyn (rpglover64@gmail.com) wrote:

...
(Forgive me if I shouldn't be posting on libraries@. I did a little searching and couldn't determine if this list is supposed to be a public forum or not)

...
[D]o you have anything else you think should be stuck in the new module?

As a user, I would expect to find the `...By` variants in the same location, but that precludes reusing `Set` without relying on complicated machinery like `reflection`, doesn't it?

On Mon, Oct 16, 2017 at 3:45 PM David Feuer wrote:

...
I would imagine

ordNub :: (Foldable t, Ord a) => t a -> [a] ordNub xs = foldr go (const []) xs Set.empty where go x r s | x `Set.member` s = r s | otherwise = x : r (Set.insert x s)

which would suggest also

ordNubR :: (Foldable t, Ord a) => t a -> [a] ordNubR xs = foldl go (const []) xs Set.empty where go r x s | x `Set.member` s = r s | otherwise = x : r (Set.insert x s)

For containers biased the other way.

Another question: do you have anything else you think should be stuck in the new module?

On Oct 16, 2017 6:18 PM, "Gershom B" wrote:

There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it.

Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it:

https://github.com/haskell/containers/issues/439

I was promptly directed here to make a proper proposal.

So, here:

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using `IntSet`s.

The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it.

2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link.

The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently...

Discussion period: 2 weeks.

--Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Andreas Abel

3:44 p.m.

+1 for nubOn. This is a very useful generalization and no more work than implementing just nub. On 17.10.2017 17:21, Alex Rozenshteyn wrote:

...

nubOn :: Ord b => (a -> b) -> [a] -> [a], or the Foldable version, seem just a bit safer, but less convenient.

I actually find the `...On` variants more convenient. I will often want to `nubBy ((==) `on` fst` (or `groupBy`), but rarely `nubBy (<=)`.

On Oct 17, 2017 2:09 AM, "Gershom B" mailto:gershomb@gmail.com> wrote:

Good point on the by variants. (And indeed this is a public forum).

Those can be written without any fancy machinery, just keeping the `by` data in the set, in the straightforward way. I agree it makes sense to add them.

-g

On October 17, 2017 at 1:44:30 AM, Alex Rozenshteyn (rpglover64@gmail.com mailto:rpglover64@gmail.com) wrote: > (Forgive me if I shouldn't be posting on libraries@. I did a little > searching and couldn't determine if this list is supposed to be a public > forum or not) > > > [D]o you have anything else you think should be stuck in the new module? > > As a user, I would expect to find the `...By` variants in the same > location, but that precludes reusing `Set` without relying on complicated > machinery like `reflection`, doesn't it? > > On Mon, Oct 16, 2017 at 3:45 PM David Feuer wrote: > > > I would imagine > > > > ordNub :: (Foldable t, Ord a) > > => t a -> [a] > > ordNub xs = foldr go (const []) xs Set.empty where > > go x r s > > | x `Set.member` s = r s > > | otherwise = x : r (Set.insert x s) > > > > which would suggest also > > > > ordNubR :: (Foldable t, Ord a) > > => t a -> [a] > > ordNubR xs = foldl go (const []) xs Set.empty where > > go r x s > > | x `Set.member` s = r s > > | otherwise = x : r (Set.insert x s) > > > > For containers biased the other way. > > > > Another question: do you have anything else you think should be stuck in > > the new module? > > > > On Oct 16, 2017 6:18 PM, "Gershom B" wrote: > > > > There have been many discussions over the years about adding an > > efficient order preserving nub somewhere to our core libraries. It > > always comes down to the same issue: an efficient nub wants to be > > backed by an efficient `Set`, but the API of the `nub` itself doesn't > > make reference to any other data structures besides lists. So it feels > > a bit conceptually strange to put an efficient nub anywhere besides > > `Data.List` even though it can't go there without inverting our > > dependency tree in a weird way or inlining an efficient set > > implementation into the middle of it. > > > > Nonetheless, the convenience of having a good `nub` lying around in a > > core library is undeniable, and after writing the "usual" one in my > > code for the zillionth time, I decided to raise an issue about it: > > > > https://github.com/haskell/containers/issues/439 > > > > I was promptly directed here to make a proper proposal. > > > > So, here: > > > > 1) I propose two new functions, > > > > `ordNub` and `intNub` > > > > with the standard implementation (from > > https://github.com/nh2/haskell-ordnub): > > > > import qualified Data.Set as Set > > > > ordNub :: (Ord a) => [a] -> [a] > > ordNub l = go Set.empty l > > where > > go _ [] = [] > > go s (x:xs) = if x `Set.member` s then go s xs > > else x : go (Set.insert x s) xs > > > > and the same implementation, but specialized to `Int` and using `IntSet`s. > > > > The rationale for the names is that the former has a long history of > > use in folklore, and the latter is the obvious specialization of it. > > > > 2) I propose these functions be added to a new module in the > > `containers` library: `Data.Containers.ListUtils`. This can also > > potentially in the future add efficient list intersection, etc. as > > documented on the above reference link. > > > > The rationale for the new module is that it can provide a meaningful > > home for such functions which operate on lists, but require other data > > structures to be implemented efficiently... > > > > Discussion period: 2 weeks. > > > > --Gershom > > _______________________________________________ > > Libraries mailing list > > Libraries@haskell.org mailto:Libraries@haskell.org > > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries > > > > > > _______________________________________________ > > Libraries mailing list > > Libraries@haskell.org mailto:Libraries@haskell.org > > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries > > >

_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

-- Andreas Abel <>< Du bist der geliebte Mensch. Department of Computer Science and Engineering Chalmers and Gothenburg University, Sweden andreas.abel@gu.se http://www.cse.chalmers.se/~abela/

Henning Thielemann

9:04 a.m.

On Mon, 16 Oct 2017, David Feuer wrote:

...

I would imagine

ordNub :: (Foldable t, Ord a) => t a -> [a]

Eric Torreborre

9:18 a.m.

As a tangent I am interested in knowing what would be the typeclass for this sort of operation: - take all the elements out of `f a` - do something with them with a function a -> b - insert the results back to get the same "original shape" `f b` In this case "do something with them" is "order them" but it could also be "invoke a function with all the 'a's and dispatch back the results" to allow "batching" function calls. ------------------------------------------------ Eric TORREBORRE T +49 176 420 8383 4 E etorreborre@yahoo.com P http://specs2.org B http://etorreborre.blogspot.com ------------------------------------------------ Le mardi 17 octobre 2017 à 11:05:43 UTC+2, Henning Thielemann a écrit : On Mon, 16 Oct 2017, David Feuer wrote:

...

I would imagine

ordNub :: (Foldable t, Ord a) => t a -> [a]

The kind of generalization where the input is Foldable but the output is just a list looks pretty unnatural to me. I think it means that we still do not have the proper generalization that allows us to define ordNub :: (??? f, Ord a) => f a -> f a I would just leave it as ordNub :: (Ord a) => [a] -> [a]_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Henning Thielemann

9:32 a.m.

On Tue, 17 Oct 2017, Eric Torreborre via Libraries wrote:

...

As a tangent I am interested in knowing what would be the typeclass for this sort of operation:

- take all the elements out of `f a` - do something with them with a function a -> b - insert the results back to get the same "original shape" `f b`

I guess, this one can only be Functor.

...

In this case "do something with them" is "order them"

A function of type (a -> b) would not be able to order elements. ordNub could be implemented using Traversable: catMaybes (traverse ??? xs :: [Maybe a]) You would only need a generalization for catMaybes like mfilter.

Michael Walker

9:41 a.m.

...

ordNub could be implemented using Traversable:

catMaybes (traverse ??? xs :: [Maybe a])

You would only need a generalization for catMaybes like mfilter.

The witherable package (https://hackage.haskell.org/package/witherable) has a typeclass generalising Traversable to containers which can remove elements. So I suspect Witherable is exactly the abstraction needed to write a polymorphic function: ordNub :: (Witherable f, Ord a) => f a -> f a ordNub = catMaybes . ordNub' ordNub' :: (Traversable f, Ord a) => f a -> f (Maybe a) ordNub; = ... -- Michael Walker (http://www.barrucadu.co.uk)

Henning Thielemann

9:54 a.m.

On Tue, 17 Oct 2017, Henning Thielemann wrote:

...

ordNub could be implemented using Traversable:

catMaybes (traverse ??? xs :: [Maybe a])

You would only need a generalization for catMaybes like mfilter.

ordNub :: (Traversable f, MonadPlus f, Ord a) => f a -> f a ordNub = join . snd . Trav.mapAccumL (\s x -> if Set.member x s then (s,mzero) else (Set.insert x s, return x)) Set.empty We don't need mplus, only Monad and mzero.

Herbert Valerio Riedel

18 Oct 18 Oct

12:35 p.m.

On 2017-10-16 at 18:17:19 -0400, Gershom B wrote: [...]

...

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using `IntSet`s.

I'm generally +1 for an 'ordNub'; I don't mind the particular color of the shed. While at it, I'd like to use the occasion to point out two minor things with that specific implementation: a) One thing I've been missing from `container's API are "UPSERT"-ish operations; In the case of `Set`, the implementation above performs a lookup, and then has to start the insertion from scratch. If we had something like, Set.tryInsert :: Ord a => a -> Set a -> (Set a, Bool) or Set.tryInsert :: Ord a => a -> Set a -> Maybe (Set a) `ordNub` would benefit. b) This is more of an observation, as there's not much we can do here with little effort: iirc, `nub` has a worst-case space complexity (if the input list is made up of N distinct values, e.g. (`nub [1..n] :: [Int])) of 3N words for the [a]-spine to keep track of the already seen values. Using `Set` however trades time-complexity for memory-complexity, and needs 5N words for the 'Set a'-spine. c.f. http://blog.johantibell.com/2011/06/memory-footprints-of-some-common-data.ht... -- hvr

Niklas Hambüchen

5:25 p.m.

I guess it goes without saying that I'm +1 on this issue. Like you have I copied ordNub hundreds of times, and debugged clients' code slow performance for hours and hours, as due to lack of an easily available nub alternative they longed for nub. It would be great if this could finally come to an end.

David Feuer

5:49 p.m.

I am convinced that we should add ordNub :: Ord a => [a] -> [a] ordNubOn :: Ord b => (a -> b) -> [a] -> [b] intNub :: [Int] -> [Int] intNubOn :: (a -> Int) -> [a] -> [a] And because nub preserves non-emptiness, I believe we should also offer ordNub1 :: Ord a => NonEmpty a -> NonEmpty a ordNubOn1 :: Ord b => (a -> b) -> NonEmpty a -> NonEmpty a intNub1 :: NonEmpty Int -> NonEmpty Int intNubOn1 :: (a -> Int) -> NonEmpty a -> NonEmpty a I imagine we should also add these operations for Data.Sequence.Seq. I'm not yet convinced that we should add ordNubBy :: (a -> a -> Ordering) -> [a] -> [a] but I'm open to further discussion of that question. My main concern is that the properties of the comparison argument require careful documentation. In its favor, using it improperly cannot *expose* a broken Set to later operations. I would very much like to hear further bikeshedding around names and namespaces. On Oct 16, 2017 6:18 PM, "Gershom B" wrote:

...

There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it.

Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it:

https://github.com/haskell/containers/issues/439

I was promptly directed here to make a proper proposal.

So, here:

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/ haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using `IntSet`s.

The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it.

2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link.

The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently...

Discussion period: 2 weeks.

--Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Gershom B

9:28 p.m.

I agree that `ordNubOn` suffices and we don't need `ordNubBy` since there's nothing lawful you can do with the latter that you can't do with the former. I'm indifferent on the NonEmpty and Seq case as I don't suspect that they will yield much more efficient implementations than going via lists, especially if we setup (and we should!) the fusion rules correctly. I have no objection to adding them for completeness however. If we do add them, then the proposed module name `Data.Containers.ListUtils` becomes slightly less appropriate, but I think still fine, since these are "morally" all lists of various sorts. -g On Wed, Oct 18, 2017 at 1:49 PM, David Feuer wrote:

...

I am convinced that we should add

ordNub :: Ord a => [a] -> [a] ordNubOn :: Ord b => (a -> b) -> [a] -> [b] intNub :: [Int] -> [Int] intNubOn :: (a -> Int) -> [a] -> [a]

And because nub preserves non-emptiness, I believe we should also offer

ordNub1 :: Ord a => NonEmpty a -> NonEmpty a ordNubOn1 :: Ord b => (a -> b) -> NonEmpty a -> NonEmpty a intNub1 :: NonEmpty Int -> NonEmpty Int intNubOn1 :: (a -> Int) -> NonEmpty a -> NonEmpty a

I imagine we should also add these operations for Data.Sequence.Seq.

I'm not yet convinced that we should add

ordNubBy :: (a -> a -> Ordering) -> [a] -> [a]

but I'm open to further discussion of that question. My main concern is that the properties of the comparison argument require careful documentation. In its favor, using it improperly cannot *expose* a broken Set to later operations.

I would very much like to hear further bikeshedding around names and namespaces.

On Oct 16, 2017 6:18 PM, "Gershom B" wrote:

...
There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it.

Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it:

https://github.com/haskell/containers/issues/439

I was promptly directed here to make a proper proposal.

So, here:

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using `IntSet`s.

The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it.

2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link.

The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently...

Discussion period: 2 weeks.

--Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

David Feuer

9:40 p.m.

The trouble with NonEmpty is that I don't like the idea of users having to roll their own. They'd end up looking like case ordNub (toList xs) of [] -> error "can't happen!" x : xs -> x :| xs I hate "can't happen" errors! The case for sequences is less compelling with fusion, for sure, but it seems a bit strange to leave them out. On Oct 18, 2017 5:28 PM, "Gershom B" wrote:

...

I agree that `ordNubOn` suffices and we don't need `ordNubBy` since there's nothing lawful you can do with the latter that you can't do with the former. I'm indifferent on the NonEmpty and Seq case as I don't suspect that they will yield much more efficient implementations than going via lists, especially if we setup (and we should!) the fusion rules correctly. I have no objection to adding them for completeness however.

If we do add them, then the proposed module name `Data.Containers.ListUtils` becomes slightly less appropriate, but I think still fine, since these are "morally" all lists of various sorts.

-g

...
I am convinced that we should add

ordNub :: Ord a => [a] -> [a] ordNubOn :: Ord b => (a -> b) -> [a] -> [b] intNub :: [Int] -> [Int] intNubOn :: (a -> Int) -> [a] -> [a]

And because nub preserves non-emptiness, I believe we should also offer

ordNub1 :: Ord a => NonEmpty a -> NonEmpty a ordNubOn1 :: Ord b => (a -> b) -> NonEmpty a -> NonEmpty a intNub1 :: NonEmpty Int -> NonEmpty Int intNubOn1 :: (a -> Int) -> NonEmpty a -> NonEmpty a

I imagine we should also add these operations for Data.Sequence.Seq.

I'm not yet convinced that we should add

ordNubBy :: (a -> a -> Ordering) -> [a] -> [a]

but I'm open to further discussion of that question. My main concern is

On Wed, Oct 18, 2017 at 1:49 PM, David Feuer wrote: that

...
the properties of the comparison argument require careful documentation. In its favor, using it improperly cannot *expose* a broken Set to later operations.

I would very much like to hear further bikeshedding around names and namespaces.

On Oct 16, 2017 6:18 PM, "Gershom B" wrote:

...
There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it.

Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it:

https://github.com/haskell/containers/issues/439

I was promptly directed here to make a proper proposal.

So, here:

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using

`IntSet`s.

...
The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it.

2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link.

The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently...

Discussion period: 2 weeks.

--Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Gershom B

9:45 p.m.

Vis a vis NonEmpty, base provides a partial `fromList`, much as I dislike it. Hence, the proper "roll your own" is `fromList . ordNub . toList`. --Gershom On Wed, Oct 18, 2017 at 5:40 PM, David Feuer wrote:

...

The trouble with NonEmpty is that I don't like the idea of users having to roll their own. They'd end up looking like

case ordNub (toList xs) of [] -> error "can't happen!" x : xs -> x :| xs

I hate "can't happen" errors!

The case for sequences is less compelling with fusion, for sure, but it seems a bit strange to leave them out.

On Oct 18, 2017 5:28 PM, "Gershom B" wrote:

...
I agree that `ordNubOn` suffices and we don't need `ordNubBy` since there's nothing lawful you can do with the latter that you can't do with the former. I'm indifferent on the NonEmpty and Seq case as I don't suspect that they will yield much more efficient implementations than going via lists, especially if we setup (and we should!) the fusion rules correctly. I have no objection to adding them for completeness however.

If we do add them, then the proposed module name `Data.Containers.ListUtils` becomes slightly less appropriate, but I think still fine, since these are "morally" all lists of various sorts.

-g

On Wed, Oct 18, 2017 at 1:49 PM, David Feuer wrote:

...
I am convinced that we should add

ordNub :: Ord a => [a] -> [a] ordNubOn :: Ord b => (a -> b) -> [a] -> [b] intNub :: [Int] -> [Int] intNubOn :: (a -> Int) -> [a] -> [a]

And because nub preserves non-emptiness, I believe we should also offer

ordNub1 :: Ord a => NonEmpty a -> NonEmpty a ordNubOn1 :: Ord b => (a -> b) -> NonEmpty a -> NonEmpty a intNub1 :: NonEmpty Int -> NonEmpty Int intNubOn1 :: (a -> Int) -> NonEmpty a -> NonEmpty a

I imagine we should also add these operations for Data.Sequence.Seq.

I'm not yet convinced that we should add

ordNubBy :: (a -> a -> Ordering) -> [a] -> [a]

but I'm open to further discussion of that question. My main concern is that the properties of the comparison argument require careful documentation. In its favor, using it improperly cannot *expose* a broken Set to later operations.

I would very much like to hear further bikeshedding around names and namespaces.

On Oct 16, 2017 6:18 PM, "Gershom B" wrote:

...
There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it.

Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it:

https://github.com/haskell/containers/issues/439

I was promptly directed here to make a proper proposal.

So, here:

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using `IntSet`s.

The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it.

2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link.

The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently...

Discussion period: 2 weeks.

--Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Henning Thielemann

19 Oct 19 Oct

7:50 a.m.

On Wed, 18 Oct 2017, Gershom B wrote:

...

Vis a vis NonEmpty, base provides a partial `fromList`, much as I dislike it. Hence, the proper "roll your own" is `fromList . ordNub . toList`.

My idea of NonEmpty is to prove totality by implementation. I'd prefer a custom obviously total implementation of NonEmpty.ordNub.

Michael Sloan

18 Oct 18 Oct

10:22 p.m.

On Wed, Oct 18, 2017 at 10:49 AM, David Feuer wrote:

...

I am convinced that we should add

ordNub :: Ord a => [a] -> [a] ordNubOn :: Ord b => (a -> b) -> [a] -> [b] intNub :: [Int] -> [Int] intNubOn :: (a -> Int) -> [a] -> [a]

+1 to ordNub / ordNubOn, and having the documentation for nub mention that you probably want ordNub in nearly all cases. +1 to intNub / intNubOn, with RULES pragma to specialize ordNub / ordNubOn to it.

...

And because nub preserves non-emptiness, I believe we should also offer

ordNub1 :: Ord a => NonEmpty a -> NonEmpty a ordNubOn1 :: Ord b => (a -> b) -> NonEmpty a -> NonEmpty a intNub1 :: NonEmpty Int -> NonEmpty Int intNubOn1 :: (a -> Int) -> NonEmpty a -> NonEmpty a

I wish these could go in base. It would be export these from Data.List.NonEmpty, using the same names as the list operations, but this is not possible due to the implementation in terms of Set.

...

I imagine we should also add these operations for Data.Sequence.Seq.

...

I'm not yet convinced that we should add

ordNubBy :: (a -> a -> Ordering) -> [a] -> [a]

Such an operation is quite useful, though unfortunately it does not fit naturally into an implementation in terms of Set. There are quite reasonable use-cases here, where even a hack with a new datatype could be quite convoluted. I recently used a "sortNubBy" in a client project. Perhaps that should be another proposal - it is something that could be in base, though less efficient. In particular, this function is useful in cases where the ordering depends on some value other than the value of the element. Another, more common case, is ignoring some distinctions between values, such as ignoring a field. Sure, you can make a newtype and use "coerce" for this, but I don't like using "coerce" because in general it can break invariants. Much better than unsafeCoerce, of course, but still unsafe. I'd rather not create an Ord instance just to use it in one place, it seems like unnecessary plumbing.

...

but I'm open to further discussion of that question. My main concern is that the properties of the comparison argument require careful documentation. In its favor, using it improperly cannot *expose* a broken Set to later operations.

I would very much like to hear further bikeshedding around names and namespaces.

On Oct 16, 2017 6:18 PM, "Gershom B" wrote:

...
There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it.

Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it:

https://github.com/haskell/containers/issues/439

I was promptly directed here to make a proper proposal.

So, here:

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using `IntSet`s.

The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it.

2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link.

The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently...

Discussion period: 2 weeks.

--Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Andreas Abel

19 Oct 19 Oct

8 a.m.

Your current proposal sounds fine, I think it is safe to go ahead.

...

ordNubBy :: (a -> a -> Ordering) -> [a] -> [a]

can be taken at a second step, if there is no consensus now. Personally, I would include it. On 18.10.2017 19:49, David Feuer wrote:

...

I am convinced that we should add

ordNub :: Ord a => [a] -> [a] ordNubOn :: Ord b => (a -> b) -> [a] -> [b] intNub :: [Int] -> [Int] intNubOn :: (a -> Int) -> [a] -> [a]

And because nub preserves non-emptiness, I believe we should also offer

ordNub1 :: Ord a => NonEmpty a -> NonEmpty a ordNubOn1 :: Ord b => (a -> b) -> NonEmpty a -> NonEmpty a intNub1 :: NonEmpty Int -> NonEmpty Int intNubOn1 :: (a -> Int) -> NonEmpty a -> NonEmpty a

I imagine we should also add these operations for Data.Sequence.Seq.

I'm not yet convinced that we should add

ordNubBy :: (a -> a -> Ordering) -> [a] -> [a]

but I'm open to further discussion of that question. My main concern is that the properties of the comparison argument require careful documentation. In its favor, using it improperly cannot *expose* a broken Set to later operations.

I would very much like to hear further bikeshedding around names and namespaces.

On Oct 16, 2017 6:18 PM, "Gershom B" mailto:gershomb@gmail.com> wrote:

There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it.

Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it:

https://github.com/haskell/containers/issues/439 https://github.com/haskell/containers/issues/439

I was promptly directed here to make a proper proposal.

So, here:

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/haskell-ordnub https://github.com/nh2/haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using `IntSet`s.

The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it.

2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link.

The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently...

Discussion period: 2 weeks.

--Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org mailto:Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

-- Andreas Abel <>< Du bist der geliebte Mensch. Department of Computer Science and Engineering Chalmers and Gothenburg University, Sweden andreas.abel@gu.se http://www.cse.chalmers.se/~abela/

Joachim Breitner

20 Oct 20 Oct

2:14 p.m.

New subject: Proposal: add nubOrd somewhere in containers

Hi, Am Mittwoch, den 18.10.2017, 13:49 -0400 schrieb David Feuer:

...

I am convinced that we should add

ordNub :: Ord a => [a] -> [a] ordNubOn :: Ord b => (a -> b) -> [a] -> [b] intNub :: [Int] -> [Int] intNubOn :: (a -> Int) -> [a] -> [a]

can we shed the names a bit? I think the operation “nubbing” should be first, and the details (”…On”, “…By”, “…OrdOn”, “Ord”, “Int”) should be the second component. This would also have the advantage of listing all ”nub” variants together in an index. So: nubOrd :: Ord a => [a] -> [a] nubOrdOn :: Ord b => (a -> b) -> [a] -> [b] nubInt :: [Int] -> [Int] nubIntOn :: (a -> Int) -> [a] -> [a] This is consistent with, say “map” vs. “mapM” and other suffix-based naming in the libraries. Greetings, Joachim -- Joachim Breitner mail@joachim-breitner.de http://www.joachim-breitner.de/

Elliot Cameron

2:20 p.m.

New subject: Proposal: add nubOrd somewhere in containers

That's also consistent with the commonly used "extra" package: https://hackage.haskell.org/package/extra-1.6/docs/Data-List-Extra.html#v:nu... On Fri, Oct 20, 2017 at 10:14 AM, Joachim Breitner

...

wrote:

...

Hi,

Am Mittwoch, den 18.10.2017, 13:49 -0400 schrieb David Feuer:

...
I am convinced that we should add

ordNub :: Ord a => [a] -> [a] ordNubOn :: Ord b => (a -> b) -> [a] -> [b] intNub :: [Int] -> [Int] intNubOn :: (a -> Int) -> [a] -> [a]

can we shed the names a bit? I think the operation “nubbing” should be first, and the details (”…On”, “…By”, “…OrdOn”, “Ord”, “Int”) should be the second component. This would also have the advantage of listing all ”nub” variants together in an index.

So:

nubOrd :: Ord a => [a] -> [a] nubOrdOn :: Ord b => (a -> b) -> [a] -> [b] nubInt :: [Int] -> [Int] nubIntOn :: (a -> Int) -> [a] -> [a]

This is consistent with, say “map” vs. “mapM” and other suffix-based naming in the libraries.

Greetings, Joachim -- Joachim Breitner mail@joachim-breitner.de http://www.joachim-breitner.de/

_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Gershom B

13 Nov 13 Nov

5:53 p.m.

The discussion period is well over and it seems the consensus is for adding this, basically as David proposes. -- i.e. ordNub, and ordNubOn, but not ordNubBy. (and also the -1 variants and the int- variants). I think the only module name proposed to add them to is Data.Containers.ListUtils, so I guess that stands. David -- are you up for adding these, or would you like someone else (i.e. me) to prepare a patch? --Gershom On Wed, Oct 18, 2017 at 1:49 PM, David Feuer wrote:

...

I am convinced that we should add

ordNub :: Ord a => [a] -> [a] ordNubOn :: Ord b => (a -> b) -> [a] -> [b] intNub :: [Int] -> [Int] intNubOn :: (a -> Int) -> [a] -> [a]

And because nub preserves non-emptiness, I believe we should also offer

ordNub1 :: Ord a => NonEmpty a -> NonEmpty a ordNubOn1 :: Ord b => (a -> b) -> NonEmpty a -> NonEmpty a intNub1 :: NonEmpty Int -> NonEmpty Int intNubOn1 :: (a -> Int) -> NonEmpty a -> NonEmpty a

I imagine we should also add these operations for Data.Sequence.Seq.

I'm not yet convinced that we should add

ordNubBy :: (a -> a -> Ordering) -> [a] -> [a]

but I'm open to further discussion of that question. My main concern is that the properties of the comparison argument require careful documentation. In its favor, using it improperly cannot *expose* a broken Set to later operations.

I would very much like to hear further bikeshedding around names and namespaces.

On Oct 16, 2017 6:18 PM, "Gershom B" wrote:

...
There have been many discussions over the years about adding an efficient order preserving nub somewhere to our core libraries. It always comes down to the same issue: an efficient nub wants to be backed by an efficient `Set`, but the API of the `nub` itself doesn't make reference to any other data structures besides lists. So it feels a bit conceptually strange to put an efficient nub anywhere besides `Data.List` even though it can't go there without inverting our dependency tree in a weird way or inlining an efficient set implementation into the middle of it.

Nonetheless, the convenience of having a good `nub` lying around in a core library is undeniable, and after writing the "usual" one in my code for the zillionth time, I decided to raise an issue about it:

https://github.com/haskell/containers/issues/439

I was promptly directed here to make a proper proposal.

So, here:

1) I propose two new functions,

`ordNub` and `intNub`

with the standard implementation (from https://github.com/nh2/haskell-ordnub):

import qualified Data.Set as Set

ordNub :: (Ord a) => [a] -> [a] ordNub l = go Set.empty l where go _ [] = [] go s (x:xs) = if x `Set.member` s then go s xs else x : go (Set.insert x s) xs

and the same implementation, but specialized to `Int` and using `IntSet`s.

The rationale for the names is that the former has a long history of use in folklore, and the latter is the obvious specialization of it.

2) I propose these functions be added to a new module in the `containers` library: `Data.Containers.ListUtils`. This can also potentially in the future add efficient list intersection, etc. as documented on the above reference link.

The rationale for the new module is that it can provide a meaningful home for such functions which operate on lists, but require other data structures to be implemented efficiently...

Discussion period: 2 weeks.

--Gershom _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Henning Thielemann

6:10 p.m.

On Mon, 13 Nov 2017, Gershom B wrote:

...

The discussion period is well over and it seems the consensus is for adding this, basically as David proposes. -- i.e. ordNub, and ordNubOn, but not ordNubBy. (and also the -1 variants and the int- variants). I think the only module name proposed to add them to is Data.Containers.ListUtils, so I guess that stands.

I think there should be some more thoughts on a good module name. E.g. use of plural (Utils vs. Util) - singular makes more sense in qualified identifiers, use of abbreviations (Util vs. Utility) - we have 'import ... as' for abbreviations, consistency with other data structures - where would we put ordNub for Seq, in Data.Container.SequenceUtility or in Data.Sequence?

2799

Age (days ago)

2827

Last active (days ago)

List overview

Download

25 comments

13 participants

participants (13)

Alex Rozenshteyn
Andreas Abel
Andreas Abel
David Feuer
Elliot Cameron
Eric Torreborre
Gershom B
Henning Thielemann
Herbert Valerio Riedel
Joachim Breitner
Michael Sloan
Michael Walker
Niklas Hambüchen

Proposal: add ordNub somewhere in containers

tags

participants (13)