Proposal: Add filterM or filterA to Data.Sequence

newer
Proposal: improve the Data.Tree API

older
Proposal: Move some functions from...

David Feuer

28 Dec 2014 28 Dec '14

10:52 a.m.

This can be given exactly the same implementation as the one for lists: filterM :: (Applicative f) => (a -> f Bool) -> Seq a -> f (Seq a) filterM p = foldr go (pure empty) where go x r = f <$> p x <*> r where f flg ys = if flg then x <| ys else ys Bikeshed all you like over the name.

Show replies by date

Andreas Abel

29 Dec 29 Dec

4:22 p.m.

+1 On 28.12.2014 06:22, David Feuer wrote:

...

This can be given exactly the same implementation as the one for lists:

filterM :: (Applicative f) => (a -> f Bool) -> Seq a -> f (Seq a) filterM p = foldr go (pure empty) where go x r = f <$> p x <*> r where f flg ys = if flg then x <| ys else ys

Bikeshed all you like over the name.

-- Andreas Abel <>< Du bist der geliebte Mensch. Department of Computer Science and Engineering Chalmers and Gothenburg University, Sweden andreas.abel@gu.se http://www2.tcs.ifi.lmu.de/~abel/

Edward Kmett

4:43 p.m.

+1 to just generalizing filterM in Data.Sequence On Sun, Dec 28, 2014 at 12:22 AM, David Feuer wrote:

...

This can be given exactly the same implementation as the one for lists:

filterM :: (Applicative f) => (a -> f Bool) -> Seq a -> f (Seq a) filterM p = foldr go (pure empty) where go x r = f <$> p x <*> r where f flg ys = if flg then x <| ys else ys

Bikeshed all you like over the name. _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Johan Tibell

30 Dec 30 Dec

10:35 p.m.

We should check that `filterM . toList` isn't as fast. That was the case for bytestring and we rejected adding filterM there for that reason. On Mon, Dec 29, 2014 at 6:13 AM, Edward Kmett wrote:

...

+1 to just generalizing filterM in Data.Sequence

On Sun, Dec 28, 2014 at 12:22 AM, David Feuer wrote:

...
This can be given exactly the same implementation as the one for lists:

filterM :: (Applicative f) => (a -> f Bool) -> Seq a -> f (Seq a) filterM p = foldr go (pure empty) where go x r = f <$> p x <*> r where f flg ys = if flg then x <| ys else ys

Bikeshed all you like over the name. _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Artyom

11:22 p.m.

I’m not sure I understand. Do you mean that |filterM| shouldn’t exist for data structures for which |filterM . toList| is as fast? If this is the case, I wish that it was at least specified in the documentation that e.g. “this function doesn’t exist because the naive composition is guaranteed to be optimised away / the faster version is actually impossible to write / etc.”. I find myself wondering way too often whether some piece of code I’ve written is a potential candidate for optimisation, and knowing in advance that the naive version is the “recommended” approach lets me not waste my time on benchmarking code which was already benchmarked by others. On 12/30/2014 08:05 PM, Johan Tibell wrote:

...

We should check that `filterM . toList` isn't as fast. That was the case for bytestring and we rejected adding filterM there for that reason.

On Mon, Dec 29, 2014 at 6:13 AM, Edward Kmett mailto:ekmett@gmail.com> wrote:

+1 to just generalizing filterM in Data.Sequence

On Sun, Dec 28, 2014 at 12:22 AM, David Feuer mailto:david.feuer@gmail.com> wrote:

This can be given exactly the same implementation as the one for lists:

filterM :: (Applicative f) => (a -> f Bool) -> Seq a -> f (Seq a) filterM p = foldr go (pure empty) where go x r = f <0.3927809241601645gt; p x <*> r where f flg ys = if flg then x <| ys else ys

Bikeshed all you like over the name. _______________________________________________ Libraries mailing list Libraries@haskell.org mailto:Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org mailto:Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Johan Tibell

11:35 p.m.

The larger point is this: should we have applicative functions for every higher order function, for every data type? For example, one could make an argument for having Map.alterA in addition to Map.alter, because without alterA one would have to use a combination of lookup and insert, which could be slower, to achieve the same effect. The obvious downside is the explosion of functions in the API, which is even worse due to their already being lazy and strict versions of most higher-order function (i.e. now we'd have to have 2*2 versions of every function). This seems like a failure in composability and abstraction. Until we've figured out a way to deal with this general issue that doesn't involve duplicating tons of code and swelling the API, I've been pushing back on changes like this. In this particular case, having to go via lists might hurt performance a bit, but might still be better than the other alternative. As one of the maintainers of the containers package these are the kind of issues I have to consider*. * This is by the way one of the reasons that it's important for packages to have dedicated maintainers, so make sure proposed changes are considered in the larger context of the health and evolution of the package as a whole. On Tue, Dec 30, 2014 at 12:52 PM, Artyom wrote:

...

I’m not sure I understand. Do you mean that |filterM| shouldn’t exist for data structures for which |filterM . toList| is as fast?

If this is the case, I wish that it was at least specified in the documentation that e.g. “this function doesn’t exist because the naive composition is guaranteed to be optimised away / the faster version is actually impossible to write / etc.”. I find myself wondering way too often whether some piece of code I’ve written is a potential candidate for optimisation, and knowing in advance that the naive version is the “recommended” approach lets me not waste my time on benchmarking code which was already benchmarked by others.

On 12/30/2014 08:05 PM, Johan Tibell wrote:

We should check that `filterM . toList` isn't as fast. That was the case

...
for bytestring and we rejected adding filterM there for that reason.

On Mon, Dec 29, 2014 at 6:13 AM, Edward Kmett > wrote:

+1 to just generalizing filterM in Data.Sequence

On Sun, Dec 28, 2014 at 12:22 AM, David Feuer mailto:david.feuer@gmail.com> wrote:

This can be given exactly the same implementation as the one for lists:

filterM :: (Applicative f) => (a -> f Bool) -> Seq a -> f (Seq a) filterM p = foldr go (pure empty) where go x r = f <0.3927809241601645gt; p x <*> r where f flg ys = if flg then x <| ys else ys

Bikeshed all you like over the name. _______________________________________________ Libraries mailing list Libraries@haskell.org mailto:Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org mailto:Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

David Feuer

11:51 p.m.

On Tue, Dec 30, 2014 at 1:05 PM, Johan Tibell wrote:

...

The obvious downside is the explosion of functions in the API, which is even worse due to their already being lazy and strict versions of most higher-order function (i.e. now we'd have to have 2*2 versions of every function). This seems like a failure in composability and abstraction. Until we've figured out a way to deal with this general issue that doesn't involve duplicating tons of code and swelling the API, I've been pushing back on changes like this.

1. I think it's better to duplicate code in a library than to make its users duplicate the code themselves. 2. One option would be to make Data.Sequence.Private, exposing all the types and various internals as well as defining all the necessary instances, and then to add Data.Sequence.This, Data.Sequence.That, etc.

Johan Tibell

31 Dec 31 Dec

12:57 a.m.

On Tue, Dec 30, 2014 at 1:21 PM, David Feuer wrote:

...

On Tue, Dec 30, 2014 at 1:05 PM, Johan Tibell wrote:

...
The obvious downside is the explosion of functions in the API, which is even worse due to their already being lazy and strict versions of most higher-order function (i.e. now we'd have to have 2*2 versions of every function). This seems like a failure in composability and abstraction. Until we've figured out a way to deal with this general issue that doesn't involve duplicating tons of code and swelling the API, I've been pushing back on changes like this.

1. I think it's better to duplicate code in a library than to make its users duplicate the code themselves.

I tend to disagree, especially if we're not talking about duplicating filterMSeq f = fmap fromList . filterM f . toList but the whole implementation of Seq.filter (which would need to if you want an optimal implementation of filterMSeq).

...

2. One option would be to make Data.Sequence.Private, exposing all the types and various internals as well as defining all the necessary instances, and then to add Data.Sequence.This, Data.Sequence.That, etc.

I don't want to expose the internals (we don't do that for other containers types) because we wouldn't, in practice, be able to change the implementation if we wanted to.

Kim-Ee Yeoh

1:06 a.m.

On Wed, Dec 31, 2014 at 1:05 AM, Johan Tibell wrote:

...

As one of the maintainers of the containers package these are the kind of issues I have to consider*.

I share these concerns too with Johan, especially about code duplication and package maintenance. Kudos to him for stewarding it so successfully all along for I and countless others appreciate his work. And I think it's unfair to push work around like this. What happens is open source fatigue and you find your best contributors leaving. Traditionally, if you found some missing function, you would roll your own package and upload it for the benefit of everyone. The proper resolution of this proposal is Data.MissingSeq.filterM. -- Kim-Ee

David Feuer

30 Dec 30 Dec

11:45 p.m.

On Tue, Dec 30, 2014 at 12:05 PM, Johan Tibell wrote:

...

We should check that `filterM . toList` isn't as fast. That was the case for bytestring and we rejected adding filterM there for that reason.

Let me expand that definition to what it really looks like: filterMSeq f = fmap fromList . filterM f . toList Yes, you can do this. Yes, it's probably even pretty efficient. But it's a bit painful to have to expand things out like this just to switch a bit of code from lists to sequences.

Milan Straka

31 Dec 31 Dec

4:11 a.m.

Hi all,

...

-----Original message----- From: David Feuer Sent: 30 Dec 2014, 13:15

On Tue, Dec 30, 2014 at 12:05 PM, Johan Tibell wrote:

...
We should check that `filterM . toList` isn't as fast. That was the case for bytestring and we rejected adding filterM there for that reason.

Let me expand that definition to what it really looks like:

filterMSeq f = fmap fromList . filterM f . toList

Yes, you can do this. Yes, it's probably even pretty efficient. But it's a bit painful to have to expand things out like this just to switch a bit of code from lists to sequences.

the problem with this is that there are really many functions that "would be useful", but we do not really want to add them all. A general rule I have been using is "can this be implemented straightforwardly using existing methods without hurting performance much"? So I would like to see the numbers, please. Also, filterM is not yet settled in base, so we should wait some time until it is. Cheers, Milan PS: It is a shame we cannot reuse Foldable and/or Traversable for filtering. Maybe we should add a new Filterable class?

David Feuer

4:33 a.m.

There won't be any such numbers in this case, because access to the details of the sequence structure does not seem to help implement this function. The same is true of the following, all of which already exist: replicate, replicateM, iterateN -- all are implemented using replicateA unfoldr, unfoldl --implemented using <| and |> null -- can be written just as well using null xs = case viewl xs of {EmptyL -> True; _ -> False} scanl, scanr, scanl1, scanr1 -- defined ultimately in terms of traverse spanl, spanr, breakl, breakr, takeWhile, dropWhile -- defined in terms of splitAt findIndicesL, findIndicesR -- defined in terms of foldlWithIndex, foldrWithIndex take, drop -- previously defined using splitAt; currently defined using splitAt', but those two functions are likely to merge filter -- uses foldl elemIndexL, elemIndicesL, elemIndexR, elemIndicesR, findIndexL, findIndexR -- all defined using findIndicesL and findIndicesR update -- uses adjust zip, zipWith3, zip3, zipWith4, zip4 -- All defined from zipWith I think almost all of these functions are good to have around (although the elemIndex family seems a bit silly). Adding a few more to round things out seems reasonable to me as well. On Tue, Dec 30, 2014 at 5:41 PM, Milan Straka wrote:

...

Hi all,

...
-----Original message----- From: David Feuer Sent: 30 Dec 2014, 13:15

On Tue, Dec 30, 2014 at 12:05 PM, Johan Tibell wrote:

...
We should check that `filterM . toList` isn't as fast. That was the case for bytestring and we rejected adding filterM there for that reason.

Let me expand that definition to what it really looks like:

filterMSeq f = fmap fromList . filterM f . toList

Yes, you can do this. Yes, it's probably even pretty efficient. But it's a bit painful to have to expand things out like this just to switch a bit of code from lists to sequences.

the problem with this is that there are really many functions that "would be useful", but we do not really want to add them all.

A general rule I have been using is "can this be implemented straightforwardly using existing methods without hurting performance much"? So I would like to see the numbers, please.

Also, filterM is not yet settled in base, so we should wait some time until it is.

Cheers, Milan

PS: It is a shame we cannot reuse Foldable and/or Traversable for filtering. Maybe we should add a new Filterable class?

Milan Straka

4:52 a.m.

Hi all,

...

-----Original message----- From: David Feuer Sent: 30 Dec 2014, 18:03

There won't be any such numbers in this case, because access to the details of the sequence structure does not seem to help implement this function.

sorry for not making clear -- I was interested in difference between the mentioned | filterMSeq f = fmap fromList . filterM f . toList and your foldr-based implementation. I can imagine that one can use filterMSeq in place instead of filterM, so that is why I am interested between filterMSeq and the specialized implementation (which is just a plain foldr + <|).

...

The same is true of the following, all of which already exist:

replicate, replicateM, iterateN -- all are implemented using replicateA unfoldr, unfoldl --implemented using <| and |> null -- can be written just as well using null xs = case viewl xs of {EmptyL -> True; _ -> False} scanl, scanr, scanl1, scanr1 -- defined ultimately in terms of traverse spanl, spanr, breakl, breakr, takeWhile, dropWhile -- defined in terms of splitAt findIndicesL, findIndicesR -- defined in terms of foldlWithIndex, foldrWithIndex take, drop -- previously defined using splitAt; currently defined using splitAt', but those two functions are likely to merge filter -- uses foldl elemIndexL, elemIndicesL, elemIndexR, elemIndicesR, findIndexL, findIndexR -- all defined using findIndicesL and findIndicesR update -- uses adjust zip, zipWith3, zip3, zipWith4, zip4 -- All defined from zipWith

I think almost all of these functions are good to have around (although the elemIndex family seems a bit silly). Adding a few more to round things out seems reasonable to me as well.

true, that is why it is a general rule (to not add functions which can be implemented using others), not an absolute rule. The question is whether filterM is as "basic" method used frequently enough (for example, no one obviously questions zip). Note that it is not obvious what type does filterM has (unlike e.g. zip), now that we have Applicative and also filterM is changing in base. From this point of view, it makes sense to use fmap fromList . filterM f . toList because then we do not have inconsistent filterM-s. Cheers, Milan

...

On Tue, Dec 30, 2014 at 5:41 PM, Milan Straka wrote:

...
Hi all,

...
-----Original message----- From: David Feuer Sent: 30 Dec 2014, 13:15

On Tue, Dec 30, 2014 at 12:05 PM, Johan Tibell wrote:

...
We should check that `filterM . toList` isn't as fast. That was the case for bytestring and we rejected adding filterM there for that reason.

Let me expand that definition to what it really looks like:

filterMSeq f = fmap fromList . filterM f . toList

Yes, you can do this. Yes, it's probably even pretty efficient. But it's a bit painful to have to expand things out like this just to switch a bit of code from lists to sequences.

the problem with this is that there are really many functions that "would be useful", but we do not really want to add them all.

A general rule I have been using is "can this be implemented straightforwardly using existing methods without hurting performance much"? So I would like to see the numbers, please.

Also, filterM is not yet settled in base, so we should wait some time until it is.

Cheers, Milan

PS: It is a shame we cannot reuse Foldable and/or Traversable for filtering. Maybe we should add a new Filterable class?

David Feuer

5:22 a.m.

On Tue, Dec 30, 2014 at 6:22 PM, Milan Straka wrote:

...

Hi all,

...
-----Original message----- From: David Feuer Sent: 30 Dec 2014, 18:03

There won't be any such numbers in this case, because access to the details of the sequence structure does not seem to help implement this function.

sorry for not making clear -- I was interested in difference between the mentioned | filterMSeq f = fmap fromList . filterM f . toList and your foldr-based implementation. I can imagine that one can use filterMSeq in place instead of filterM, so that is why I am interested between filterMSeq and the specialized implementation (which is just a plain foldr + <|).

Oh, I really don't know which is better (and that's true also for unfoldr and unfoldl--is the current implementation better or worse than using fromList?).

...

Note that it is not obvious what type does filterM has (unlike e.g. zip), now that we have Applicative and also filterM is changing in base. From this point of view, it makes sense to use fmap fromList . filterM f . toList because then we do not have inconsistent filterM-s.

I have no problem with having filterM in Data.Sequence be more general than filterM in Control.Monad for old versions. I think both unfold(r/l) and filterM actually make a potential point in my favor—if there are multiple implementations that seem reasonable, having whichever turns out to be fastest in the library will save users time benchmarking them. David

Bardur Arantsson

5:55 a.m.

On 2014-12-30 23:41, Milan Straka wrote:

...

Hi all,

...
-----Original message----- From: David Feuer Sent: 30 Dec 2014, 13:15

On Tue, Dec 30, 2014 at 12:05 PM, Johan Tibell wrote:

...
We should check that `filterM . toList` isn't as fast. That was the case for bytestring and we rejected adding filterM there for that reason.

Let me expand that definition to what it really looks like:

filterMSeq f = fmap fromList . filterM f . toList

Yes, you can do this. Yes, it's probably even pretty efficient. But it's a bit painful to have to expand things out like this just to switch a bit of code from lists to sequences.

the problem with this is that there are really many functions that "would be useful", but we do not really want to add them all.

A general rule I have been using is "can this be implemented straightforwardly using existing methods without hurting performance much"? So I would like to see the numbers, please.

Well, there's an opposing point called "usability". If there are already "obvious" functions to use in other data structure modules then they should *probably* also be represented in the other data structure modules. (I don't claim that this is necessarily a "trump", but I think it bears consideration.) Regards,

4032

Age (days ago)

4035

Last active (days ago)

List overview

Download

14 comments

8 participants

participants (8)

Andreas Abel
Artyom
Bardur Arantsson
David Feuer
Edward Kmett
Johan Tibell
Kim-Ee Yeoh
Milan Straka