Proposal: Add a splitBy / splitOn in Data.List - Libraries - Haskell.org


Saurabh Nanda

2 Nov 2018

1:33 a.m.

This has certainly been discussed before. A quick Google search turned up the following past discussions:

- https://mail.haskell.org/pipermail/libraries/2006-July/005494.html
- https://mail.haskell.org/pipermail/libraries/2012-July/018228.html

Is there anything blocking this discussion & implementation? Anything that can be done to unblock it?

-- Saurabh.


Edward Kmett

2 Nov 2018

1:51 p.m.

The main thing that prevented it from going into base is the number of subtleties about what precisely it means to properly "split" something.

Most languages make fairly arbitrary calls on topics such as:

* Do you split on list elements (e.g. ',') or on lists of elements, so that you can have multi-character delimiters such as ", "? What about multiple kinds of thing that are all delimiters, e.g. any whitespace character?
* What do you do with the delimiters?
* What happens with runs of delimiters?
* What about initial or final runs of delimiters (e.g. leading spaces)?

The end result was that a split package was written by Brent Yorgey back in 2008 or so that rather comprehensively covers the design space, and it was incorporated into the Haskell Platform.

http://hackage.haskell.org/package/split-0.2.3.3/docs/Data-List-Split.html

-Edward
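The questions in Edward's message can be seen in base itself, where `words` and `lines` already answer them differently. The sketch below uses only base; `splitOnElem` is a hypothetical helper written for illustration and is not part of base or of the split package.

```haskell
-- Sketch only: splitOnElem is a hypothetical helper, not part of base.

-- | Split on a single delimiter element, dropping the delimiter and
-- keeping every (possibly empty) field.
splitOnElem :: Eq a => a -> [a] -> [[a]]
splitOnElem d xs = case break (== d) xs of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnElem d rest

-- Three splitters, three different answers to the questions above.
viaWords, viaLines, viaElem :: [String]
viaWords = words " a  b "            -- ["a","b"]: runs collapse, edges dropped
viaLines = lines "a\n\nb\n"          -- ["a","","b"]: empties kept, final "\n" ends a line
viaElem  = splitOnElem ',' ",a,,b,"  -- ["","a","","b",""]: one field per delimiter
```

None of the three behaviours is wrong; each is just one point in the design space the split package covers.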

Elliot Cameron

7:38 p.m.

Despite these subtleties, I must confess I've often wanted to whip up a quick script and been frustrated that these functions are missing from base. For example, using Haskell as a sed/awk alternative can be pleasant *if* the functions you need are in base. What's more, in many years I've only really wanted one or two versions of this.

What if we added the most flexible of versions and included only that? This version would accept multi-character delimiters, always throw them away, and always produce a new entry in the result for every occurrence of the delimiter. If you don't want the empty entries, you can filter. If you don't want leading empty entries, you can dropWhile. If you want the delimiters back, you can map. This seems like a nice trade-off for just being available in base.
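A minimal sketch of the single version Elliot describes, using only base. The name `splitOnSketch` and the treatment of an empty delimiter are assumptions made for illustration; nothing here is a proposed or existing base API.

```haskell
import Data.List (stripPrefix)

-- | Hypothetical base candidate (name and edge cases are assumptions):
-- split on a multi-element delimiter, drop the delimiter, and emit one
-- entry per occurrence, including empty ones.
splitOnSketch :: Eq a => [a] -> [a] -> [[a]]
splitOnSketch []    xs = [xs]   -- assumption: an empty delimiter never matches
splitOnSketch delim xs = go xs
  where
    go s = case breakOnDelim s of
      (chunk, Nothing)   -> [chunk]
      (chunk, Just rest) -> chunk : go rest

    -- Everything before the first delimiter, plus what follows it (if any).
    breakOnDelim s = case stripPrefix delim s of
      Just rest -> ([], Just rest)
      Nothing   -> case s of
        []     -> ([], Nothing)
        c : cs -> let (chunk, rest) = breakOnDelim cs in (c : chunk, rest)

raw, noEmpties, noLeading :: [String]
raw       = splitOnSketch ", " ", a, , b"   -- ["","a","","b"]
noEmpties = filter (not . null) raw         -- ["a","b"]
noLeading = dropWhile null raw              -- ["a","","b"]
```

The other behaviours are then recovered with ordinary list functions, as the message suggests.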

Vanessa McHale

8:01 p.m.

cabal now has the ability to be used for [scripting](https://github.com/haskell/cabal/pull/5483#issuecomment-409633079) which I think addresses your use case (at least, it's easier than forking base...).

Theodore Lief Gannon

8:17 p.m.

If you accept more than one delimiter but drop them, you've lost info about which one caused each break and can't map them back. It's more generic to keep them, since you can still filter.
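Theodore's point can be sketched with a splitter that keeps each delimiter in the output. This is an illustration using only base; `splitKeeping` and its `Either`-based result type are hypothetical choices, not an API from base or the split package.

```haskell
import Data.Either (rights)

-- | Hypothetical sketch: split on any element satisfying the predicate,
-- keeping each delimiter as Left and each chunk between them as Right.
splitKeeping :: (a -> Bool) -> [a] -> [Either a [a]]
splitKeeping isDelim xs = case break isDelim xs of
  (chunk, [])       -> [Right chunk]
  (chunk, d : rest) -> Right chunk : Left d : splitKeeping isDelim rest

tokens :: [Either Char String]
tokens = splitKeeping (`elem` ",;") "a,b;c"
-- [Right "a",Left ',',Right "b",Left ';',Right "c"]

-- Dropping the delimiters afterwards is just a filter ...
fields :: [String]
fields = rights tokens                       -- ["a","b","c"]

-- ... while keeping them lets the input be rebuilt exactly.
rebuilt :: String
rebuilt = concatMap (either pure id) tokens  -- "a,b;c"
```

Going the other way, from `fields` back to the original string, is not possible once the `Left` entries are gone.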

Elliot Cameron

8:55 p.m.

I didn't realize cabal now supported scripting. I suppose that addresses a large number of my use cases for having this.

I didn't mean choosing among different delimiters, only a single multi-element delimiter, although that is not fully flexible either. If we also had a multi-character replace function then a single-element split would be more tolerable.

I'm still in favor of providing one or two of the most common, most flexible versions of this just to help newcomers from other languages that have these functions in their standard libraries, but my opinion is not very strongly held.
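The combination Elliot mentions, a multi-character replace followed by a single-element split, could look roughly like this. All three names are hypothetical, and the sketch assumes the sentinel character `'\0'` never occurs in the input.

```haskell
import Data.List (stripPrefix)

-- | Hypothetical helper: replace every non-overlapping occurrence of
-- `old` with `new`, scanning left to right.
replaceAll :: Eq a => [a] -> [a] -> [a] -> [a]
replaceAll []  _   xs = xs   -- assumption: an empty pattern is a no-op
replaceAll old new xs = go xs
  where
    go []         = []
    go s@(c : cs) = case stripPrefix old s of
      Just rest -> new ++ go rest
      Nothing   -> c : go cs

-- | Hypothetical helper: split on one element, keeping empty fields.
splitOnElem :: Eq a => a -> [a] -> [[a]]
splitOnElem d xs = case break (== d) xs of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnElem d rest

-- | Multi-character split built from the two pieces above.
-- Assumes '\0' does not appear in the input string.
splitVia :: String -> String -> [String]
splitVia delim = splitOnElem '\0' . replaceAll delim "\0"

example :: [String]
example = splitVia ", " "a, b,c"   -- ["a","b,c"]
```

The need for a sentinel that cannot occur in the data is the main weakness of this route compared with a direct multi-element split.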

Dan Burton

10:43 p.m.

What about just adding Data.List.Split to base?

-- Dan Burton

Henning Thielemann

10:47 p.m.

On Fri, 2 Nov 2018, Dan Burton wrote:

> What about just adding Data.List.Split to base?

... and then splitting 'base'? :-)


Elliot Cameron

10:49 p.m.

Ah, in the context of splitting base this seems like a backward move. The solution must really be to have tooling that can pull in libraries with minimal friction.

Dan Burton

11:42 p.m.

If and when base is split, then just include Data.List.Split with whatever package all the other List stuff gets put in. My point is, this module should live in the same package where the other list functions live.

I'm in favor of splitting base, but things should not be so broken up to the extreme of having a package just for left-pad. It is possible to find middle ground.

-- Dan Burton

Bryan Richter

3 Nov 2018

10:48 p.m.

+1 to adding a single function that splits a list by a multi-element delimiter, e.g. the hypothetical

    Data.List.split [a, b] [c, a, b, d, a, b, e, a]
    [[c], [d], [e, a]]

The split package seems too heavyweight for base (I know I'd always have to look up the differences between splitOn, split, chop, and divvy), and more sophisticated needs should probably be filled by a special-purpose parser. I would even say it might make sense to just restrict the function to Strings, unless there is widespread need for supporting Lists in general.
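Bryan's hypothetical example can be checked against a small base-only sketch. `splitList` is an illustrative name (no `Data.List.split` exists), the integers 1 to 5 stand in for his `a` to `e`, and the empty-delimiter case is an assumption.

```haskell
import Data.List (intercalate, isPrefixOf)

-- | Hypothetical stand-in for the proposed function: split a list on a
-- multi-element delimiter, dropping the delimiter.
splitList :: Eq a => [a] -> [a] -> [[a]]
splitList []    xs = [xs]   -- assumption: an empty delimiter never matches
splitList delim xs = go [] xs
  where
    n = length delim
    -- acc holds the current chunk in reverse order.
    go acc rest
      | delim `isPrefixOf` rest = reverse acc : go [] (drop n rest)
      | otherwise = case rest of
          []     -> [reverse acc]
          c : cs -> go (c : acc) cs

-- Bryan's example with a = 1, b = 2, c = 3, d = 4, e = 5.
bryanExample :: [[Int]]
bryanExample = splitList [1, 2] [3, 1, 2, 4, 1, 2, 5, 1]   -- [[3],[4],[5,1]]

-- Because delimiters are dropped and empty chunks kept, joining the
-- pieces with the delimiter gives back the original list.
roundTrip :: Bool
roundTrip = intercalate [1, 2] bryanExample == [3, 1, 2, 4, 1, 2, 5, 1]

-- Restricting to String is only a matter of fixing the element type.
splitString :: String -> String -> [String]
splitString = splitList
```

The polymorphic version costs nothing extra here, so the String restriction would be a documentation choice rather than an implementation one.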

