Proposal: adding 'tailDropWhile' to Data.List

older
Haskell Platform 2011.4.0.0 in...

Kazu Yamamoto

28 Sep 2011 28 Sep '11

9:19 a.m.

Hello, Many languages provide the 'chop' or 'trip' function but Data.List does not. I would like to add a new function called 'tailDropWhile' so that people can easily implement 'chop' for String: chop :: String -> String chop = tailDropWhile isSpace The definition of tailDropWhile is as follows: {- wren ng thornton's version. This is lazier than Aoe's one. Out and inP implement push-down automata. -} tailDropWhile :: (a -> Bool) -> [a] -> [a] tailDropWhile p = out where out [] = [] out (x:xs) | p x = inP [x] xs | otherwise = x : out xs inP _ [] = [] inP ss (x:xs) | p x = inP (x:ss) xs | otherwise = reverse ss ++ x : out xs {- Mitsutoshi Aoe's version. This is faster is many cases but more strict. Just for reference. tailDropWhile :: (a -> Bool) -> [a] -> [a] tailDropWhile p = foldr go [] where go x xs | p x && null xs = [] | otherwise = x:xs -} For more information, please read: http://www.mail-archive.com/haskell-cafe@haskell.org/msg93192.html Discussion period: 2 weeks. Regards, --Kazu

Show replies by date

Ivan Lazar Miljenovic

28 Sep 28 Sep

9:35 a.m.

On 28 September 2011 19:19, Kazu Yamamoto wrote:

...

Hello,

Many languages provide the 'chop' or 'trip' function but Data.List does not. I would like to add a new function called 'tailDropWhile' so that people can easily implement 'chop' for String:

I don't think that "chop" is a very good name for this kind of function; by the name, I would think more a function of type Int -> [a] -> [[a]] that "chops up" a list into sub-lists of size n. "strip" (which I presume is what you meant as your second suggestion) is a bit better... That said, I think there's a good reason not to add this: if you're doing stuff at the end of a list, then you're probably doing something wrong. For this kind of textual task, you _really_ should be using text anyway, and this function is already implemented for you: http://hackage.haskell.org/packages/archive/text/0.11.1.5/doc/html/Data-Text...

...

chop :: String -> String chop = tailDropWhile isSpace

The definition of tailDropWhile is as follows:

{- wren ng thornton's version. This is lazier than Aoe's one. Out and inP implement push-down automata. -} tailDropWhile :: (a -> Bool) -> [a] -> [a] tailDropWhile p = out where out [] = [] out (x:xs) | p x = inP [x] xs | otherwise = x : out xs inP _ [] = [] inP ss (x:xs) | p x = inP (x:ss) xs | otherwise = reverse ss ++ x : out xs

{- Mitsutoshi Aoe's version. This is faster is many cases but more strict. Just for reference. tailDropWhile :: (a -> Bool) -> [a] -> [a] tailDropWhile p = foldr go [] where go x xs | p x && null xs = [] | otherwise = x:xs -}

For more information, please read: http://www.mail-archive.com/haskell-cafe@haskell.org/msg93192.html

Discussion period: 2 weeks.

-1 -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

Kazu Yamamoto

9:44 a.m.

Hello Ivan,

...

I don't think that "chop" is a very good name for this kind of function; by the name, I would think more a function of type Int -> [a] -> [[a]] that "chops up" a list into sub-lists of size n. "strip" (which I presume is what you meant as your second suggestion) is a bit better...

I didn't propose to add 'chop'. I proposed to add 'tailDropWhile'.

...

For this kind of textual task, you _really_ should be using text anyway, and this function is already implemented for you: http://hackage.haskell.org/packages/archive/text/0.11.1.5/doc/html/Data-Text...

For real use, you are right. But to teach list programming to beginners, I think this is necessary. For instance, I would explain how to process String without regular expressions. --Kazu

Ivan Lazar Miljenovic

9:55 a.m.

On 28 September 2011 19:44, Kazu Yamamoto wrote:

...

I didn't propose to add 'chop'. I proposed to add 'tailDropWhile'.

I understood that you wanting to be able to define `chop' was the motivation behind this proposal.

...

For real use, you are right. But to teach list programming to beginners, I think this is necessary. For instance, I would explain how to process String without regular expressions.

I still argue that this is a bad idea. To an extent, functional programmers tend to focus too much on lists as the be-all and end-all of data structures. Whilst lists are great at what they do and generally serve as a good "intermediary" data structure (e.g. converting a Map k a to a Seq (k,a) ), we should not be abusing them to do _everything_. In my opinion, teaching students to do a large amount of "fiddling" with the ends of lists is getting them used to bad habits. If you _really_ need to do a lot of stuff at the end of a list-like data structure, use a Seq or something. If you still want to use this to teach students (maybe because you're building them up to "but in general you shouldn't do this") then provide this function for their use elsewhere (either as something to copy/paste, or an auxiliary library for your class). But I don't think it belongs in Data.List. -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

Kazu Yamamoto

10:14 a.m.

...

In my opinion, teaching students to do a large amount of "fiddling" with the ends of lists is getting them used to bad habits. If you _really_ need to do a lot of stuff at the end of a list-like data structure, use a Seq or something.

My tutorial introduced trees and combinators also. ;-) Also, I would point out that even "Real World Haskell" starts with a list.

...

If you still want to use this to teach students (maybe because you're building them up to "but in general you shouldn't do this") then provide this function for their use elsewhere (either as something to copy/paste, or an auxiliary library for your class). But I don't think it belongs in Data.List.

So, what about 'reverse'? Both 'reverse' and 'tailDropWhile' are inefficient. Data.List has 'reverse'. Why not 'tailDropWhile'? P.S. I don't think students who do not understand lists can understand trees. --Kazu

Joachim Breitner

10:58 a.m.

Hi, Am Mittwoch, den 28.09.2011, 19:14 +0900 schrieb Kazu Yamamoto:

...

...
If you still want to use this to teach students (maybe because you're building them up to "but in general you shouldn't do this") then provide this function for their use elsewhere (either as something to copy/paste, or an auxiliary library for your class). But I don't think it belongs in Data.List.

So, what about 'reverse'?

Both 'reverse' and 'tailDropWhile' are inefficient. Data.List has 'reverse'. Why not 'tailDropWhile'?

also, it’s performance is not too bad, and definitely not as bad as reverse: If tailDropWhile is implemented in a way that allows list fusion (should be possible, I think), and I know that the suffix is not large (e.g. stripping at most trailing "\n"), then tailDropWhile should be ok to use. Greetings, Joachim -- Joachim "nomeata" Breitner mail@joachim-breitner.de | nomeata@debian.org | GPG: 0x4743206C xmpp: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/

Maciej Marcin Piechotka

9:22 p.m.

On Wed, 2011-09-28 at 12:58 +0200, Joachim Breitner wrote:

...

Hi,

Am Mittwoch, den 28.09.2011, 19:14 +0900 schrieb Kazu Yamamoto:

...
...
If you still want to use this to teach students (maybe because you're building them up to "but in general you shouldn't do this") then provide this function for their use elsewhere (either as something to copy/paste, or an auxiliary library for your class). But I don't think it belongs in Data.List.

So, what about 'reverse'?

Both 'reverse' and 'tailDropWhile' are inefficient. Data.List has 'reverse'. Why not 'tailDropWhile'?

also, it’s performance is not too bad, and definitely not as bad as reverse: If tailDropWhile is implemented in a way that allows list fusion (should be possible, I think), and I know that the suffix is not large (e.g. stripping at most trailing "\n"), then tailDropWhile should be ok to use.

Greetings, Joachim

Using stream-fusion package types [I know too little to know how the ghc uses stream-fusion - but I think it shows it is possible to implement this] (not tested): tailDropWhile :: (a -> Bool) -> Stream a -> Stream a tailDropWhile f (Stream next0 s01) = Stream next (Right s0, []) where next (Right s1, vs) = case next0 s1 of Done -> Done Skip s2 -> Skip (Right s2, vs) Yield v s2 | f v -> Skip (Right s2, v:vs) | otherwise -> case vs of [] -> Yield v (Right s2, []) (x:xs) -> Yield x (Left (s2, v), xs) next (Left (s1, v), []) = Yield v (Right s2) next (Left (s1, v), (x:xs)) = Yield v (Left (s1, v), xs) Regards

wren ng thornton

29 Sep 29 Sep

10:10 p.m.

On 9/28/11 6:58 AM, Joachim Breitner wrote:

...

also, it’s performance is not too bad, and definitely not as bad as reverse: If tailDropWhile is implemented in a way that allows list fusion (should be possible, I think), and I know that the suffix is not large (e.g. stripping at most trailing "\n"), then tailDropWhile should be ok to use.

Aoe's version uses foldr, so that'll give you a good consumer. The following version gives a good producer. The definition of rev is taken from GHC.List.reverse, but abstracting over cons and nil. Also we manually fuse away the (++) at the call site for rev, because it's trivial to do so. import GHC.Exts (build) tailDropWhile :: (a -> Bool) -> [a] -> [a] tailDropWhile p = \xs0 -> build (builder xs0) where builder xs0 cons nil = out xs0 where out [] = nil out (x:xs) | p x = inP [x] xs | otherwise = x `cons` out xs inP _ [] = nil inP ys (x:xs) | p x = inP (x:ys) xs | otherwise = rev ys (x `cons` out xs) rev [] zs = zs rev (y:ys) zs = rev ys (y `cons` zs) Providing both a good consumer and a good producer is trickier. If we could abstract over the `null` predicate alongside the cons and nil, then Aoe's version could be defined as a build over foldr: tailDropWhile p xs0 = build $ \cons nil -> let go x xs | p x && null xs = nil | otherwise = x `cons` xs in foldr go nil xs0 Though I'm not sure whether that'd still be a good consumer, as we desire. It ought to be, but we'd want to verify that GHC's rewrite rules still fire on it appropriately. A more transparent approach (which preserves laziness) would be to follow the route of stream-fusion: passing elements through one at a time, pausing and building up the buffer whenever the predicate holds, and flushing the buffer if the predicate fails before the end of list is reached. -- Live well, ~wren

Evan Laforge

28 Sep 28 Sep

4:11 p.m.

...

...
For real use, you are right. But to teach list programming to beginners, I think this is necessary. For instance, I would explain how to process String without regular expressions.

I still argue that this is a bad idea. To an extent, functional programmers tend to focus too much on lists as the be-all and end-all of data structures. Whilst lists are great at what they do and generally serve as a good "intermediary" data structure (e.g. converting a Map k a to a Seq (k,a) ), we should not be abusing them to do _everything_.

I don't think that's a realistic attitude at the moment. tailDropWhile was one of the first things I put in my personal library, along with various string functions like strip, split, and join, and I still use them all the time on strings. Certainly the performance sensitive parts of code use ByteString or Text, but the vast majority of strings are not in critical sections or are quite small and using Text everywhere would be awkward syntactically for no measurable gain. Also a lot of libraries only accept and produce Strings.

Evan Laforge

4:20 p.m.

On Wed, Sep 28, 2011 at 9:11 AM, Evan Laforge wrote:

...

...
...
For real use, you are right. But to teach list programming to beginners, I think this is necessary. For instance, I would explain how to process String without regular expressions.

I still argue that this is a bad idea. To an extent, functional programmers tend to focus too much on lists as the be-all and end-all of data structures. Whilst lists are great at what they do and generally serve as a good "intermediary" data structure (e.g. converting a Map k a to a Seq (k,a) ), we should not be abusing them to do _everything_.

Also, as an aside, I tend to use lists for everything at first and then convert to a more appropriate data structure later if profiling shows a performance problem. The thing is, several times I've switched to a theoretically more appropriate type (e.g. dlist for repeated appends, Sequence for adds and drops from both ends, AppendList for large appends on both ends) and performance got significantly worse. I don't fully understand why, but my theories are that lists have some nice properties, like 'xs++[]' can immediately reduce to 'xs', and their simplicity counts for a lot (e.g. an AppendList may have multiple pointers per element).

Roman Leshchinskiy

9:53 p.m.

On 28/09/2011, at 17:11, Evan Laforge wrote:

...

I don't think that's a realistic attitude at the moment. tailDropWhile was one of the first things I put in my personal library, along with various string functions like strip, split, and join, and I still use them all the time on strings. Certainly the performance sensitive parts of code use ByteString or Text, but the vast majority of strings are not in critical sections or are quite small and using Text everywhere would be awkward syntactically for no measurable gain. Also a lot of libraries only accept and produce Strings.

If performance isn't a concern, what's wrong with using reverse . dropWhile f . reverse? Another easy definition is this: foldr (\x xs -> if null xs && f x then [] else x:xs) [] Roman

Henning Thielemann

10:17 p.m.

On Wed, 28 Sep 2011, Roman Leshchinskiy wrote:

...

If performance isn't a concern, what's wrong with using reverse . dropWhile f . reverse? Another easy definition is this:

foldr (\x xs -> if null xs && f x then [] else x:xs) []

Ah, this is meant with tailDropWhile! I have it already in my utility package: http://hackage.haskell.org/packages/archive/utility-ht/0.0.7/doc/html/Data-L...

Evan Laforge

10:49 p.m.

...

If performance isn't a concern, what's wrong with using reverse . dropWhile f . reverse? Another easy definition is this:

foldr (\x xs -> if null xs && f x then [] else x:xs) []

Nothing wrong, in fact that's what I have bound to rDropWhile (the foldr line, not the double reverse line). Probably almost everyone who writes a significant amount of haskell has this function in their utility libraries.

David Virebayre

11:28 a.m.

2011/9/28 Ivan Lazar Miljenovic :

...

For this kind of textual task, you _really_ should be using text anyway, and this function is already implemented for you: http://hackage.haskell.org/packages/archive/text/0.11.1.5/doc/html/Data-Text...

Of course, until you work with GUIs (or any other library) that all use String (well at least wxHaskell does), and you have your code full of T.unpack and T.pack, yepeeeee ! David.

Ben Millwood

2:06 p.m.

On Wed, Sep 28, 2011 at 10:19 AM, Kazu Yamamoto wrote:

...

Hello,

Many languages provide the 'chop' or 'trip' function but Data.List does not. I would like to add a new function called 'tailDropWhile' so that people can easily implement 'chop' for String:

chop :: String -> String chop = tailDropWhile isSpace

The definition of tailDropWhile is as follows:

{- wren ng thornton's version. This is lazier than Aoe's one. Out and inP implement push-down automata. -} tailDropWhile :: (a -> Bool) -> [a] -> [a] tailDropWhile p = out where out [] = [] out (x:xs) | p x = inP [x] xs | otherwise = x : out xs inP _ [] = [] inP ss (x:xs) | p x = inP (x:ss) xs | otherwise = reverse ss ++ x : out xs

{- Mitsutoshi Aoe's version. This is faster is many cases but more strict. Just for reference. tailDropWhile :: (a -> Bool) -> [a] -> [a] tailDropWhile p = foldr go [] where go x xs | p x && null xs = [] | otherwise = x:xs -}

For more information, please read: http://www.mail-archive.com/haskell-cafe@haskell.org/msg93192.html

Discussion period: 2 weeks.

Regards,

--Kazu

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

I'd implement it as: tailDropWhile :: (a -> Bool) -> [a] -> [a] tailDropWhile p xs = go id xs where go _ [] = [] go k (x:xs) | p x = go (k . (x:)) xs | otherwise = k (x : go id xs) which is as lazy as possible and doesn't involve any reversing. It's a function I've needed to use before, but I don't feel strongly about its inclusion particularly. While Ivan raises legitimate concerns, the version I've given above is a bit more streamy, and anyway many of us acknowledge the value of (!!) and related "bad" list functions in some circumstances.

Ben Millwood

29 Sep 29 Sep

2:31 a.m.

On Wed, Sep 28, 2011 at 3:06 PM, Ben Millwood wrote:

...

On Wed, Sep 28, 2011 at 10:19 AM, Kazu Yamamoto wrote:

...
Hello,

Many languages provide the 'chop' or 'trip' function but Data.List does not. I would like to add a new function called 'tailDropWhile' so that people can easily implement 'chop' for String:

chop :: String -> String chop = tailDropWhile isSpace

The definition of tailDropWhile is as follows:

{- wren ng thornton's version. This is lazier than Aoe's one. Out and inP implement push-down automata. -} tailDropWhile :: (a -> Bool) -> [a] -> [a] tailDropWhile p = out where out [] = [] out (x:xs) | p x = inP [x] xs | otherwise = x : out xs inP _ [] = [] inP ss (x:xs) | p x = inP (x:ss) xs | otherwise = reverse ss ++ x : out xs

{- Mitsutoshi Aoe's version. This is faster is many cases but more strict. Just for reference. tailDropWhile :: (a -> Bool) -> [a] -> [a] tailDropWhile p = foldr go [] where go x xs | p x && null xs = [] | otherwise = x:xs -}

For more information, please read: http://www.mail-archive.com/haskell-cafe@haskell.org/msg93192.html

Discussion period: 2 weeks.

Regards,

--Kazu

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

I'd implement it as:

tailDropWhile :: (a -> Bool) -> [a] -> [a] tailDropWhile p xs = go id xs where go _ [] = [] go k (x:xs) | p x = go (k . (x:)) xs | otherwise = k (x : go id xs)

which is as lazy as possible and doesn't involve any reversing.

It's a function I've needed to use before, but I don't feel strongly about its inclusion particularly. While Ivan raises legitimate concerns, the version I've given above is a bit more streamy, and anyway many of us acknowledge the value of (!!) and related "bad" list functions in some circumstances.

Hmm. I wrote this attempting to satisfy the goals of "lazy and efficient", but on second thoughts it seems like Aoe's original function was both. You seem to be under the impression that it's more strict, but I think it's just as lazy as the others. Can you provide evidence of its strictness? (In my (admittedly crude) benchmarks it seems Aoe's was narrowly faster than mine, which was significantly faster than wren's).

Kazu Yamamoto

30 Sep 30 Sep

2:35 a.m.

Hello,

...

Hmm. I wrote this attempting to satisfy the goals of "lazy and efficient", but on second thoughts it seems like Aoe's original function was both. You seem to be under the impression that it's more strict, but I think it's just as lazy as the others. Can you provide evidence of its strictness?

In my definition, if tailDropWhile isSpace $ " foo bar " ++ undefined returns " foo bar" and then an error, it's lazy. If it causes an error without " foo bar", it's strict. This is my fault. Aoe's one is also lazy. --Kazu

wren ng thornton

29 Sep 29 Sep

10:31 p.m.

On 9/28/11 10:06 AM, Ben Millwood wrote:

...

I'd implement it as:

tailDropWhile :: (a -> Bool) -> [a] -> [a] tailDropWhile p xs = go id xs where go _ [] = [] go k (x:xs) | p x = go (k . (x:)) xs | otherwise = k (x : go id xs)

which is as lazy as possible and doesn't involve any reversing.

The reversing isn't the problematic part. Since GHC.List defines reverse xs = rev xs [], the problematic bit is the concatenation. Rather than writing (reverse xs ++ ys) which is (rev xs [] ++ ys), what we really want is (rev xs ys) so that we don't need to traverse the results of rev in order to copy them onto the front of ys. And this is exactly the fusion you've done when converting to difference-list style. The version I provided in the previous thread was aimed mainly at being clear about the fact that it's just a PDA (since the thread was asking about how we could possibly define such a function). As I recall, I mentioned using difference lists as one of the many ways of optimizing the version I presented. Another of the optimizations I mentioned is that you can optimize away the need to keep track of the machine's state, since it's so simple; as you demonstrate, you only really need a single state (and the stack/continuation). The good-producer version with difference lists is straightforward: tailDropWhile p = \xs0 -> build (builder xs0) where builder xs0 cons nil = go id xs0 where go _ [] = nil go k (x:xs) | p x = go (k . cons x) xs | otherwise = k (x `cons` go id xs) -- Live well, ~wren

Bryan O'Sullivan

3:34 a.m.

On Wed, Sep 28, 2011 at 9:19 AM, Kazu Yamamoto wrote:

...

Many languages provide the 'chop' or 'trip' function but Data.List does not. I would like to add a new function called 'tailDropWhile' so that people can easily implement 'chop' for String:

I don't have an opinion on whether or not this should go in, but if the proposal is accepted, the name of the function should follow the existing Haskell Platform precedent from the "text" package, and be "dropWhileEnd" instead of "tailDropWhile".

Ivan Lazar Miljenovic

3:58 a.m.

On 29 September 2011 13:34, Bryan O'Sullivan wrote:

...

On Wed, Sep 28, 2011 at 9:19 AM, Kazu Yamamoto wrote:

...
Many languages provide the 'chop' or 'trip' function but Data.List does not. I would like to add a new function called 'tailDropWhile' so that people can easily implement 'chop' for String:

I don't have an opinion on whether or not this should go in, but if the proposal is accepted, the name of the function should follow the existing Haskell Platform precedent from the "text" package, and be "dropWhileEnd" instead of "tailDropWhile".

I agree with Bryan regarding the name (though I'm still dubious as to whether it should be in Data.List). -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

Kazu Yamamoto

4:01 a.m.

...

...
I don't have an opinion on whether or not this should go in, but if the proposal is accepted, the name of the function should follow the existing Haskell Platform precedent from the "text" package, and be "dropWhileEnd" instead of "tailDropWhile".

I agree with Bryan regarding the name (though I'm still dubious as to whether it should be in Data.List).

I agree with the name too. --Kazu

5034

Age (days ago)

5036

Last active (days ago)

List overview

Download

20 comments

11 participants

participants (11)

Ben Millwood
Bryan O'Sullivan
David Virebayre
Evan Laforge
Henning Thielemann
Ivan Lazar Miljenovic
Joachim Breitner
Kazu Yamamoto
Maciej Marcin Piechotka
Roman Leshchinskiy
wren ng thornton