
Hello Cafe, I would like to have an efficient implementation of the chop function. As you guess, the chop function drops spaces in the tail of a list. chop " foo bar baz " -> " foo bar baz" A naive implementation is as follows: chopReverse :: String -> String chopReverse = reverse . dropWhile isSpace . reverse But this is not elegant. foldr version is as follows: chopFoldr :: String -> String chopFoldr = foldr f [] where f c [] | isSpace c = [] | otherwise = c:[] f c cs = c:cs But this code is slower than chopReverse in some cases. Are there any more efficient implementations of chop? Any suggestions? --Kazu

This was a recent question on StackOverflow:
http://stackoverflow.com/questions/6270324/in-haskell-how-do-you-trim-whites...
Where I started:
If you have serious text processing needs then use the text package
from hackage.
And concluded:
A quick Criterion benchmark tells me that (for a particularly long
string of words with spaces and ~200 pre and post spaces) my trim
takes 1.6 ms, the trim using reverse takes 3.5ms, and Data.Text.strip
takes 0.0016 ms.
Cheers,
Thomas
On Tue, Sep 13, 2011 at 8:03 PM, Kazu Yamamoto
Hello Cafe,
I would like to have an efficient implementation of the chop function. As you guess, the chop function drops spaces in the tail of a list.
chop " foo bar baz " -> " foo bar baz"
A naive implementation is as follows:
chopReverse :: String -> String chopReverse = reverse . dropWhile isSpace . reverse
But this is not elegant. foldr version is as follows:
chopFoldr :: String -> String chopFoldr = foldr f [] where f c [] | isSpace c = [] | otherwise = c:[] f c cs = c:cs
But this code is slower than chopReverse in some cases.
Are there any more efficient implementations of chop? Any suggestions?
--Kazu
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hello,
Of course, I use ByteString or Text for real programming. But I would
like to know whether or not there are any efficient methods to remove
a tail part of a list.
--Kazu
From: Thomas DuBuisson
This was a recent question on StackOverflow:
http://stackoverflow.com/questions/6270324/in-haskell-how-do-you-trim-whites...
Where I started:
If you have serious text processing needs then use the text package from hackage.
And concluded:
A quick Criterion benchmark tells me that (for a particularly long string of words with spaces and ~200 pre and post spaces) my trim takes 1.6 ms, the trim using reverse takes 3.5ms, and Data.Text.strip takes 0.0016 ms.
Cheers, Thomas
On Tue, Sep 13, 2011 at 8:03 PM, Kazu Yamamoto
wrote: Hello Cafe,
I would like to have an efficient implementation of the chop function. As you guess, the chop function drops spaces in the tail of a list.
chop " foo bar baz " -> " foo bar baz"
A naive implementation is as follows:
chopReverse :: String -> String chopReverse = reverse . dropWhile isSpace . reverse
But this is not elegant. foldr version is as follows:
chopFoldr :: String -> String chopFoldr = foldr f [] where f c [] | isSpace c = [] | otherwise = c:[] f c cs = c:cs
But this code is slower than chopReverse in some cases.
Are there any more efficient implementations of chop? Any suggestions?
--Kazu
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 14 September 2011 13:29, Kazu Yamamoto
Hello,
Of course, I use ByteString or Text for real programming. But I would like to know whether or not there are any efficient methods to remove a tail part of a list.
I doubt it; lists aren't that great a data type if you want to keep manipulating the end all the time. You'd have more luck with a Sequence or some other data type. -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

On Sep 14, 2011, at 5:29 AM, Kazu Yamamoto (山本和彦) wrote:
Hello,
Of course, I use ByteString or Text for real programming. But I would like to know whether or not there are any efficient methods to remove a tail part of a list.
--Kazu
In that case, I would prefer this version, since it is lazier: lazyChop :: String -> String lazyChop s = pref ++ if null s' then [] else (mid_sp ++ lazyChop s') where (pref,sp_suf) = break isSpace s (mid_sp,s') = span isSpace sp_suf By "lazier" I mean: *Main> chopReverse $ "hello world " ++ undefined "*** Exception: Prelude.undefined *Main> chopFoldr $ "hello world " ++ undefined "*** Exception: Prelude.undefined *Main> lazyChop $ "hello world " ++ undefined "hello world*** Exception: Prelude.undefined Daniel

Hello, My friend reached the following version: chop :: String -> String chop = foldr go [] where go x xs | isSpace x && null xs = [] | otherwise = x:xs This version is faster than the reverse version in most cases. The point is checking "isSpace" first and falling into "otherwise" in many cases, which is a natural co-recursion. Thanks anyway. --Kazu
Hello,
Of course, I use ByteString or Text for real programming. But I would like to know whether or not there are any efficient methods to remove a tail part of a list.
--Kazu
In that case, I would prefer this version, since it is lazier:
lazyChop :: String -> String lazyChop s = pref ++ if null s' then [] else (mid_sp ++ lazyChop s') where (pref,sp_suf) = break isSpace s (mid_sp,s') = span isSpace sp_suf
By "lazier" I mean:
*Main> chopReverse $ "hello world " ++ undefined "*** Exception: Prelude.undefined *Main> chopFoldr $ "hello world " ++ undefined "*** Exception: Prelude.undefined *Main> lazyChop $ "hello world " ++ undefined "hello world*** Exception: Prelude.undefined
Daniel

* Kazu Yamamoto
Hello,
My friend reached the following version:
chop :: String -> String chop = foldr go [] where go x xs | isSpace x && null xs = [] | otherwise = x:xs
This version is faster than the reverse version in most cases. The point is checking "isSpace" first and falling into "otherwise" in many cases, which is a natural co-recursion.
This was exactly my first attempt on rewriting your foldr version. Unfortunately, it doesn't seem faster at all (foldr2 below). The test input was replicate 100 'a' ++ replicate 100 ' '. GHC 7, -O2. benchmarking all/foldr mean: 2.808462 us, lb 2.807047 us, ub 2.810520 us, ci 0.950 std dev: 8.620822 ns, lb 6.535738 ns, ub 11.59552 ns, ci 0.950 benchmarking all/reverse mean: 4.128217 us, lb 4.125409 us, ub 4.134086 us, ci 0.950 std dev: 20.07591 ns, lb 11.47572 ns, ub 38.33738 ns, ci 0.950 benchmarking all/foldr2 mean: 6.701714 us, lb 6.692093 us, ub 6.711627 us, ci 0.950 std dev: 50.06638 ns, lb 42.25004 ns, ub 64.84223 ns, ci 0.950 -- Roman I. Cheplyaka :: http://ro-che.info/

You can find the results of my friend: https://gist.github.com/1215660 Please ignore the Japanese text. Please read the code and the results. I'm not sure why you had the different result. --Kazu
This was exactly my first attempt on rewriting your foldr version.
Unfortunately, it doesn't seem faster at all (foldr2 below). The test input was replicate 100 'a' ++ replicate 100 ' '. GHC 7, -O2.
benchmarking all/foldr mean: 2.808462 us, lb 2.807047 us, ub 2.810520 us, ci 0.950 std dev: 8.620822 ns, lb 6.535738 ns, ub 11.59552 ns, ci 0.950
benchmarking all/reverse mean: 4.128217 us, lb 4.125409 us, ub 4.134086 us, ci 0.950 std dev: 20.07591 ns, lb 11.47572 ns, ub 38.33738 ns, ci 0.950
benchmarking all/foldr2 mean: 6.701714 us, lb 6.692093 us, ub 6.711627 us, ci 0.950 std dev: 50.06638 ns, lb 42.25004 ns, ub 64.84223 ns, ci 0.950
-- Roman I. Cheplyaka :: http://ro-che.info/

On Wednesday 14 September 2011, 09:17:16, Kazu Yamamoto wrote:
You can find the results of my friend:
https://gist.github.com/1215660
Please ignore the Japanese text. Please read the code and the results. I'm not sure why you had the different result.
Input size. The lazy foldr combinator gets compiled to a bigger, more complicated function. When the input is short, the code size makes it slower. But when the input is long, the lazy foldr wins because it can produce incremental results while the strict foldr combinator and revChop need to traverse the entire list before they can produce anything - except for the case of 'spaces', where indeed the strict foldr combinator is (slightly) faster.

On Wed, Sep 14, 2011 at 05:03, Kazu Yamamoto wrote:
I would like to have an efficient implementation of the chop function.
[...]
Are there any more efficient implementations of chop? Any suggestions?
chop xs = go xs id "" where go "" _ = id go (c:cs) ss | isSpace c = go cs (ss . (:) c) go (c:cs) ss | otherwise = ss . (:) c . go cs id I haven't looked at the performance, but I would like to know how well it fares. Regards, Sean

On 15 September 2011 01:24, Sean Leather
On Wed, Sep 14, 2011 at 05:03, Kazu Yamamoto wrote:
I would like to have an efficient implementation of the chop function.
[...]
Are there any more efficient implementations of chop? Any suggestions?
chop xs = go xs id "" where go "" _ = id go (c:cs) ss | isSpace c = go cs (ss . (:) c) go (c:cs) ss | otherwise = ss . (:) c . go cs id
Why the extra case for go? The otherwise guard can be part of the second case... -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

Quoth Ivan Lazar Miljenovic
Why the extra case for go? The otherwise guard can be part of the second case...
I noticed that myself, so I thought "let's see if it's just a matter of style that comes out the same after compilation ..." ... and after a few minutes fooling around with that, I'm none the wiser. I could not, within the time allotted, make out what's going on in the -fvia-C .hc files. I guess the way to answer questions like this is to apply the function in question to massive amounts of data and infer the answer from resulting performance? Donn

On Wed, Sep 14, 2011 at 17:31, Ivan Lazar Miljenovic wrote:
On 15 September 2011 01:24, Sean Leather wrote:
On Wed, Sep 14, 2011 at 05:03, Kazu Yamamoto wrote:
I would like to have an efficient implementation of the chop function.
[...]
Are there any more efficient implementations of chop? Any suggestions?
chop xs = go xs id "" where go "" _ = id go (c:cs) ss | isSpace c = go cs (ss . (:) c) go (c:cs) ss | otherwise = ss . (:) c . go cs id
Why the extra case for go? The otherwise guard can be part of the second case...
Do you mean "why include 'go (c:cs) ss' twice"? That's merely because I was working through several versions and forgot to merge the second and third cases before sending the email. Nothing sinister. Sean

On 9/13/11 11:03 PM, Kazu Yamamoto (山本和彦) wrote:
Hello Cafe,
I would like to have an efficient implementation of the chop function. As you guess, the chop function drops spaces in the tail of a list.
chop " foo bar baz " -> " foo bar baz"
A naive implementation is as follows:
chopReverse :: String -> String chopReverse = reverse . dropWhile isSpace . reverse
But this is not elegant.
Just consider it as an automaton/transducer problem, namely a PDA: chop = go 0 [] where -- go :: State -> Stack -> String -> String go 0 _ [] = [] go 0 _ (x:xs) | isSpace x = go 1 [x] xs | otherwise = x : go 0 xs go 1 ss [] = [] go 1 ss (x:xs) | isSpace c = go 1 (x:ss) xs | otherwise = reverse ss ++ x : go 0 xs Of course you can optimize this implementation. You don't actually need the state argument, since it's encoded by the emptiness/non-emptiness of the remembered spaces. And if you only care about (==' ') instead of isSpace, then you don't need to reverse the spaces when adding them back on. Also, if you only care about (==' '), you could just keep track of the number of spaces instead of the actual list of them; or if you do care about isSpace you could also use a difference list instead of doing the reversal--- though I'm not sure which is more optimal here. As a transducer, this version can produce the output with minimal delays; whereas your foldr implementation needs to traverse the whole string before it can return the first character. If you want proper list-fusion (instead of just laziness), you could also use the `build` function to abstract over (:) and []. -- Live well, ~wren
participants (9)
-
Daniel Fischer
-
Daniel Gorín
-
Donn Cave
-
Ivan Lazar Miljenovic
-
Kazu Yamamoto
-
Roman Cheplyaka
-
Sean Leather
-
Thomas DuBuisson
-
wren ng thornton