Adding scanl'

Niklas Hambüchen

11 Sep 2012 11 Sep '12

4:20 p.m.

I would like to propose adding a the functions scanl' and scanl1' to Data.List. The presence and regular use of foldl' and foldl1' suggests that corresponding scan functions should be available.

...

scanl' :: (a -> b -> a) -> a -> [b] -> [a] scanl' f q ls = q `seq` (q : (case ls of [] -> [] x:xs -> scanl' f (f q x) xs))

What do you think? Niklas

Show replies by date

Henning Thielemann

11 Sep 11 Sep

6:43 p.m.

On Tue, 11 Sep 2012, Niklas Hambüchen wrote:

...

I would like to propose adding a the functions scanl' and scanl1' to Data.List.

The presence and regular use of foldl' and foldl1' suggests that corresponding scan functions should be available.

...
scanl' :: (a -> b -> a) -> a -> [b] -> [a] scanl' f q ls = q `seq` (q : (case ls of [] -> [] x:xs -> scanl' f (f q x) xs))

What do you think?

I don't know whether this "scanl'" is of much use. "foldl'" is required because we can access its accumulator only after "foldl'" finished. But in "scanl'" you can and usually should access the interim accumulator values that are contained in the result list.

Evan Laforge

6:51 p.m.

On Tue, Sep 11, 2012 at 11:43 AM, Henning Thielemann wrote:

...

I don't know whether this "scanl'" is of much use. "foldl'" is required because we can access its accumulator only after "foldl'" finished. But in "scanl'" you can and usually should access the interim accumulator values that are contained in the result list.

I use it relatively frequently, e.g. 'intervals = scanl (+) 0 (cycle [2, 1, 2, 2, 1, 2, 2])'

Henning Thielemann

6:57 p.m.

On Tue, 11 Sep 2012, Evan Laforge wrote:

...

On Tue, Sep 11, 2012 at 11:43 AM, Henning Thielemann wrote:

...
I don't know whether this "scanl'" is of much use. "foldl'" is required because we can access its accumulator only after "foldl'" finished. But in "scanl'" you can and usually should access the interim accumulator values that are contained in the result list.

I use it relatively frequently, e.g. 'intervals = scanl (+) 0 (cycle [2, 1, 2, 2, 1, 2, 2])'

And what do you do with the elements of 'intervals'? I guess you consume them and then you force their evaluation this way. However, it might be that you filter out some elements and thus consume only some of the elements. In this case it might be better to have a "scanl'".

Evan Laforge

7:06 p.m.

On Tue, Sep 11, 2012 at 11:57 AM, Henning Thielemann wrote:

...

On Tue, 11 Sep 2012, Evan Laforge wrote:

...
On Tue, Sep 11, 2012 at 11:43 AM, Henning Thielemann wrote:

...
I don't know whether this "scanl'" is of much use. "foldl'" is required because we can access its accumulator only after "foldl'" finished. But in "scanl'" you can and usually should access the interim accumulator values that are contained in the result list.

I use it relatively frequently, e.g. 'intervals = scanl (+) 0 (cycle [2, 1, 2, 2, 1, 2, 2])'

And what do you do with the elements of 'intervals'? I guess you consume them and then you force their evaluation this way.

Yes, you're right, I don't rely on laziness in the result. And even if I did, in the case of (+) it probably isn't worth it anyway. Sorry, I missed that the key point was whether or not the interim values are used.

Niklas Hambüchen

7:24 p.m.

On 11/09/12 19:57, Henning Thielemann wrote:

...

However, it might be that you filter out some elements and thus consume only some of the elements. In this case it might be better to have a "scanl'".

Exactly. But the even more problematic case is when *you* don't do anything with it, but write a library for others. You don't know what they are going to do with the list you give them, and they don't know (and shouldn't need to know) about the implementation detail that you are using scanl - they only see the type, e.g. [Int], and can easily fall into the trap of calling last on that. I give a real-world example: I wrote a small statistics library that calculates moving sums, averages, standard deviations and so on. You give it the infinite list, it could give you the infinite list of moving sums of size n (sums of successive n elements). movingWindowSum :: Int -> [Int] -> [Int] Somewhere in movingWindowSum I use "scanl (+) 0". If the user of my library calls last on that, which is not unexpected for my use case, or is only interested in certain parts of that list, he gets a space leak due to my implementation detail. So I either have to disclose my implementation detail or hand-write a scanl' into my library. As a side note: Data.List.sum [1..100000000] eats all my memory in my ghci. Why does it use foldl, not foldl'?

Ben Millwood

9:23 p.m.

On Tue, Sep 11, 2012 at 8:24 PM, Niklas Hambüchen wrote:

...

As a side note: Data.List.sum [1..100000000] eats all my memory in my ghci. Why does it use foldl, not foldl'?

Strictly* speaking, it's possible for (+) to be lazy in its parameters; in any case, if you compile with optimisations you may find the memory leak goes away. * No pun intended.

Niklas Hambüchen

12 Sep 12 Sep

12:30 p.m.

On 11/09/12 22:23, Ben Millwood wrote:

...

if you compile with optimisations you may find the memory leak goes away.

Sure, but should users expect the space complexity being O(n) vs O(1) depending on a compiler switch? Was it intentional that sum is defined in terms of foldl and not foldl'?

Henning Thielemann

12:49 p.m.

On Wed, 12 Sep 2012, Niklas Hambüchen wrote:

...

On 11/09/12 22:23, Ben Millwood wrote:

...
if you compile with optimisations you may find the memory leak goes away.

Sure, but should users expect the space complexity being O(n) vs O(1) depending on a compiler switch?

Was it intentional that sum is defined in terms of foldl and not foldl'?

I think yes, because (+) can be lazy, as Ben mentioned. Think of the sum of power series. Actually, foldl' would also work for power series, because it evaluates only the top-most constructor. This is also surprising. For me foldl' is quite a hack, since it relies on the hack 'seq'.

wren ng thornton

13 Sep 13 Sep

11:18 p.m.

On 9/12/12 2:30 PM, Niklas Hambüchen wrote:

...

Was it intentional that sum is defined in terms of foldl and not foldl'?

Yes, because sum is defined in the Report as using foldl. This is silly, but it is what it is. With optimizations on, GHC often converts foldl into foldl' (when it's semantics-preserving) because of this and other functions which "must" use foldl. -- Live well, ~wren

Niklas Hambüchen

14 Sep 14 Sep

12:03 a.m.

...

Yes, because sum is defined in the Report as using foldl. This is silly, but it is what it is. With optimizations on, GHC often converts foldl into foldl' (when it's semantics-preserving) because of this and other functions which "must" use foldl.

I see. This, though, sounds like an argument for scanl' for me, so that I don't have to rely on it doing optimisations like this "often" (in last . scanl, it obviously doesn't), and not to have the usefulness of my library hovering between fast/constant-memory and eats-all-my-ram based on the hope that -O guesses what I want (doesn't work in ghci, of course) *and* that the user knows how I construct my lists internally.

4675

Age (days ago)

4678

Last active (days ago)

List overview

Download

10 comments

5 participants

participants (5)

Ben Millwood
Evan Laforge
Henning Thielemann
Niklas Hambüchen
wren ng thornton

Adding scanl'

wren ng thornton

tags

participants (5)