
I would like to propose adding a the functions scanl' and scanl1' to Data.List. The presence and regular use of foldl' and foldl1' suggests that corresponding scan functions should be available.
scanl' :: (a -> b -> a) -> a -> [b] -> [a] scanl' f q ls = q `seq` (q : (case ls of [] -> [] x:xs -> scanl' f (f q x) xs))
What do you think? Niklas

On Tue, 11 Sep 2012, Niklas Hambüchen wrote:
I would like to propose adding a the functions scanl' and scanl1' to Data.List.
The presence and regular use of foldl' and foldl1' suggests that corresponding scan functions should be available.
scanl' :: (a -> b -> a) -> a -> [b] -> [a] scanl' f q ls = q `seq` (q : (case ls of [] -> [] x:xs -> scanl' f (f q x) xs))
What do you think?
I don't know whether this "scanl'" is of much use. "foldl'" is required because we can access its accumulator only after "foldl'" finished. But in "scanl'" you can and usually should access the interim accumulator values that are contained in the result list.

On Tue, Sep 11, 2012 at 11:43 AM, Henning Thielemann
I don't know whether this "scanl'" is of much use. "foldl'" is required because we can access its accumulator only after "foldl'" finished. But in "scanl'" you can and usually should access the interim accumulator values that are contained in the result list.
I use it relatively frequently, e.g. 'intervals = scanl (+) 0 (cycle [2, 1, 2, 2, 1, 2, 2])'

On Tue, 11 Sep 2012, Evan Laforge wrote:
On Tue, Sep 11, 2012 at 11:43 AM, Henning Thielemann
wrote: I don't know whether this "scanl'" is of much use. "foldl'" is required because we can access its accumulator only after "foldl'" finished. But in "scanl'" you can and usually should access the interim accumulator values that are contained in the result list.
I use it relatively frequently, e.g. 'intervals = scanl (+) 0 (cycle [2, 1, 2, 2, 1, 2, 2])'
And what do you do with the elements of 'intervals'? I guess you consume them and then you force their evaluation this way. However, it might be that you filter out some elements and thus consume only some of the elements. In this case it might be better to have a "scanl'".

On Tue, Sep 11, 2012 at 11:57 AM, Henning Thielemann
On Tue, 11 Sep 2012, Evan Laforge wrote:
On Tue, Sep 11, 2012 at 11:43 AM, Henning Thielemann
wrote: I don't know whether this "scanl'" is of much use. "foldl'" is required because we can access its accumulator only after "foldl'" finished. But in "scanl'" you can and usually should access the interim accumulator values that are contained in the result list.
I use it relatively frequently, e.g. 'intervals = scanl (+) 0 (cycle [2, 1, 2, 2, 1, 2, 2])'
And what do you do with the elements of 'intervals'? I guess you consume them and then you force their evaluation this way.
Yes, you're right, I don't rely on laziness in the result. And even if I did, in the case of (+) it probably isn't worth it anyway. Sorry, I missed that the key point was whether or not the interim values are used.

On 11/09/12 19:57, Henning Thielemann wrote:
However, it might be that you filter out some elements and thus consume only some of the elements. In this case it might be better to have a "scanl'".
Exactly. But the even more problematic case is when *you* don't do anything with it, but write a library for others. You don't know what they are going to do with the list you give them, and they don't know (and shouldn't need to know) about the implementation detail that you are using scanl - they only see the type, e.g. [Int], and can easily fall into the trap of calling last on that. I give a real-world example: I wrote a small statistics library that calculates moving sums, averages, standard deviations and so on. You give it the infinite list, it could give you the infinite list of moving sums of size n (sums of successive n elements). movingWindowSum :: Int -> [Int] -> [Int] Somewhere in movingWindowSum I use "scanl (+) 0". If the user of my library calls last on that, which is not unexpected for my use case, or is only interested in certain parts of that list, he gets a space leak due to my implementation detail. So I either have to disclose my implementation detail or hand-write a scanl' into my library. As a side note: Data.List.sum [1..100000000] eats all my memory in my ghci. Why does it use foldl, not foldl'?

On Tue, Sep 11, 2012 at 8:24 PM, Niklas Hambüchen
As a side note: Data.List.sum [1..100000000] eats all my memory in my ghci. Why does it use foldl, not foldl'?
Strictly* speaking, it's possible for (+) to be lazy in its parameters; in any case, if you compile with optimisations you may find the memory leak goes away. * No pun intended.

On 11/09/12 22:23, Ben Millwood wrote:
if you compile with optimisations you may find the memory leak goes away.
Sure, but should users expect the space complexity being O(n) vs O(1) depending on a compiler switch? Was it intentional that sum is defined in terms of foldl and not foldl'?

On Wed, 12 Sep 2012, Niklas Hambüchen wrote:
On 11/09/12 22:23, Ben Millwood wrote:
if you compile with optimisations you may find the memory leak goes away.
Sure, but should users expect the space complexity being O(n) vs O(1) depending on a compiler switch?
Was it intentional that sum is defined in terms of foldl and not foldl'?
I think yes, because (+) can be lazy, as Ben mentioned. Think of the sum of power series. Actually, foldl' would also work for power series, because it evaluates only the top-most constructor. This is also surprising. For me foldl' is quite a hack, since it relies on the hack 'seq'.

On 9/12/12 2:30 PM, Niklas Hambüchen wrote:
Was it intentional that sum is defined in terms of foldl and not foldl'?
Yes, because sum is defined in the Report as using foldl. This is silly, but it is what it is. With optimizations on, GHC often converts foldl into foldl' (when it's semantics-preserving) because of this and other functions which "must" use foldl. -- Live well, ~wren

Yes, because sum is defined in the Report as using foldl. This is silly, but it is what it is. With optimizations on, GHC often converts foldl into foldl' (when it's semantics-preserving) because of this and other functions which "must" use foldl.
I see. This, though, sounds like an argument for scanl' for me, so that I don't have to rely on it doing optimisations like this "often" (in last . scanl, it obviously doesn't), and not to have the usefulness of my library hovering between fast/constant-memory and eats-all-my-ram based on the hope that -O guesses what I want (doesn't work in ghci, of course) *and* that the user knows how I construct my lists internally.
participants (5)
-
Ben Millwood
-
Evan Laforge
-
Henning Thielemann
-
Niklas Hambüchen
-
wren ng thornton