Proposal: strictify foldl'

As Duncan Coutts explains toward the end of http://www.well-typed.com/blog/90/ (which proposes something else I personally *don't* endorse), foldl', the strict foldl, isn't actually strict enough. In particular, it's only conditionally strict in the initial value for the accumulator: foldl' (\_ x -> x) undefined [3] = 3 Why does this matter? Strictness analysis needs to look at (and be able to look at) the function passed to foldl' to determine whether the expression is strict in the initial value. foldl'-as-foldr tends to complicate this sort of analysis already. Proposal: make foldl' unconditionally strict in the initial accumulator value, both in GHC.List and in (the default definition in) Data.Foldable, and make foldr' in Data.Foldable unconditionally strict in the initial value of its accumulator. Specifically, foldl' k z0 xs = foldr (\v fn z -> z `seq` fn (k z v)) id xs z0 would change to foldl' k !z0 xs = foldr (\v fn z -> z `seq` fn (k z v)) id xs z0

On Mon, 3 Nov 2014, David Feuer wrote:
Proposal: make foldl' unconditionally strict in the initial accumulator value, both in GHC.List and in (the default definition in) Data.Foldable, and make foldr' in Data.Foldable unconditionally strict in the initial value of its accumulator.
I am worried about the general move from foldl to foldl', because foldl' is still not strict enough. E.g. if the accumulator is a list, only the leading (:) is evaluated. I would prefer a strict foldl provided by the deepseq package and deprecate foldl' in favor of deepseq:foldl. This new foldl could immediately implement the unconditional strictness for the initial accumulator value.

The deepseq package doesn't provide any sort of foldl. I'd personally be very much opposed to such a change in any case; Something built up by foldl' may only modify things near the root of the accumulator. On Mon, Nov 3, 2014 at 11:59 AM, Henning Thielemann < lemming@henning-thielemann.de> wrote:
On Mon, 3 Nov 2014, David Feuer wrote:
Proposal: make foldl' unconditionally strict in the initial accumulator
value, both in GHC.List and in (the default definition in) Data.Foldable, and make foldr' in Data.Foldable unconditionally strict in the initial value of its accumulator.
I am worried about the general move from foldl to foldl', because foldl' is still not strict enough. E.g. if the accumulator is a list, only the leading (:) is evaluated. I would prefer a strict foldl provided by the deepseq package and deprecate foldl' in favor of deepseq:foldl. This new foldl could immediately implement the unconditional strictness for the initial accumulator value.

On 03/11/14 17:59, Henning Thielemann wrote:
I would prefer a strict foldl provided by the deepseq package and deprecate foldl' in favor of deepseq:foldl. This new foldl could immediately implement the unconditional strictness for the initial accumulator value.
This has the problem though that `deepseq` will traverse the entire structure, and while seq traversals are quite fast, if done n times it will still take n*n time (this is the classic "why don't deepseq everywhere" problem, I've stumbled over it again recently). What I mean to say is: Using `deepseq` in `foldl'` might forbid you to write some recursive code, e.g. in the cases where you call an outer foldl' on something whose substructure you used foldl', like when you use foldl' to fold up a tree from the leaves up.

On Mon, 3 Nov 2014, Niklas Hambüchen wrote:
On 03/11/14 17:59, Henning Thielemann wrote:
I would prefer a strict foldl provided by the deepseq package and deprecate foldl' in favor of deepseq:foldl. This new foldl could immediately implement the unconditional strictness for the initial accumulator value.
This has the problem though that `deepseq` will traverse the entire structure, and while seq traversals are quite fast, if done n times it will still take n*n time
If 'seq' is faster than 'deepseq' it leaves things unevaluated and thus builds up unevaluated expressions, right?
What I mean to say is: Using `deepseq` in `foldl'` might forbid you to write some recursive code, e.g. in the cases where you call an outer foldl' on something whose substructure you used foldl', like when you use foldl' to fold up a tree from the leaves up.
I can't follow. :-( Do you have an example?

On 03/11/14 18:16, Henning Thielemann wrote:
If 'seq' is faster than 'deepseq' it leaves things unevaluated and thus builds up unevaluated expressions, right?
Yes. What I mean is you could write a function function (using repeated foldl' inside) that is written such that its return value is fully evaluated (by carefully using just as many seqs as necessary for that). If foldl' did a full deepseq on its return value, then this function would seq the same data more times than necessary. I will try to give a (probably not so excellent) example: Let's say you want to `fmap (*2)` over a tree and implement it in recursive `foldl' (:)` style over the children of each node. Then deepseq in foldl' would trigger a full traversal of every subtree in the tree, adding a (*n) runtime factor. Yes, in this case you could probably just use laziness all well, using only the current foldl' and do deepseq outside, but I can imagine that there are cases similar to my example where that doesn't work, and were you want explicit seq control (but I can't come up with a perfect example for this yet).

On Mon, Nov 3, 2014 at 12:35 PM, Niklas Hambüchen
On 03/11/14 18:16, Henning Thielemann wrote:
If 'seq' is faster than 'deepseq' it leaves things unevaluated and thus builds up unevaluated expressions, right?
Yes.
Not necessarily in a bad way. If I forcefully insert something at the end of a lazy finger tree, say, that won't force the entire structure of the tree, but it will force the root, which I believe is sufficient to prevent unbroken thunk nests, and probably what you'd actually want.

On Mon, 3 Nov 2014, Niklas Hambüchen wrote:
On 03/11/14 18:16, Henning Thielemann wrote:
If 'seq' is faster than 'deepseq' it leaves things unevaluated and thus builds up unevaluated expressions, right?
Yes.
What I mean is you could write a function function (using repeated foldl' inside) that is written such that its return value is fully evaluated (by carefully using just as many seqs as necessary for that).
If foldl' did a full deepseq on its return value, then this function would seq the same data more times than necessary.
I see.

On Mon, Nov 3, 2014 at 6:51 PM, David Feuer
As Duncan Coutts explains toward the end of http://www.well-typed.com/blog/90/ (which proposes something else I personally *don't* endorse), foldl', the strict foldl, isn't actually strict enough. In particular, it's only conditionally strict in the initial value for the accumulator:
foldl' (\_ x -> x) undefined [3] = 3
Why does this matter? Strictness analysis needs to look at (and be able to look at) the function passed to foldl' to determine whether the expression is strict in the initial value. foldl'-as-foldr tends to complicate this sort of analysis already.
Proposal: make foldl' unconditionally strict in the initial accumulator value, both in GHC.List and in (the default definition in) Data.Foldable, and make foldr' in Data.Foldable unconditionally strict in the initial value of its accumulator.
Specifically,
foldl' k z0 xs = foldr (\v fn z -> z `seq` fn (k z v)) id xs z0
would change to
foldl' k !z0 xs = foldr (\v fn z -> z `seq` fn (k z v)) id xs z0
There are valid[1] uses of `foldl'` that would be broken by this change, e.g.: {-# LANGUAGE BangPatterns #-} import Data.List (foldl') foldl2' k !z0 xs = foldr (\v fn z -> z `seq` fn (k z v)) id xs z0 last1 = foldl' (flip const) (error "last1") last2 = foldl2' (flip const) (error "last2") main :: IO () main = do let list = [1, 2, 3] :: [Int] print $ last1 list print $ last2 list The current foldl' allows us to implement a `last` function, the new one does not. You can argue that there are far better ways to write `last` (and Duncan points that out in his blog post). But I'd like to have a better understanding of how much (silent) breakage this change would introduce before we move ahead with it, as well as how much of a benefit it might provide. Michael [1] For some definition of valid.

Your example is not exactly valid, because your version of `last` forces
the whole list. I think you do make a decent point: checking separately to
see if a list is empty in order to produce an error will destroy fusion
opportunities. This may be trickier than I anticipated.
On Mon, Nov 3, 2014 at 12:23 PM, Michael Snoyman
On Mon, Nov 3, 2014 at 6:51 PM, David Feuer
wrote: As Duncan Coutts explains toward the end of http://www.well-typed.com/blog/90/ (which proposes something else I personally *don't* endorse), foldl', the strict foldl, isn't actually strict enough. In particular, it's only conditionally strict in the initial value for the accumulator:
foldl' (\_ x -> x) undefined [3] = 3
Why does this matter? Strictness analysis needs to look at (and be able to look at) the function passed to foldl' to determine whether the expression is strict in the initial value. foldl'-as-foldr tends to complicate this sort of analysis already.
Proposal: make foldl' unconditionally strict in the initial accumulator value, both in GHC.List and in (the default definition in) Data.Foldable, and make foldr' in Data.Foldable unconditionally strict in the initial value of its accumulator.
Specifically,
foldl' k z0 xs = foldr (\v fn z -> z `seq` fn (k z v)) id xs z0
would change to
foldl' k !z0 xs = foldr (\v fn z -> z `seq` fn (k z v)) id xs z0
There are valid[1] uses of `foldl'` that would be broken by this change, e.g.:
{-# LANGUAGE BangPatterns #-} import Data.List (foldl')
foldl2' k !z0 xs = foldr (\v fn z -> z `seq` fn (k z v)) id xs z0
last1 = foldl' (flip const) (error "last1") last2 = foldl2' (flip const) (error "last2")
main :: IO () main = do let list = [1, 2, 3] :: [Int] print $ last1 list print $ last2 list
The current foldl' allows us to implement a `last` function, the new one does not. You can argue that there are far better ways to write `last` (and Duncan points that out in his blog post). But I'd like to have a better understanding of how much (silent) breakage this change would introduce before we move ahead with it, as well as how much of a benefit it might provide.
Michael
[1] For some definition of valid.

Hi all, FYI, foldl's in containers already have this property being strict in the initial value. Therefore, Data.Map.foldl' (\_ x -> x) undefined $ Data.Map.fromList [(4,2)] triggers *** Exception: Prelude.undefined Nevertheless, silent change in Data.List.foldl' semantics sounds kinda dangerous. Cheers, Milan
-----Original message----- From: David Feuer
Sent: 3 Nov 2014, 11:51 As Duncan Coutts explains toward the end of http://www.well-typed.com/blog/90/ (which proposes something else I personally *don't* endorse), foldl', the strict foldl, isn't actually strict enough. In particular, it's only conditionally strict in the initial value for the accumulator:
foldl' (\_ x -> x) undefined [3] = 3
Why does this matter? Strictness analysis needs to look at (and be able to look at) the function passed to foldl' to determine whether the expression is strict in the initial value. foldl'-as-foldr tends to complicate this sort of analysis already.
Proposal: make foldl' unconditionally strict in the initial accumulator value, both in GHC.List and in (the default definition in) Data.Foldable, and make foldr' in Data.Foldable unconditionally strict in the initial value of its accumulator.
Specifically,
foldl' k z0 xs = foldr (\v fn z -> z `seq` fn (k z v)) id xs z0
would change to
foldl' k !z0 xs = foldr (\v fn z -> z `seq` fn (k z v)) id xs z0
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On Mon, Nov 3, 2014 at 12:39 PM, Milan Straka
Hi all,
FYI, foldl's in containers already have this property being strict in the initial value. Therefore, Data.Map.foldl' (\_ x -> x) undefined $ Data.Map.fromList [(4,2)] triggers *** Exception: Prelude.undefined
OUCH. This is a really nasty potential gotcha for Foldable users. Whichever way things go (and I think Michael Snoyman almost has me convinced that my proposal is bad), I think uniform treatment of the initial value across Foldable instances would be a very good thing. If that means making the containers ones lazier, then so be it.
participants (5)
-
David Feuer
-
Henning Thielemann
-
Michael Snoyman
-
Milan Straka
-
Niklas Hambüchen