Is scanl' safe for foldr/build fusion?

In a comment on trac #9345, nomeata proposed the implementation of Data.List.inits below. He indicates that the performance is very good, and wrote it in this fashion so it could fuse. My concern is that since seq is used in the argument to build, we need to make sure that fusion on the left will be safe. If I did the calculations right, this specifically requires that for all f, cons, nil, and bs, and all a≠_|_, the following "correct" expression e1 = a `cons` foldr cons nil $ foldr (\b g x -> (let b' = f x b in b' `seq` (b' : g b'))) (\b -> b `seq` []) bs a gives the same result as the fused expression e2 = a `cons` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `cons` g b'))) (\b -> b `seq` nil) bs a I haven't been able to figure out how to prove this or give a counterexample, so far, but I have very little experience with such. myScanl' :: (b -> a -> b) -> b -> [a] -> [b] myScanl' f a bs = build $ \c n -> a `seq` a `c` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `c` g b'))) (\b -> b `seq` n) bs a initsQ2 :: [a] -> [[a]] initsQ2 = map toListQ . myScanl' snocQ emptyQ

Hi David, Am Mittwoch, den 23.07.2014, 16:17 -0400 schrieb David Feuer:
In a comment on trac #9345, nomeata proposed the implementation of Data.List.inits below. He indicates that the performance is very good, and wrote it in this fashion so it could fuse. My concern is that since seq is used in the argument to build, we need to make sure that fusion on the left will be safe. If I did the calculations right, this specifically requires that for all f, cons, nil, and bs, and all a≠_| _, the following "correct" expression
e1 = a `cons` foldr cons nil $ foldr (\b g x -> (let b' = f x b in b' `seq` (b' : g b'))) (\b -> b `seq` []) bs a
gives the same result as the fused expression
e2 = a `cons` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `cons` g b'))) (\b -> b `seq` nil) bs a
I haven't been able to figure out how to prove this or give a counterexample, so far, but I have very little experience with such.
myScanl' :: (b -> a -> b) -> b -> [a] -> [b] myScanl' f a bs = build $ \c n -> a `seq` a `c` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `c` g b'))) (\b -> b `seq` n) bs a
I don’t have much experience either, but let’s try. First of all, let’s simplify it a bit. It is unlikely that the problem will appear for lists of varying length. Since all finite lists are one-element-lists¹, let us substitute bs = [b]: myScanl' f a [b] == build $ \c n -> a `seq` a `c` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `c` g b'))) (\b -> b `seq` n) (b : []) a == simplifying foldr build $ \c n -> a `seq` a `c` (\ x -> let b' = f x b in b' `seq` (b' `c` (b' `seq` n))) a == more β-reduction, redundant seq build $ \c n -> a `seq` a `c` let b' = f a b in b' `seq` (b' `c` n) Now let’s calculate the expression foldr c n $ myScanl' f a [b] without and with fusion. before_fusion == foldr c n $ myScanl' f a [b] == foldr c n $ a `seq` (a : let b' = f a b in b' `seq` (b' : [])) == foldr is strict in its third argument, so it commutes with seq a `seq` foldr c n $ (a : let b' = f a b in b' `seq` (b' : [])) == foldr on cons a `seq` a `c` (foldr c n $ let b' = f a b in b' `seq` (b' : [])) == foldr commutes with let and seq a `seq` a `c` (let b' = f a b in b' `seq` (foldr c n $ b' : [])) == foldr on cons and [] a `seq` a `c` (let b' = f a b in b' `seq` (b' `c` n)) after_fusion == (\c n -> a `seq` a `c` let b' = f a b in b' `seq` (b' `c` n)) c n == a `seq` a `c` (let b' = f a b in b' `seq` (b' `c` n)) So they yield the same results, independent of c and n. This includes the counter example from “Free Theorems in the Presence of seq”, (c = seq, n = []). Judging from this I conclude that nothing can go wrong for finite lists. What about partial lists, i.e. bs = b : ⊥? myScanl' f a (b : ⊥) == build $ \c n -> a `seq` a `c` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `c` g b'))) (\b -> b `seq` n) (b : ⊥) a == simplifying foldr build $ \c n -> a `seq` a `c` (\ x -> let b' = f x b in b' `seq` (b' `c` (b' `seq` ⊥))) a == more β-reduction, redundant seq build $ \c n -> a `seq` a `c` let b' = f a b in b' `seq` (b' `c` ⊥) before_fusion == foldr c n $ myScanl' f a (b:⊥) == foldr c n $ a `seq` (a : let b' = f a b in b' `seq` (b' : ⊥)) == foldr is strict in its third argument, so it commutes with seq a `seq` foldr c n $ (a : let b' = f a b in b' `seq` (b' : ⊥)) == foldr on cons a `seq` a `c` (foldr c n $ let b' = f a b in b' `seq` (b' : ⊥)) == foldr commutes with let and seq a `seq` a `c` (let b' = f a b in b' `seq` (foldr c n $ b' : ⊥)) == foldr on cons and [] a `seq` a `c` (let b' = f a b in b' `seq` (b' `c` ⊥)) after_fusion == (\c n -> a `seq` a `c` let b' = f a b in b' `seq` (b' `c` ⊥)) c n == a `seq` a `c` (let b' = f a b in b' `seq` (b' `c` ⊥)) So nothing fancy happening here either. I conclude that we are safe, and my gut feeling tell me the reason is likely that we don’t `seq` anything built from the unknown c and n. Greetings, Joachim ¹ citation needed -- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org

Your approach points the way very clearly toward the proper inductive
argument as well, so I will look into that when I get home from the
dentist. Assuming all goes as planned, you seem to have a winner. One thing
that caught my eye in your benchmark report: it looked to me like initsQ2
was very slightly slower than initsQ on the NF tests. Do you think you
could do some bigger tests there and see if it's real? If so, my *guess* is
that the scanl' is wasting time forcing things that have already been
forced, which we probably can't do anything about but should bear in mind.
David Feuer
On Jul 24, 2014 10:02 AM, "Joachim Breitner"
Hi David,
Am Mittwoch, den 23.07.2014, 16:17 -0400 schrieb David Feuer:
In a comment on trac #9345, nomeata proposed the implementation of Data.List.inits below. He indicates that the performance is very good, and wrote it in this fashion so it could fuse. My concern is that since seq is used in the argument to build, we need to make sure that fusion on the left will be safe. If I did the calculations right, this specifically requires that for all f, cons, nil, and bs, and all a≠_| _, the following "correct" expression
e1 = a `cons` foldr cons nil $ foldr (\b g x -> (let b' = f x b in b' `seq` (b' : g b'))) (\b -> b `seq` []) bs a
gives the same result as the fused expression
e2 = a `cons` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `cons` g b'))) (\b -> b `seq` nil) bs a
I haven't been able to figure out how to prove this or give a counterexample, so far, but I have very little experience with such.
myScanl' :: (b -> a -> b) -> b -> [a] -> [b] myScanl' f a bs = build $ \c n -> a `seq` a `c` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `c` g b'))) (\b -> b `seq` n) bs a
I don’t have much experience either, but let’s try.
First of all, let’s simplify it a bit. It is unlikely that the problem will appear for lists of varying length. Since all finite lists are one-element-lists¹, let us substitute bs = [b]:
myScanl' f a [b] == build $ \c n -> a `seq` a `c` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `c` g b'))) (\b -> b `seq` n) (b : []) a == simplifying foldr build $ \c n -> a `seq` a `c` (\ x -> let b' = f x b in b' `seq` (b' `c` (b' `seq` n))) a == more β-reduction, redundant seq build $ \c n -> a `seq` a `c` let b' = f a b in b' `seq` (b' `c` n)
Now let’s calculate the expression foldr c n $ myScanl' f a [b] without and with fusion.
before_fusion == foldr c n $ myScanl' f a [b] == foldr c n $ a `seq` (a : let b' = f a b in b' `seq` (b' : [])) == foldr is strict in its third argument, so it commutes with seq a `seq` foldr c n $ (a : let b' = f a b in b' `seq` (b' : [])) == foldr on cons a `seq` a `c` (foldr c n $ let b' = f a b in b' `seq` (b' : [])) == foldr commutes with let and seq a `seq` a `c` (let b' = f a b in b' `seq` (foldr c n $ b' : [])) == foldr on cons and [] a `seq` a `c` (let b' = f a b in b' `seq` (b' `c` n))
after_fusion == (\c n -> a `seq` a `c` let b' = f a b in b' `seq` (b' `c` n)) c n == a `seq` a `c` (let b' = f a b in b' `seq` (b' `c` n))
So they yield the same results, independent of c and n. This includes the counter example from “Free Theorems in the Presence of seq”, (c = seq, n = []).
Judging from this I conclude that nothing can go wrong for finite lists. What about partial lists, i.e. bs = b : ⊥?
myScanl' f a (b : ⊥) == build $ \c n -> a `seq` a `c` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `c` g b'))) (\b -> b `seq` n) (b : ⊥) a == simplifying foldr build $ \c n -> a `seq` a `c` (\ x -> let b' = f x b in b' `seq` (b' `c` (b' `seq` ⊥))) a == more β-reduction, redundant seq build $ \c n -> a `seq` a `c` let b' = f a b in b' `seq` (b' `c` ⊥)
before_fusion == foldr c n $ myScanl' f a (b:⊥) == foldr c n $ a `seq` (a : let b' = f a b in b' `seq` (b' : ⊥)) == foldr is strict in its third argument, so it commutes with seq a `seq` foldr c n $ (a : let b' = f a b in b' `seq` (b' : ⊥)) == foldr on cons a `seq` a `c` (foldr c n $ let b' = f a b in b' `seq` (b' : ⊥)) == foldr commutes with let and seq a `seq` a `c` (let b' = f a b in b' `seq` (foldr c n $ b' : ⊥)) == foldr on cons and [] a `seq` a `c` (let b' = f a b in b' `seq` (b' `c` ⊥))
after_fusion == (\c n -> a `seq` a `c` let b' = f a b in b' `seq` (b' `c` ⊥)) c n == a `seq` a `c` (let b' = f a b in b' `seq` (b' `c` ⊥))
So nothing fancy happening here either. I conclude that we are safe, and my gut feeling tell me the reason is likely that we don’t `seq` anything built from the unknown c and n.
Greetings, Joachim
¹ citation needed
-- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Ah, never mind that last remark. If it allocates less, then even that tiny
slowdown applies only to benchmarks.
On Jul 24, 2014 10:41 AM, "David Feuer"
Your approach points the way very clearly toward the proper inductive argument as well, so I will look into that when I get home from the dentist. Assuming all goes as planned, you seem to have a winner. One thing that caught my eye in your benchmark report: it looked to me like initsQ2 was very slightly slower than initsQ on the NF tests. Do you think you could do some bigger tests there and see if it's real? If so, my *guess* is that the scanl' is wasting time forcing things that have already been forced, which we probably can't do anything about but should bear in mind.
David Feuer On Jul 24, 2014 10:02 AM, "Joachim Breitner"
wrote: Hi David,
Am Mittwoch, den 23.07.2014, 16:17 -0400 schrieb David Feuer:
In a comment on trac #9345, nomeata proposed the implementation of Data.List.inits below. He indicates that the performance is very good, and wrote it in this fashion so it could fuse. My concern is that since seq is used in the argument to build, we need to make sure that fusion on the left will be safe. If I did the calculations right, this specifically requires that for all f, cons, nil, and bs, and all a≠_| _, the following "correct" expression
e1 = a `cons` foldr cons nil $ foldr (\b g x -> (let b' = f x b in b' `seq` (b' : g b'))) (\b -> b `seq` []) bs a
gives the same result as the fused expression
e2 = a `cons` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `cons` g b'))) (\b -> b `seq` nil) bs a
I haven't been able to figure out how to prove this or give a counterexample, so far, but I have very little experience with such.
myScanl' :: (b -> a -> b) -> b -> [a] -> [b] myScanl' f a bs = build $ \c n -> a `seq` a `c` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `c` g b'))) (\b -> b `seq` n) bs a
I don’t have much experience either, but let’s try.
First of all, let’s simplify it a bit. It is unlikely that the problem will appear for lists of varying length. Since all finite lists are one-element-lists¹, let us substitute bs = [b]:
myScanl' f a [b] == build $ \c n -> a `seq` a `c` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `c` g b'))) (\b -> b `seq` n) (b : []) a == simplifying foldr build $ \c n -> a `seq` a `c` (\ x -> let b' = f x b in b' `seq` (b' `c` (b' `seq` n))) a == more β-reduction, redundant seq build $ \c n -> a `seq` a `c` let b' = f a b in b' `seq` (b' `c` n)
Now let’s calculate the expression foldr c n $ myScanl' f a [b] without and with fusion.
before_fusion == foldr c n $ myScanl' f a [b] == foldr c n $ a `seq` (a : let b' = f a b in b' `seq` (b' : [])) == foldr is strict in its third argument, so it commutes with seq a `seq` foldr c n $ (a : let b' = f a b in b' `seq` (b' : [])) == foldr on cons a `seq` a `c` (foldr c n $ let b' = f a b in b' `seq` (b' : [])) == foldr commutes with let and seq a `seq` a `c` (let b' = f a b in b' `seq` (foldr c n $ b' : [])) == foldr on cons and [] a `seq` a `c` (let b' = f a b in b' `seq` (b' `c` n))
after_fusion == (\c n -> a `seq` a `c` let b' = f a b in b' `seq` (b' `c` n)) c n == a `seq` a `c` (let b' = f a b in b' `seq` (b' `c` n))
So they yield the same results, independent of c and n. This includes the counter example from “Free Theorems in the Presence of seq”, (c = seq, n = []).
Judging from this I conclude that nothing can go wrong for finite lists. What about partial lists, i.e. bs = b : ⊥?
myScanl' f a (b : ⊥) == build $ \c n -> a `seq` a `c` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `c` g b'))) (\b -> b `seq` n) (b : ⊥) a == simplifying foldr build $ \c n -> a `seq` a `c` (\ x -> let b' = f x b in b' `seq` (b' `c` (b' `seq` ⊥))) a == more β-reduction, redundant seq build $ \c n -> a `seq` a `c` let b' = f a b in b' `seq` (b' `c` ⊥)
before_fusion == foldr c n $ myScanl' f a (b:⊥) == foldr c n $ a `seq` (a : let b' = f a b in b' `seq` (b' : ⊥)) == foldr is strict in its third argument, so it commutes with seq a `seq` foldr c n $ (a : let b' = f a b in b' `seq` (b' : ⊥)) == foldr on cons a `seq` a `c` (foldr c n $ let b' = f a b in b' `seq` (b' : ⊥)) == foldr commutes with let and seq a `seq` a `c` (let b' = f a b in b' `seq` (foldr c n $ b' : ⊥)) == foldr on cons and [] a `seq` a `c` (let b' = f a b in b' `seq` (b' `c` ⊥))
after_fusion == (\c n -> a `seq` a `c` let b' = f a b in b' `seq` (b' `c` ⊥)) c n == a `seq` a `c` (let b' = f a b in b' `seq` (b' `c` ⊥))
So nothing fancy happening here either. I conclude that we are safe, and my gut feeling tell me the reason is likely that we don’t `seq` anything built from the unknown c and n.
Greetings, Joachim
¹ citation needed
-- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
participants (2)
-
David Feuer
-
Joachim Breitner