Proposal: (breaking change to avoid fragile breakage) change the definition of foldr2, zip, and zipWith

BACKGROUND TRAC: #9495 GHC's definition of foldr2 violates the spec of the Haskell Report, making some programs less defined than they should be, and does so in a way that is quite fragile, dependent on a number of different factors—that is, it is likely hard to debug. Details: The current "baseline" definition of foldr2 (before rewrite rules) does what the Report requires: foldr2 :: (a -> b -> c -> c) -> c -> [a] -> [b] -> c foldr2 k z = go where go [] _ys = z go _xs [] = z go (x:xs) (y:ys) = k x y (go xs ys) There are two rewrite rules, foldr2/left and foldr2/right, designed to allow foldr2 to fuse with build appearing in either the first or second list argument. foldr2/left is perfectly safe (as long as the build argument is legitimate). foldr2/right is not, as comments in the source partially explain. The problem shows up if you have something shaped like zip [1,7,42,5] (1:8:4:11:undefined) but where the second list (ending in undefined) is actually produced using build, and GHC chooses to fuse foldr2 with it. The Report requires the above to produce [(1,1),(7,8),(42,4),(5,11)], but GHC will instead produce (1,1):(7,8):(42,4):(5,11):_|_ This issue can make a program that works with -O0 fail with -O, and can cause fragility depending on exactly what fuses. SCOPE One obvious question is what kinds of build forms can cause problems. Many, like map and fromEnumTo, don't have enough power to be likely to cause trouble. Joachim Breitner, however, realized that the current implementation of filter has enough power to break things. So here's a complete example of code that will break with -O but not with -O0: {-# NOINLINE don'tFuse #-} don'tFuse = id p x = 100 `mod` x < 20 main = print $ zip (don'tFuse [10,9..1]) (filter p [10,9,..]) Things will only get worse if I succeed in my plan to bring unfoldr into the framework. SOLUTIONS 1. One solution, of course, is to eliminate unfoldr2/right, bringing GHC into compliance with the Report. I really like this idea, but Joachim thinks it is not worth the potential performance impact on code written before or without regard for the change. We both agree that a reasonable alternative is 3. Modify the baseline definition of unfoldr2 as follows: foldr2 :: (a -> b -> c -> c) -> c -> [a] -> [b] -> c foldr2 k z = go where go [] ys = ys `seq` z go _xs [] = z go (x:xs) (y:ys) = k x y (go xs ys) This should, we believe, make the baseline definition fail where the one fused by foldr2/right would fail, giving consistent semantics. WHAT MIGHT BREAK Code that currently works but will break with this change has two properties: 1. It relies on the asymmetry in foldr, zipWith, or zip to avoid running into bottom. 2. foldr2/right does not fire, either because the second list is not a good producer or because GHC applies the foldr2/left rule instead. That is, most of the code that this change will break is fragile under the current scheme. DISCUSSION PERIOD Standard two weeks.

It's been a couple weeks now, and no one's responded. I'd like to
submit a patch for this one ASAP to update source and/or
documentation, but I would really like to get some guidance from
anyone else who may care.
On Sun, Aug 24, 2014 at 3:22 PM, David Feuer
BACKGROUND
TRAC: #9495
GHC's definition of foldr2 violates the spec of the Haskell Report, making some programs less defined than they should be, and does so in a way that is quite fragile, dependent on a number of different factors—that is, it is likely hard to debug.
Details:
The current "baseline" definition of foldr2 (before rewrite rules) does what the Report requires:
foldr2 :: (a -> b -> c -> c) -> c -> [a] -> [b] -> c foldr2 k z = go where go [] _ys = z go _xs [] = z go (x:xs) (y:ys) = k x y (go xs ys)
There are two rewrite rules, foldr2/left and foldr2/right, designed to allow foldr2 to fuse with build appearing in either the first or second list argument. foldr2/left is perfectly safe (as long as the build argument is legitimate). foldr2/right is not, as comments in the source partially explain. The problem shows up if you have something shaped like
zip [1,7,42,5] (1:8:4:11:undefined)
but where the second list (ending in undefined) is actually produced using build, and GHC chooses to fuse foldr2 with it. The Report requires the above to produce [(1,1),(7,8),(42,4),(5,11)], but GHC will instead produce (1,1):(7,8):(42,4):(5,11):_|_
This issue can make a program that works with -O0 fail with -O, and can cause fragility depending on exactly what fuses.
SCOPE
One obvious question is what kinds of build forms can cause problems. Many, like map and fromEnumTo, don't have enough power to be likely to cause trouble. Joachim Breitner, however, realized that the current implementation of filter has enough power to break things. So here's a complete example of code that will break with -O but not with -O0:
{-# NOINLINE don'tFuse #-} don'tFuse = id
p x = 100 `mod` x < 20
main = print $ zip (don'tFuse [10,9..1]) (filter p [10,9,..])
Things will only get worse if I succeed in my plan to bring unfoldr into the framework.
SOLUTIONS
1. One solution, of course, is to eliminate unfoldr2/right, bringing GHC into compliance with the Report. I really like this idea, but Joachim thinks it is not worth the potential performance impact on code written before or without regard for the change. We both agree that a reasonable alternative is
3. Modify the baseline definition of unfoldr2 as follows:
foldr2 :: (a -> b -> c -> c) -> c -> [a] -> [b] -> c foldr2 k z = go where go [] ys = ys `seq` z go _xs [] = z go (x:xs) (y:ys) = k x y (go xs ys)
This should, we believe, make the baseline definition fail where the one fused by foldr2/right would fail, giving consistent semantics.
WHAT MIGHT BREAK
Code that currently works but will break with this change has two properties:
1. It relies on the asymmetry in foldr, zipWith, or zip to avoid running into bottom. 2. foldr2/right does not fire, either because the second list is not a good producer or because GHC applies the foldr2/left rule instead.
That is, most of the code that this change will break is fragile under the current scheme.
DISCUSSION PERIOD
Standard two weeks.

Dear David, Am Montag, den 08.09.2014, 02:58 -0400 schrieb David Feuer:
It's been a couple weeks now, and no one's responded.
that can happen. Usually not because noone cares, but because noone knows what’s best. And also there was ICFP last week, which probably kept people busy ... so don’t be discouraged by silence, and keep following up on it.
On Sun, Aug 24, 2014 at 3:22 PM, David Feuer
wrote: BACKGROUND
TRAC: #9495
SOLUTIONS
1. One solution, of course, is to eliminate unfoldr2/right, bringing GHC into compliance with the Report. I really like this idea, but Joachim thinks it is not worth the potential performance impact on code written before or without regard for the change. We both agree that a reasonable alternative is
3. Modify the baseline definition of unfoldr2 as follows:
foldr2 :: (a -> b -> c -> c) -> c -> [a] -> [b] -> c foldr2 k z = go where go [] ys = ys `seq` z go _xs [] = z go (x:xs) (y:ys) = k x y (go xs ys)
This should, we believe, make the baseline definition fail where the one fused by foldr2/right would fail, giving consistent semantics.
You already said that I agree with that solution, but I can re-state it here :-)
WHAT MIGHT BREAK
Code that currently works but will break with this change has two properties:
1. It relies on the asymmetry in foldr, zipWith, or zip to avoid running into bottom. 2. foldr2/right does not fire, either because the second list is not a good producer or because GHC applies the foldr2/left rule instead.
That is, most of the code that this change will break is fragile under the current scheme.
DISCUSSION PERIOD
Standard two weeks.
Given that nobody complained about the standard-non-conformance so far I think having a symmetric zip where the RULES are semantics-preserving is more useful than strictly following the report. But I could be convinced otherwise. David, I think you can go ahead and prepare a patch. People can still speak up before (and even after) it is applied if they disagree. Greetings, Joachim -- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org

| David, I think you can go ahead and prepare a patch. People can still
| speak up before (and even after) it is applied if they disagree.
I'm way behind the curve here, but this is really a matter for the Core Libraries Committee, and I gather you are already in good communication with Edward. You can use GHC's Trac for Core Libraries tickets; just use "Core Libraries" for the "Component" field.
Simon
| -----Original Message-----
| From: Libraries [mailto:libraries-bounces@haskell.org] On Behalf Of
| Joachim Breitner
| Sent: 08 September 2014 09:25
| To: David Feuer
| Cc: Haskell Libraries
| Subject: Re: Proposal: (breaking change to avoid fragile breakage)
| change the definition of foldr2, zip, and zipWith
|
| Dear David,
|
|
| Am Montag, den 08.09.2014, 02:58 -0400 schrieb David Feuer:
| > It's been a couple weeks now, and no one's responded.
|
| that can happen. Usually not because noone cares, but because noone
| knows what’s best. And also there was ICFP last week, which probably
| kept people busy ... so don’t be discouraged by silence, and keep
| following up on it.
|
| > On Sun, Aug 24, 2014 at 3:22 PM, David Feuer

Hi, Am Montag, den 08.09.2014, 10:24 +0200 schrieb Joachim Breitner:
Given that nobody complained about the standard-non-conformance so far I think having a symmetric zip where the RULES are semantics-preserving is more useful than strictly following the report. But I could be convinced otherwise.
David, I think you can go ahead and prepare a patch. People can still speak up before (and even after) it is applied if they disagree.
nobody complained, so I’m going to apply David’s patch at https://ghc.haskell.org/trac/ghc/attachment/ticket/9495/foldr2.diff now. Greetings, Joachim -- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org
participants (3)
-
David Feuer
-
Joachim Breitner
-
Simon Peyton Jones