BACKGROUND

TRAC: #9495

GHC's definition of foldr2 violates the spec of the Haskell Report, making some programs less defined than they should be, and does so in a way that is quite fragile, dependent on a number of different factors—that is, it is likely hard to debug.

Details:

The current "baseline" definition of foldr2 (before rewrite rules) does what the Report requires:

foldr2 :: (a -> b -> c -> c) -> c -> [a] -> [b] -> c
foldr2 k z = go
  where
    go [] _ys = z
    go _xs [] = z
    go (x:xs) (y:ys) = k x y (go xs ys)

There are two rewrite rules, foldr2/left and foldr2/right, designed to allow foldr2 to fuse with build appearing in either the first or second list argument. foldr2/left is perfectly safe (as long as the build argument is legitimate). foldr2/right is not, as comments in the source partially explain. The problem shows up if you have something shaped like

zip [1,7,42,5] (1:8:4:11:undefined)

but where the second list (ending in undefined) is actually produced using build, and GHC chooses to fuse foldr2 with it. The Report requires the above to produce [(1,1),(7,8),(42,4),(5,11)], but GHC will instead produce (1,1):(7,8):(42,4):(5,11):_|_

This issue can make a program that works with -O0 fail with -O, and can cause fragility depending on exactly what fuses.

SCOPE

One obvious question is what kinds of build forms can cause problems. Many, like map and fromEnumTo, don't have enough power to be likely to cause trouble. Joachim Breitner, however, realized that the current implementation of filter has enough power to break things. So here's a complete example of code that will break with -O but not with -O0:

{-# NOINLINE don'tFuse #-}
don'tFuse = id

p x = 100 `mod` x < 20

main = print $ zip (don'tFuse [10,9..1]) (filter p [10,9,..])

Things will only get worse if I succeed in my plan to bring unfoldr into the framework.

SOLUTIONS

1. One solution, of course, is to eliminate unfoldr2/right, bringing GHC into compliance with the Report. I really like this idea, but Joachim thinks it is not worth the potential performance impact on code written before or without regard for the change. We both agree that a reasonable alternative is

3. Modify the baseline definition of unfoldr2 as follows:

foldr2 :: (a -> b -> c -> c) -> c -> [a] -> [b] -> c
foldr2 k z = go
  where
    go [] ys = ys `seq` z
    go _xs [] = z
    go (x:xs) (y:ys) = k x y (go xs ys)

This should, we believe, make the baseline definition fail where the one fused by foldr2/right would fail, giving consistent semantics.

WHAT MIGHT BREAK

Code that currently works but will break with this change has two properties:

1. It relies on the asymmetry in foldr, zipWith, or zip to avoid running into bottom.
2. foldr2/right does not fire, either because the second list is not a good producer or because GHC applies the foldr2/left rule instead.

That is, most of the code that this change will break is fragile under the current scheme.

DISCUSSION PERIOD

Standard two weeks.