
Hi, some months ago I tried to make foldl a good consumer in the common case. The starting point is always to write foldl k a xs = foldr (\v f a -> f (v `k` a)) id xs a and then somehow make GHC produce good code with this. I came up with two solutions: A more sophisticated compiler analysis (Call Arity), or an explicit annotation in the form of foldlB k a xs = foldr (\v f -> oneShot (\a -> f (v `k` a))) id xs a where oneShot :: (a -> b) -> (a -> b) is a built-in function, semantically the identity, but telling the compiler that it can assume that the (\a -> ...) is called at most once. Back then, we decided to use Call Arity, on the grounds that it might improve other code as well, despite not having a lot of evindence that this may happen. Then recently David Feuer built on my work by making more functions fuse, including functions like scanl that are not built on foldl, but benefit from the same analysis. This supports the usefulness of Call Arity. But he also found cases where Call Arity is not powerful enough and GHC would produce bad code. So I wanted to properly compare Call Arity with the oneShot approach. Based on todays master (0855b249), I disabled Call Arity and changed the definitions of foldl, foldl', scanl and scanl' to use oneShot, and ran nofib. The result are mixed. With the current code, the oneShot machinery does not always work as expected: Program Size Allocs Runtime Elapsed TotalMem Min -0.1% -1.5% -2.8% -2.8% -5.8% Max +0.4% +4.7% +5.8% +5.6% +5.4% Geometric Mean -0.0% +0.1% +0.3% +0.3% +0.1% The biggest loser is calendar, which uses scanl. I am not fully sure what went wrong here: Either the one-shot annotation on the lambda’s variable got lost somewhere in the pipeline, or despite it being there, the normal arity analysis did not use it. But there is also a winner, fft2, with -1.5% allocations. Here Call Arity was not good enough, but oneShot did the jobs. There is also the option of combining both. Then we do not get the regression, but still the improvement for fft2: Min -0.1% -1.5% -3.9% -3.8% -5.8% Max +0.2% +0.1% +6.4% +6.3% +13.1% Geometric Mean -0.0% -0.0% +0.0% +0.0% +0.1% The oneShot code is on the branch wip/oneShot. The changes are clearly not ready to be merged. In particular, there is the question of how to best keep the oneShot annotation in the unfoldings: The isOneShotLambda flag is currently not stored in the interface. I work around this by making sure that the oneShot function is never inlined in unfoldings, but maybe it would be better to serialize the isOneShotLambda flag in interfaces, which might have other good effects? If we want as much performance as possible, we should simply include both approaches. But there might be other things to consider... so not sure what the best thing to do is. Greetings, Joachim -- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org