Call Arity, oneShot or both

25 Oct 2014

      Hi,

some months ago I tried to make foldl a good consumer in the common
case. The starting point is always to write

        foldl k a xs = foldr (\v f a -> f (v `k` a)) id xs a

and then somehow make GHC produce good code with this. I came up with
two solutions: A more sophisticated compiler analysis (Call Arity), or
an explicit annotation in the form of

        foldlB k a xs = foldr (\v f -> oneShot (\a -> f (v `k` a))) id
        xs a

where oneShot :: (a -> b) -> (a -> b) is a built-in function,
semantically the identity, but telling the compiler that it can assume
that the (\a -> ...) is called at most once.

Back then, we decided to use Call Arity, on the grounds that it might
improve other code as well, despite not having a lot of evindence that
this may happen.

Then recently David Feuer built on my work by making more functions
fuse, including functions like scanl that are not built on foldl, but
benefit from the same analysis. This supports the usefulness of Call
Arity.

But he also found cases where Call Arity is not powerful enough and GHC
would produce bad code. So I wanted to properly compare Call Arity with
the oneShot approach.

Based on todays master (0855b249), I disabled Call Arity and changed the
definitions of foldl, foldl', scanl and scanl' to use oneShot, and ran
nofib.

The result are mixed. With the current code, the oneShot machinery does
not always work as expected:

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
            Min          -0.1%     -1.5%     -2.8%     -2.8%     -5.8%
            Max          +0.4%     +4.7%     +5.8%     +5.6%     +5.4%
 Geometric Mean          -0.0%     +0.1%     +0.3%     +0.3%     +0.1%

The biggest loser is calendar, which uses scanl. I am not fully sure
what went wrong here: Either the one-shot annotation on the lambda’s
variable got lost somewhere in the pipeline, or despite it being there,
the normal arity analysis did not use it.

But there is also a winner, fft2, with -1.5% allocations. Here Call
Arity was not good enough, but oneShot did the jobs.

There is also the option of combining both. Then we do not get the
regression, but still the improvement for fft2:

            Min          -0.1%     -1.5%     -3.9%     -3.8%     -5.8%
            Max          +0.2%     +0.1%     +6.4%     +6.3%    +13.1%
 Geometric Mean          -0.0%     -0.0%     +0.0%     +0.0%     +0.1%

The oneShot code is on the branch wip/oneShot. The changes are clearly
not ready to be merged. In particular, there is the question of how to
best keep the oneShot annotation in the unfoldings: The isOneShotLambda
flag is currently not stored in the interface. I work around this by
making sure that the oneShot function is never inlined in unfoldings,
but maybe it would be better to serialize the isOneShotLambda flag in
interfaces, which might have other good effects?

If we want as much performance as possible, we should simply include
both approaches. But there might be other things to consider... so not
sure what the best thing to do is.

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  mail@joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata@joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata@debian.org

Joachim Breitner

Johan Tibell

Simon Peyton Jones

Joachim Breitner

Simon Peyton Jones

Joachim Breitner

Joachim Breitner

Simon Peyton Jones

Joachim Breitner

tags

participants (3)