
#12808: For closures, Loop Invariant Code Flow related to captured free values not lifted outside the loop...
-------------------------------------+-------------------------------------
        Reporter:  GordonBGood       |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.2.1
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:  JoinPoints
Operating System:  Unknown/Multiple  |         Architecture:
                                     |   Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
  performance bug                    |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by GordonBGood):

Replying to [comment:19 simonpj]:
> Let me say what I ''think'' you are saying. Consider
> {{{
> f x p q = let g y = ...g y'...x...
>           in (g p, g q)
> }}}
> Left as-is we allocate a closure for `g` every time we call `f`. But
> instead we could lambda-lift `g`:
> {{{
> g2 x y = ...g2 x y'...x...
>
> f x p q = (g2 x p, g2 x q)
> }}}
> Now we don't allocate a closure for `g`. That is good. Is this what you mean?
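
To make the schematic example concrete, here is a minimal compilable version in which the elided `...` bodies are filled in with an invented computation (the arithmetic is purely illustrative and not taken from this ticket):

{{{
-- Closure version: `g` captures the free variable `x` from `f`, so a
-- closure for `g` is allocated on every call to `f`.
f :: Int -> Int -> Int -> (Int, Int)
f x p q = let g y = if y <= 0 then x else x + g (y - 1)
          in (g p, g q)

-- Lambda-lifted version: `x` is passed explicitly, so `g2` is an ordinary
-- top-level function and no closure needs to be allocated for it.
g2 :: Int -> Int -> Int
g2 x y = if y <= 0 then x else x + g2 x (y - 1)

f' :: Int -> Int -> Int -> (Int, Int)
f' x p q = (g2 x p, g2 x q)

main :: IO ()
main = print (f 10 3 4, f' 10 3 4)  -- both print (40,50)
}}}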

> We don't want to do this in general, early in optimisation, because we get huge benefits from being able to "see" the binding site of `g`'s free variable `x`. But these benefits are over when it comes to code generation.

> So we have experimented with so-called "late lambda lifting" (LLF). There's a whole wiki page about it: [wiki:LateLamLift]. It can be a real win.

> One obstacle to LLF is, ironically, that it can destroy join points (see the wiki page). A second benefit of Luke's new join-point work is that it becomes much easier to ensure that LLF doesn't destroy join points, and thus renders it much more practical. I think Luke will turn his attention to it once join points are solidly in.
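
As background for the join-point remark above, here is a minimal sketch (invented for illustration, not code from this ticket; whether GHC actually commits this particular helper to a join point depends on the optimisation pipeline): a local helper that never escapes and is only ever tail-called can become a join point, compiled as a labelled block that is jumped to, with no closure allocation. Lifting it to the top level removes that possibility, since a top-level function cannot be a join point of its caller.

{{{
-- `go` never escapes and is only ever tail-called, so it is a candidate
-- join point: its recursive "calls" can compile to jumps within the code
-- for `contains`, even though it mentions the free variable `target`.
contains :: Int -> [Int] -> Bool
contains target xs = go xs
  where
    go []     = False
    go (y:ys) = if y == target then True else go ys

-- Lambda-lifted variant: `target` becomes an explicit argument and
-- `goLifted` is an ordinary top-level function, so it can no longer be a
-- join point of its caller; entering it is a normal function call.
goLifted :: Int -> [Int] -> Bool
goLifted _      []     = False
goLifted target (y:ys) = if y == target then True else goLifted target ys

main :: IO ()
main = print (contains 3 [1, 2, 4], goLifted 3 [1, 2, 4])
}}}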

Yes, Simon, I manually lifted the closure by turning all of the otherwise-free variables into arguments to the function; although I didn't actually move the code to the top level, it could have been moved, as it is no longer a closure capturing free variables but just a free function. However, my point in doing so was not to show that this early manual lambda lifting is effective in itself, but that the code generated inside the function became so much more efficient because the compiler then saw the Loop Invariant Code Flow (LICF) and lifted the register loads outside the loop; in my mind the compiler should have done this whether the code was a closure or not.

As the article on lambda lifting says, lambda lifting at an early stage can prevent some optimizations, and there is certainly some run-time cost here in passing all of those extra arguments every time the function is called; however, since the recursive calls are turned into a loop internal to the function by tail-call optimization, the function itself is called only a few times, so the overall impact is negligible. There may be other negative effects of very early lambda lifting, but again they are negligible compared to the gains in efficiency of the internal loop from properly applying LICF lifting.

So the question I pose is: "Why isn't LICF lifting applied to the code internal to closures when it is applied to non-closures?"

Your Lambda Lifting and Join Point Analysis can reduce this problem by eliminating closures, but the problem is still there for cases where closures can't be eliminated and/or shouldn't be lifted. I sometimes wonder whether this is what makes LL and JPA appear so effective: this problem makes code that doesn't benefit from LL/JPA much slower than it would otherwise be, so if the problem were fixed, the gains from LL/JPA in eliminating some or many closures would likely not be so great.

As I have stated, I believe there are many use cases affected, even the common worker/wrapper pattern that seeks to reduce the amount of parameter passing by enclosing a recursive worker closure inside a wrapper; any gains from this pattern could be cancelled, and more, if the enclosed worker is less efficient. I brought up the wrapper/worker pattern because its whole point is to reduce the number of passed parameters that need to be tail-call optimized away, but the "worker" must then be a closure; with the problem described in this issue, in many cases the resulting wrapper/worker will be slower than if we hadn't factored out the closure "worker" at all.

In the sieve code of this thread, the "nxtc" closure as originally written is essentially a "worker" for the "nxtp" wrapper function, and manually lifting it as I did means it is no longer a "worker". I should have gotten slightly worse performance from doing this, but instead got much better performance, because the compiler then performed LICF optimization on the non-closure, whereas it currently doesn't seem able to do so on the closure.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12808#comment:20
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
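
For reference, a deliberately simplified sketch of the wrapper/worker shape discussed above (invented code, not the sieve from this thread; a toy this small may well be optimised equally well either way). The wrapper fixes some values, the recursive worker captures them as loop-invariant free variables, and the manually lifted variant passes them as explicit arguments, which is the transformation reported above to enable the LICF hoisting:

{{{
-- Wrapper/worker shape: the wrapper fixes `step` and `limit`, and the
-- recursive worker closure captures them as free variables.  They are
-- loop-invariant; the complaint in this ticket is that loads of such
-- captured values are not hoisted out of the worker's loop.
sumStep :: Int -> Int -> Int -> Int
sumStep step limit start = worker start 0
  where
    worker i acc
      | i > limit = acc
      | otherwise = worker (i + step) (acc + i)

-- Manually lifted worker: the invariant values become ordinary arguments,
-- which in the measurements above let the generated loop keep them in
-- registers.
workerLifted :: Int -> Int -> Int -> Int -> Int
workerLifted step limit i acc
  | i > limit = acc
  | otherwise = workerLifted step limit (i + step) (acc + i)

sumStep' :: Int -> Int -> Int -> Int
sumStep' step limit start = workerLifted step limit start 0

main :: IO ()
main = print (sumStep 2 100 1 == sumStep' 2 100 1)  -- True
}}}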