
#12808: For closures, Loop Invariant Code Flow related to captured free values not lifted outside the loop...
-------------------------------------+-------------------------------------
        Reporter:  GordonBGood       |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.2.1
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:  JoinPoints
Operating System:  Unknown/Multiple  |         Architecture:
                                     |   Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
  performance bug                    |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by GordonBGood):

Replying to [comment:19 simonpj]:
> Let me say what I ''think'' you are saying. Consider
> {{{
> f x p q = let g y = ...g y'...x...
>           in (g p, g q)
> }}}
> Left as-is we allocate a closure for `g` every time we call `f`. But
> instead we could lambda-lift `g`:
> {{{
> g2 x y = ...g2 x y'...x...
>
> f x p q = (g2 x p, g2 x q)
> }}}
> Now we don't allocate a closure for `g`. That is good. Is this what you mean?
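
To make the schematic example concrete, here is a minimal compilable version in which the elided `...` bodies are filled in with an invented computation (the arithmetic is purely illustrative and not taken from this ticket):

{{{
-- Closure version: `g` captures the free variable `x` from `f`, so a
-- closure for `g` is allocated on every call to `f`.
f :: Int -> Int -> Int -> (Int, Int)
f x p q = let g y = if y <= 0 then x else x + g (y - 1)
          in (g p, g q)

-- Lambda-lifted version: `x` is passed explicitly, so `g2` is an ordinary
-- top-level function and no closure needs to be allocated for it.
g2 :: Int -> Int -> Int
g2 x y = if y <= 0 then x else x + g2 x (y - 1)

f' :: Int -> Int -> Int -> (Int, Int)
f' x p q = (g2 x p, g2 x q)

main :: IO ()
main = print (f 10 3 4, f' 10 3 4)  -- both print (40,50)
}}}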

> We don't want to do this in general, early in optimisation, because we get huge benefits from being able to "see" the binding site of `g`'s free variable `x`. But these benefits are over when it comes to code generation.

> So we have experimented with so-called "late lambda lifting" (LLF). There's a whole wiki page about it: [wiki:LateLamLift]. It can be a real win.

> One obstacle to LLF is, ironically, that it can destroy join points (see the wiki page). A second benefit of Luke's new join-point work is that it becomes much easier to ensure that LLF doesn't destroy join points, and thus renders it much more practical. I think Luke will turn his attention to it once join points are solidly in.
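
As background for the join-point remark above, here is a minimal sketch (invented for illustration, not code from this ticket; whether GHC actually commits this particular helper to a join point depends on the optimisation pipeline): a local helper that never escapes and is only ever tail-called can become a join point, compiled as a labelled block that is jumped to, with no closure allocation. Lifting it to the top level removes that possibility, since a top-level function cannot be a join point of its caller.

{{{
-- `go` never escapes and is only ever tail-called, so it is a candidate
-- join point: its recursive "calls" can compile to jumps within the code
-- for `contains`, even though it mentions the free variable `target`.
contains :: Int -> [Int] -> Bool
contains target xs = go xs
  where
    go []     = False
    go (y:ys) = if y == target then True else go ys

-- Lambda-lifted variant: `target` becomes an explicit argument and
-- `goLifted` is an ordinary top-level function, so it can no longer be a
-- join point of its caller; entering it is a normal function call.
goLifted :: Int -> [Int] -> Bool
goLifted _      []     = False
goLifted target (y:ys) = if y == target then True else goLifted target ys

main :: IO ()
main = print (contains 3 [1, 2, 4], goLifted 3 [1, 2, 4])
}}}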

Yes, Simon, I manually lifted the closure by turning all of the otherwise-free variables into arguments to the function; although I didn't actually move the code to the top level, it could have been moved, as it is no longer a closure capturing free variables but just a free function. However, my point in doing so was not to show that this early manual lambda lifting is effective in itself, but that the code generated inside the function became so much more efficient because the compiler then saw the Loop Invariant Code Flow (LICF) and lifted the register loads outside the loop; in my mind the compiler should have done this whether the code was a closure or not.

As the article on lambda lifting says, lambda lifting at an early stage can prevent some optimizations, and there is certainly some run-time cost here in passing all of those extra arguments every time the function is called; however, since the recursive calls are turned into a loop internal to the function by tail-call optimization, the function itself is called only a few times, so the overall impact is negligible. There may be other negative effects of very early lambda lifting, but again they are negligible compared to the gains in efficiency of the internal loop from properly applying LICF lifting.

So the question I pose is: "Why isn't LICF lifting applied to the code internal to closures when it is applied to non-closures?"

Your Lambda Lifting and Join Point Analysis can reduce this problem by eliminating closures, but the problem is still there for cases where closures can't be eliminated and/or shouldn't be lifted. I sometimes wonder whether this is what makes LL and JPA appear so effective: this problem makes code that doesn't benefit from LL/JPA much slower than it would otherwise be, so if the problem were fixed, the gains from LL/JPA in eliminating some or many closures would likely not be so great.

As I have stated, I believe there are many use cases affected, even the common worker/wrapper pattern that seeks to reduce the amount of parameter passing by enclosing a recursive worker closure inside a wrapper; any gains from this pattern could be cancelled, and more, if the enclosed worker is less efficient. I brought up the wrapper/worker pattern because its whole point is to reduce the number of passed parameters that need to be tail-call optimized away, but the "worker" must then be a closure; with the problem described in this issue, in many cases the resulting wrapper/worker will be slower than if we hadn't factored out the closure "worker" at all.

In the sieve code of this thread, the "nxtc" closure as originally written is essentially a "worker" for the "nxtp" wrapper function, and manually lifting it as I did means it is no longer a "worker". I should have gotten slightly worse performance from doing this, but instead got much better performance, because the compiler then performed LICF optimization on the non-closure, whereas it currently doesn't seem able to do so on the closure.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12808#comment:20
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
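
For reference, a deliberately simplified sketch of the wrapper/worker shape discussed above (invented code, not the sieve from this thread; a toy this small may well be optimised equally well either way). The wrapper fixes some values, the recursive worker captures them as loop-invariant free variables, and the manually lifted variant passes them as explicit arguments, which is the transformation reported above to enable the LICF hoisting:

{{{
-- Wrapper/worker shape: the wrapper fixes `step` and `limit`, and the
-- recursive worker closure captures them as free variables.  They are
-- loop-invariant; the complaint in this ticket is that loads of such
-- captured values are not hoisted out of the worker's loop.
sumStep :: Int -> Int -> Int -> Int
sumStep step limit start = worker start 0
  where
    worker i acc
      | i > limit = acc
      | otherwise = worker (i + step) (acc + i)

-- Manually lifted worker: the invariant values become ordinary arguments,
-- which in the measurements above let the generated loop keep them in
-- registers.
workerLifted :: Int -> Int -> Int -> Int -> Int
workerLifted step limit i acc
  | i > limit = acc
  | otherwise = workerLifted step limit (i + step) (acc + i)

sumStep' :: Int -> Int -> Int -> Int
sumStep' step limit start = workerLifted step limit start 0

main :: IO ()
main = print (sumStep 2 100 1 == sumStep' 2 100 1)  -- True
}}}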