
#9476: Implement late lambda-lifting -------------------------------------+------------------------------------- Reporter: simonpj | Owner: sgraf Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.2 Resolution: | Keywords: LateLamLift Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #8763 #13286 | Differential Rev(s): Wiki Page: LateLamLift | -------------------------------------+------------------------------------- Comment (by sgraf): Replying to [comment:28 simonpj]:
-fstg-lift-lams-known: Allow turning known into unknown calls.
In your example, you say that we might float `f` but not `g`. But why don't we float `g`?
Because at the time we handled `g`, it (hypothetically) was deemed non- beneficial to lift. It might lead to more allocation or hurt some other heuristic. Of course one could argue that lifting ''both'' bindings could be a win. Our decision strategy is greedy in that regard; There's no backtracking involved. We transform expressions going from outer to inner, at each step taking into account the lifting decisions we already made (and which we don't question or undo). So, for that example in particular, the assumption was that we examined `g`, decided not to lift it and then look at `f` knowing that `g` won't be lifted.
lifting go to top-level will result in abstracting over `$warg`,
Why didn't we float `$warg`?
Same reasoning. Because we decided earlier that the lift might not be worthwhile. I could look at some debug output to see what was the reason(s) for `$warg` if you are really interested. Replying to [comment:29 simonpj]:
Ah, but one of them (the last, in particular) is a 'void' parameter (so, slow call pattern pppppv), which is completely erased from the resulting Cmm!
OK, so the conclusion is: don't count void args when counting args and comparing with the threshold? I agree!
Yes, exactly. See https://github.com/sgraf812/ghc/blob/9b9260c1d45d127edf9ebdfe04a3daaff24a9de... Replying to [comment:30 simonpj]:
The non-recursive case
So it's a bit surprising to me that even with a threshold of 10 we have such a small hit on instruction count.
I was wondering the same thing. I ''think'' the other heuristics (https://github.com/sgraf812/ghc/blob/9b9260c1d45d127edf9ebdfe04a3daaff24a9de..., for completeness) fire way too often to allow non-recursive calls with 20 arguments. Consider the impact on allocations: When we lift such a function, all 20ish free variables would have to be made available at all call sites, leading to a massive increase in closure allocation for closures around the actual call sites. Example {{{ let f = [x1...x20] \[a b c] -> ... g = [f y] \z -> f y y z in map g [1,2,3] }}} Assuming for the sake of the example that `g` wouldn't be lifted to top- level itself (it's just a PAP of `f`), lifting `f` would just shift the whole closure into `g`'s closure. In general, there could be arbitrary more closures between `f`s binding site and its call sites, each of which would have to carry `f`s former free variables, should we decide to lift it. I could try and measure with the allocation heuristic and the max args heuristic turned off, to see if we get more chaotic results. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9476#comment:33 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler