
#9476: Implement late lambda-lifting -------------------------------------+------------------------------------- Reporter: simonpj | Owner: sgraf Type: feature request | Status: closed Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 7.8.2 Resolution: fixed | Keywords: LateLamLift Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #8763 #13286 | Differential Rev(s): Phab:D5224 Wiki Page: LateLamLift | -------------------------------------+------------------------------------- Comment (by sgraf): OK, it turns out that `paraffins` regresses by 8% in counted instructions when allowing arbitrary closure growth because of the very same function I pinned down as the reason for the alleged speed-up in comment:56. It seems like we don't want to lift the inner loop of such a list comprehension. On the other hand, there's `gen_regexp` which improves by 2-5% in runtime when we lift the next-to-inner function of a list comprehension (`go_s7zy` below): {{{ let { go_s7zy = sat-only \r [x1_s7zz] case ># [x1_s7zz y_s7zx] of { __DEFAULT -> case chr# [x1_s7zz] of ds13_s7zB { __DEFAULT -> let { ds14_s7zC = CCCS C#! [ds13_s7zB]; } in let { z_s7zD = \u [] case +# [x1_s7zz 1#] of sat_s7zE { __DEFAULT -> go_s7zy sat_s7zE; }; } in let { go1_s7zF = sat-only \r [ds15_s7zG] case ds15_s7zG of { [] -> z_s7zD; : y1_s7zI ys_s7zJ -> let { sat_s7zL = \u [] go1_s7zF ys_s7zJ; } in let { sat_s7zK = CCCS :! [ds14_s7zC y1_s7zI]; } in : [sat_s7zK sat_s7zL]; }; } in go1_s7zF lvl13_s7zw; }; 1# -> [] []; }; } in case ord# [c1_s7za] of sat_s7zM { __DEFAULT -> go_s7zy sat_s7zM; }; }}} Lifting `go_s7zy` leads to a very slight increase in allocation (so would be disallowed from the perspective of closure growth), but a decrease in runtime. Maybe this is some low-level thing again, I don't know. The difference in Cmm is as expected and I wouldn't have expected a speed-up. So, to sum this up: Deactivating closure growth checks, thus allowing arbitrary closure growth resulting from lambda lifting - leads to an improvement of ~0.1% in runtime and counted instructions (though the two don't necessarily coincide). - also leads to unpredictable regressions in total allocations, like +28.6% in `wheel-sieve1`. From the regressions I investigated (`gen_regexp`, for example), the bindings the lifting decisions of which are responsible for the biggest regression in allocations (+10% due to the inner loop above, `go1_s7zF`) are not necessarily the same bindings which contribute the speed-ups (`go_s7zy` above). I'm not sure how to come up with a criterion that separates the two classes (beneficial to lift vs. not) of bindings, so I suggest we stick to the current closure growth heuristic. Which is a little more conservative than necessary, but avoids arbitrary regressions in allocations on STG level. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9476#comment:76 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler