Re: [GHC] #9476: Implement late lambda-lifting

11 Jan 2019

      #9476: Implement late lambda-lifting
-------------------------------------+-------------------------------------
        Reporter:  simonpj           |                Owner:  sgraf
            Type:  feature request   |               Status:  closed
        Priority:  normal            |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  7.8.2
      Resolution:  fixed             |             Keywords:  LateLamLift
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #8763 #13286      |  Differential Rev(s):  Phab:D5224
       Wiki Page:  LateLamLift       |
-------------------------------------+-------------------------------------

Comment (by sgraf):

 OK, it turns out that `paraffins` regresses by 8% in counted instructions
 when allowing arbitrary closure growth because of the very same function I
 pinned down as the reason for the alleged speed-up in comment:56. It seems
 like we don't want to lift the inner loop of such a list comprehension.

 On the other hand, there's `gen_regexp` which improves by 2-5% in runtime
 when we lift the next-to-inner function of a list comprehension (`go_s7zy`
 below):

 {{{
 let {
   go_s7zy =
       sat-only \r [x1_s7zz]
           case ># [x1_s7zz y_s7zx] of {
             __DEFAULT ->
                 case chr# [x1_s7zz] of ds13_s7zB {
                   __DEFAULT ->
                       let { ds14_s7zC = CCCS C#! [ds13_s7zB]; } in
                       let { z_s7zD = \u [] case +# [x1_s7zz 1#] of
 sat_s7zE { __DEFAULT -> go_s7zy sat_s7zE; }; } in
                       let {
                         go1_s7zF =
                             sat-only \r [ds15_s7zG]
                                 case ds15_s7zG of {
                                   [] -> z_s7zD;
                                   : y1_s7zI ys_s7zJ ->
                                       let { sat_s7zL = \u [] go1_s7zF
 ys_s7zJ; } in
                                       let { sat_s7zK = CCCS :! [ds14_s7zC
 y1_s7zI]; } in  : [sat_s7zK sat_s7zL];
                                 };
                       } in  go1_s7zF lvl13_s7zw;
                 };
             1# -> [] [];
           };
 } in  case ord# [c1_s7za] of sat_s7zM { __DEFAULT -> go_s7zy sat_s7zM; };
 }}}

 Lifting `go_s7zy` leads to a very slight increase in allocation (so would
 be disallowed from the perspective of closure growth), but a decrease in
 runtime. Maybe this is some low-level thing again, I don't know. The
 difference in Cmm is as expected and I wouldn't have expected a speed-up.

 So, to sum this up: Deactivating closure growth checks, thus allowing
 arbitrary closure growth resulting from lambda lifting

 - leads to an improvement of ~0.1% in runtime and counted instructions
 (though the two don't necessarily coincide).
 - also leads to unpredictable regressions in total allocations, like
 +28.6% in `wheel-sieve1`. From the regressions I investigated
 (`gen_regexp`, for example), the bindings the lifting decisions of which
 are responsible for the biggest regression in allocations (+10% due to the
 inner loop above, `go1_s7zF`) are not necessarily the same bindings which
 contribute the speed-ups (`go_s7zy` above).

 I'm not sure how to come up with a criterion that separates the two
 classes (beneficial to lift vs. not) of bindings, so I suggest we stick to
 the current closure growth heuristic. Which is a little more conservative
 than necessary, but avoids arbitrary regressions in allocations on STG
 level.

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9476#comment:76
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler