Re: [GHC] #13253: Exponential compilation time with RWST & ReaderT stack with `-02`

16 Oct 2018

      #13253: Exponential compilation time with RWST & ReaderT stack with `-02`
-------------------------------------+-------------------------------------
        Reporter:  phadej            |                Owner:  bgamari, osa1
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Compile-time      |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #15630            |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by tdammers):

 I've instrumented GHC to dump more intermediate information during
 SpecConstr. Specifically, I've added the following logic:

 In `scExpr`, capture the expression to be specialized before and after
 specialization; then compare number of terms to detect a blowup (a factor
 of 2 is enough to weed out the non-blowups), and if this particular
 incantation does in fact blow up, dump the expression before the blowup.

 The resulting dump is a bit too large to attach, but it shows a typical
 pattern: the expressions that blow up all look very much alike; as we
 progress through the compilation, the "before" size stays at ~6000 terms,
 while the "after" size progressively increases, and it looks like
 expressions from earlier incantations get inlined into expressions later
 in the process.

 Which fits the hypothesis from the GHC HQ meeting, namely, that the
 inlining heuristic only looks at the size of the inlinee, but not at the
 inlining context, so when something gets inlined "top-down", then
 functions to be inlined (which don't have their own dependencies inlined
 yet) are all still small, and so the inliner happily copies them many
 times, and then in the next round, it inlines exponentially more
 invocations of the inlined functions' dependencies, and so on.

 For example, given:

 {{{#!haskell
 f = g1 + g2 + g3 + g4 + g5

 g1 = a1 + a2 + a3 + a4

 g2 = b1 + b2 + b3 + b4

 ...
 }}}

 ...the inlining heuristic will first look at `f`, conclude that each of
 `g1` through `g5` is small, and can thus be inlined; then after inlining,
 it will look at f again, and conclude that each of `a1` etc. is small, and
 inline those; and if those have further dependencies following the same
 matter, it will happily keep inlining all those small things, not
 realizing that it is creating a monstrosity. And because all the inlinees
 involved are different, there will not be any opportunities for
 optimizations that might shrink things back down, so the resulting program
 just keeps growing exponentially.

 One fun thing I'll try now is this:

 Considering that I already have code in place to detect blowups, I can
 trivially use this logic to just say "if this blows up, then throw away
 the specialized Core, and use the original expression instead". So I'll
 try that, and see what that does to various performance metrics.

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:41
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler