
#13316: Bad inlining cascade leads to slow optimisation -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Consider this simple program {{{ {-# INLINE [0] f #-} f x y = case y of True -> reverse x False -> reverse (reverse x) foo a b = let x = [a,a,a,a] in f x b }}} If you compile with -DDEBUG you'll see {{{ bash$ ghc-stage1 Foo.hs -O -fforce-recomp -c WARNING: file compiler/simplCore/SimplCore.hs, line 670 Simplifier bailing out after 4 iterations [17, 1, 1, 1] Size = {terms: 60, types: 49, coercions: 0, joins: 0/0}}}} }}} Yikes! One transformation per simplifier iteration! What is going on? Before inlining `f` we have {{{ (f [a,a,a,a] b) }}} Then, we inline, beta-reduce, and build let bindings thus {{{ let x1 = a : [] x2 = a : x1 x3 = a : x2 x4 = a : x3 in case b of True -> reverse x4 False -> reverse (reverse x4) }}} So far so good. But then, bizarrely, we do `postInlineUnconditionally` on `x4` (see comments in that function). But not on `x3` because its occ- info is "once inside lambda". Then in the next iteration we `postInlineUnconditionally` `x3`, and so on. Terrible, terrible. My thought: revisit these comments in `postInlineUnconditionally`: {{{ OneOcc { occ_in_lam = in_lam, occ_int_cxt = int_cxt } -- OneOcc => no code-duplication issue -> smallEnoughToInline dflags unfolding -- Small enough to dup -- ToDo: consider discount on smallEnoughToInline if int_cxt is true -- -- NB: Do NOT inline arbitrarily big things, even if one_br is True -- Reason: doing so risks exponential behaviour. We simplify a big -- expression, inline it, and simplify it again. But if the -- very same thing happens in the big expression, we get -- exponential cost! -- PRINCIPLE: when we've already simplified an expression once, -- make sure that we only inline it if it's reasonably small. && (not in_lam || -- Outside a lambda, we want to be reasonably aggressive -- about inlining into multiple branches of case -- e.g. let x = <non-value> -- in case y of { C1 -> ..x..; C2 -> ..x..; C3 -> ... } -- Inlining can be a big win if C3 is the hot- spot, even if -- the uses in C1, C2 are not 'interesting' -- An example that gets worse if you add int_cxt here is 'clausify' }}} What is particularly annoying in this case is that `x` is used in all code paths, so all this inlining simply duplicates code while gaining nothing. Moreover, if we had {{{ let x = blah in case y of A -> ...x... B -> ..x..x... C -> }}} then the multiple uses of `x` in the `B` branch would disable this entire `preInlineUnconditionally` thing, even though it might be a good idea to push the allocation of `x` into the `A` and `B` branches, to avoid the `C` code path. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13316 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler