[GHC] #13316: Bad inlining cascade leads to slow optimisation

#13316: Bad inlining cascade leads to slow optimisation -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Consider this simple program {{{ {-# INLINE [0] f #-} f x y = case y of True -> reverse x False -> reverse (reverse x) foo a b = let x = [a,a,a,a] in f x b }}} If you compile with -DDEBUG you'll see {{{ bash$ ghc-stage1 Foo.hs -O -fforce-recomp -c WARNING: file compiler/simplCore/SimplCore.hs, line 670 Simplifier bailing out after 4 iterations [17, 1, 1, 1] Size = {terms: 60, types: 49, coercions: 0, joins: 0/0}}}} }}} Yikes! One transformation per simplifier iteration! What is going on? Before inlining `f` we have {{{ (f [a,a,a,a] b) }}} Then, we inline, beta-reduce, and build let bindings thus {{{ let x1 = a : [] x2 = a : x1 x3 = a : x2 x4 = a : x3 in case b of True -> reverse x4 False -> reverse (reverse x4) }}} So far so good. But then, bizarrely, we do `postInlineUnconditionally` on `x4` (see comments in that function). But not on `x3` because its occ- info is "once inside lambda". Then in the next iteration we `postInlineUnconditionally` `x3`, and so on. Terrible, terrible. My thought: revisit these comments in `postInlineUnconditionally`: {{{ OneOcc { occ_in_lam = in_lam, occ_int_cxt = int_cxt } -- OneOcc => no code-duplication issue -> smallEnoughToInline dflags unfolding -- Small enough to dup -- ToDo: consider discount on smallEnoughToInline if int_cxt is true -- -- NB: Do NOT inline arbitrarily big things, even if one_br is True -- Reason: doing so risks exponential behaviour. We simplify a big -- expression, inline it, and simplify it again. But if the -- very same thing happens in the big expression, we get -- exponential cost! -- PRINCIPLE: when we've already simplified an expression once, -- make sure that we only inline it if it's reasonably small. && (not in_lam || -- Outside a lambda, we want to be reasonably aggressive -- about inlining into multiple branches of case -- e.g. let x = <non-value> -- in case y of { C1 -> ..x..; C2 -> ..x..; C3 -> ... } -- Inlining can be a big win if C3 is the hot- spot, even if -- the uses in C1, C2 are not 'interesting' -- An example that gets worse if you add int_cxt here is 'clausify' }}} What is particularly annoying in this case is that `x` is used in all code paths, so all this inlining simply duplicates code while gaining nothing. Moreover, if we had {{{ let x = blah in case y of A -> ...x... B -> ..x..x... C -> }}} then the multiple uses of `x` in the `B` branch would disable this entire `preInlineUnconditionally` thing, even though it might be a good idea to push the allocation of `x` into the `A` and `B` branches, to avoid the `C` code path. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13316 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13316: Bad inlining cascade leads to slow optimisation -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by ethercrow): * cc: ethercrow (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13316#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13316: Bad inlining cascade leads to slow optimisation -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Here's my work-in-progress {{{ Modified compiler/simplCore/SimplUtils.hs diff --git a/compiler/simplCore/SimplUtils.hs b/compiler/simplCore/SimplUtils.hs index 7deaf5b..2f9e2e9 100644 --- a/compiler/simplCore/SimplUtils.hs +++ b/compiler/simplCore/SimplUtils.hs @@ -1200,7 +1200,7 @@ postInlineUnconditionally dflags env top_lvl bndr occ_info rhs unfolding -- PRINCIPLE: when we've already simplified an expression once, -- make sure that we only inline it if it's reasonably small. - && (not in_lam || + && ( (not in_lam && not (isValueUnfolding unfolding)) || -- Outside a lambda, we want to be reasonably aggressive -- about inlining into multiple branches of case -- e.g. let x = <non-value> @@ -1209,7 +1209,7 @@ postInlineUnconditionally dflags env top_lvl bndr occ_info rhs unfolding -- the uses in C1, C2 are not 'interesting' -- An example that gets worse if you add int_cxt here is 'clausify' - (isCheapUnfolding unfolding && int_cxt)) + (int_cxt && isCheapUnfolding unfolding)) -- isCheap => acceptable work duplication; in_lam may be true -- int_cxt to prevent us inlining inside a lambda without some -- good reason. See the notes on int_cxt in preInlineUnconditionally }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13316#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13316: Bad inlining cascade leads to slow optimisation -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * cc: MikolajKonarski (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13316#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13316: Bad inlining cascade leads to slow optimisation -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I've just discovered that this inline cascade happens in a module of GHC itself: `StgCmmBind`. Here's the output if you compile with `-fmax- simplifier-iterations=8`: {{{ WARNING: file compiler/simplCore/SimplCore.hs, line 700 Simplifier bailing out after 8 iterations [2478, 266, 16, 1, 1, 1, 2, 1] Size = {terms: 2,793, types: 8,470, coercions: 529, joins: 4/77} WARNING: file compiler/simplCore/SimplCore.hs, line 700 Simplifier bailing out after 8 iterations [990, 166, 9, 4, 2, 4, 2, 3] Size = {terms: 4,605, types: 13,273, coercions: 302, joins: 21/138} }}} The cascade of iterations with only one or two ticks each time comes from exactly the kind of nested data constructor applications in the description. To be specific, in `closureCodeBody`, at some point we see code like this, and those `wild` variables get inlined one by one: {{{ let { wild1_so1n :: CLabel [LclId, Unf=Unf{Src=<vanilla>, TopLvl=False, Value=False, ConLike=False, WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 30 0}] wild1_so1n = closureLocalEntryLabel ipv_anWK cl_info_aegJ } in let { wild1_so1m :: CmmLit [LclId, Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 20}] wild1_so1m = CmmExpr.CmmLabel wild1_so1n } in let { wild1_so1l :: CmmExpr [LclId, Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 20}] wild1_so1l = CmmExpr.CmmLit wild1_so1m } in let { wild1_so1k :: CmmNode hoopl-3.10.2.2:Compiler.Hoopl.Block.O hoopl-3.10.2.2:Compiler.Hoopl.Block.C [LclId, Unf=Unf{Src=<vanilla>, TopLvl=False, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 70}] wild1_so1k = CmmNode.CmmCall @ hoopl-3.10.2.2:Compiler.Hoopl.Block.O @ hoopl-3.10.2.2:Compiler.Hoopl.Block.C @~ (hoopl-3.10.2.2:Compiler.Hoopl.Block.O_N :: (hoopl-3.10.2.2:Compiler.Hoopl.Block.O :: *) ghc-prim-0.5.0.0:GHC.Prim.~# (hoopl-3.10.2.2:Compiler.Hoopl.Block.O :: *)) @~ (hoopl-3.10.2.2:Compiler.Hoopl.Block.C_N :: (hoopl-3.10.2.2:Compiler.Hoopl.Block.C :: *) ghc-prim-0.5.0.0:GHC.Prim.~# (hoopl-3.10.2.2:Compiler.Hoopl.Block.C :: *)) wild1_so1l (GHC.Base.Nothing @ BlockId) ww2_ao10 ww1_ao0Z MkGraph.mkFinalCall1 updfr_off_alrO } in case wild_ao13 of wild2_ao15 { __DEFAULT -> OrdList.Snoc @ CgStmt wild2_ao15 (MkGraph.CgLast wild1_so1k); OrdList.One a1_ao1b -> OrdList.Cons @ CgStmt a1_ao1b (OrdList.One @ CgStmt (MkGraph.CgLast wild1_so1k)) }; }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13316#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC