[GHC] #9660: unnecessary indirect jump when returning a case scrutinee

#9660: unnecessary indirect jump when returning a case scrutinee -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.3 (CodeGen) | Operating System: Keywords: | Unknown/Multiple Architecture: Unknown/Multiple | Type of failure: Runtime Difficulty: Unknown | performance bug Blocked By: | Test Case: Related Tickets: | Blocking: | Differential Revisions: -------------------------------------+------------------------------------- I happened to be looking at the Cmm for this code (ghc 7.8.3, -O2) {{{#!hs f :: Int -> Int f x = if x < 0 then x else x+1 }}} and I noticed something a bit funny about it: {{{ c12e: if ((Sp + -8) < SpLim) goto c12z; else goto c12A; c12z: R2 = R2; R1 = Test.f_closure; call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8; c12A: I64[Sp - 8] = c12b; R1 = R2; Sp = Sp - 8; if (R1 & 7 != 0) goto c12b; else goto c12c; c12c: call (I64[R1])(R1) returns to c12b, args: 8, res: 8, upd: 8; c12b: Hp = Hp + 16; if (Hp > HpLim) goto c12y; else goto c12x; c12y: HpAlloc = 16; R1 = R1; call stg_gc_unpt_r1(R1) returns to c12b, args: 8, res: 8, upd: 8; c12x: _s11Q::I64 = I64[R1 + 7]; if (%MO_S_Lt_W64(_s11Q::I64, 0)) goto c12u; else goto c12v; c12u: Hp = Hp - 16; R1 = R1 & (-8); /* <--- */ Sp = Sp + 8; call (I64[R1])(R1) args: 8, res: 0, upd: 8; /* <--- */ c12v: I64[Hp - 8] = GHC.Types.I#_con_info; I64[Hp] = _s11Q::I64 + 1; R1 = Hp - 7; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}} On the two marked lines, we untag R1 (which is `x`) and enter it. However, we know at this point that `x` is already in WHNF so we could simply return it by replacing the two lines with `call (P64[Sp])(R1)`, if I'm not mistaken. That will save a load and an indirect jump (which we actually know is to `I#_con_info`, which would just retag R1 and return to the address on the stack anyways). I think the same optimization should be available any time we do an algebraic `case` and in a branch simply return the scrutinee. I looked at what it would take to fix this. It looks almost easy: if we add a new `LambdaFormInfo` constructor `LFUnknownCon` meaning that we know the identifier is bound to a saturated application of an unknown constructor, then we could set the `cg_lf` of the case binder variable of an algebraic case statement to `LFUnknownCon`, and return `ReturnIt` for `LFUnknownCon` variables in `getCallMethod`. I think that would do it. Does that sound right? Is there a better way? (In my original example we actually know the constructor has to be `I#`. But if the case was on a type with more than one constructor we wouldn't know statically which one we got, just that it has to be one of them.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9660 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9660: unnecessary indirect jump when returning a case scrutinee -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.3 (CodeGen) | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Changes (by dfeuer): * cc: dfeuer (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9660#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9660: unnecessary indirect jump when returning a case scrutinee -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.3 (CodeGen) | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Changes (by rwbarton): * cc: dfeuer (removed) Comment: In fact it doesn't seem strictly necessary to add a new constructor to LambdaFormInfo... this proof-of-concept patch validates and produces the expected improved Cmm; nofib coming next... {{{ diff --git a/compiler/codeGen/StgCmmClosure.hs b/compiler/codeGen/StgCmmClosure.hs index b65d56b..df0f6be 100644 --- a/compiler/codeGen/StgCmmClosure.hs +++ b/compiler/codeGen/StgCmmClosure.hs @@ -22,7 +22,7 @@ module StgCmmClosure ( LambdaFormInfo, -- Abstract StandardFormInfo, -- ...ditto... mkLFThunk, mkLFReEntrant, mkConLFInfo, mkSelectorLFInfo, - mkApLFInfo, mkLFImported, mkLFArgument, mkLFLetNoEscape, + mkApLFInfo, mkLFImported, mkLFArgument, mkLFLetNoEscape, mkLFUnknownCon, lfDynTag, maybeIsLFCon, isLFThunk, isLFReEntrant, lfUpdatable, @@ -241,6 +241,9 @@ mkLFThunk thunk_ty top fvs upd_flag NonStandardThunk (might_be_a_function thunk_ty) +mkLFUnknownCon :: LambdaFormInfo +mkLFUnknownCon = LFCon $ error "tried to access constructor of case binder at compile time" + -------------- might_be_a_function :: Type -> Bool -- Return False only if we are *sure* it's a data type diff --git a/compiler/codeGen/StgCmmExpr.hs b/compiler/codeGen/StgCmmExpr.hs index b2b64f8..6b6ffef 100644 --- a/compiler/codeGen/StgCmmExpr.hs +++ b/compiler/codeGen/StgCmmExpr.hs @@ -441,7 +441,11 @@ cgCase scrut bndr alt_type alts ; let sequel = AssignTo alt_regs do_gc{- Note [scrut sequel] -} ; ret_kind <- withSequel sequel (cgExpr scrut) ; restoreCurrentCostCentre mb_cc - ; _ <- bindArgsToRegs ret_bndrs + ; case alt_type of + AlgAlt _ -> case ret_bndrs of + [ret_bndr] -> void $ bindToReg ret_bndr mkLFUnknownCon + _ -> error "wat" + _ -> void $ bindArgsToRegs ret_bndrs ; cgAlts (gc_plan,ret_kind) (NonVoid bndr) alt_type alts } }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9660#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9660: unnecessary indirect jump when returning a case scrutinee -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.3 (CodeGen) | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Changes (by rwbarton): * cc: dfeuer (added) Comment: (Oops, edit conflict...) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9660#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9660: unnecessary indirect jump when returning a case scrutinee -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.3 (CodeGen) | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by rwbarton): Well, initial results are somewhat inconclusive: {{{ Min -0.1% -0.1% -6.3% -6.2% -7.4% Max +0.1% +0.0% +7.5% +7.5% +3.6% Geometric Mean -0.0% -0.0% -0.2% -0.2% -0.0% }}} I believe the large variations are due to the same kind of issues I encountered in #8279. Got sucked into investigating that for a while, with no real progress. Anyways it seems like it ought to be a clear win; what's the best way to implement this for real? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9660#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9660: unnecessary indirect jump when returning a case scrutinee -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.3 (CodeGen) | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by simonmar): It ought to be a clear win. To be extra sure I'd compare the sizes of object files in nofib built both ways, find some object files that differ, and then compare the assembly output for those modules, just to be sure that nothing unexpected is happening. I'd also double-check any nofib results that appear to vary by more than a few percent. You can use perf to count instructions, which should go down. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9660#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9660: unnecessary indirect jump when returning a case scrutinee -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.3 (CodeGen) | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by schyler): Could you show the core difference (for us who don't really understand the original ticket that well)? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9660#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

Could you show the core difference (for us who don't really understand
#9660: unnecessary indirect jump when returning a case scrutinee -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.3 (CodeGen) | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by rwbarton): Replying to [comment:6 schyler]: the original ticket that well)? There is no difference at the Core level. In either case the core is {{{ Test.f = \ (x_arU :: GHC.Types.Int) -> case x_arU of wild_a10j { GHC.Types.I# x1_a10l -> case GHC.Prim.tagToEnum# @ GHC.Types.Bool (GHC.Prim.<# x1_a10l 0) of _ [Occ=Dead] { GHC.Types.False -> GHC.Types.I# (GHC.Prim.+# x1_a10l 1); GHC.Types.True -> wild_a10j } } }}} The difference is that in the generated Cmm, we don't re-enter `wild_a10j` in the True case, since we know it has already been reduced to WHNF by the first `case`.
Also, if you dump and attach the ASM I'd be happy to sift through and find the problem.
I don't have a copy of the instances I was looking at any more, but in one case the change seemed to affect the unique names chosen, which apparently led to laying out functions in a different order... annoying for trying to diff. My main question here is whether it's actually a good idea to use an error/panic in the LambdaFormInfo, or is it better to add a new constructor. I sort of like the error in that if something goes wrong, it's more likely to do so at compile time than at run time; but it's also a bit ugly. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9660#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC