[GHC] #8326: Place heap checks common in case alternatives before the case

#8326: Place heap checks common in case alternatives before the case ------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Keywords: | Operating System: Unknown/Multiple Architecture: Unknown/Multiple | Type of failure: None/Unknown Difficulty: Unknown | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | ------------------------------------+------------------------------------- We would like to have functions that check whether an `Int#` is a valid tag to represent `Bool` (see Note [Optimizing isTrue#] in ghc-prim): {{{ isTrue# :: Int# -> Bool isTrue# 1# = True isTrue# _ = False isFalse# :: Int# -> Bool isFalse# 0# = True isFalse# _ = False }}} We could use them with comparison primops like this: {{{ f :: Int# -> Int f x | isTrue# (x ># 0#) = I# x | otherwise = -(I# x) }}} `isTrue#` is optimized away at the Core level: {{{ A.f = \ (x_aqM :: GHC.Prim.Int#) -> case GHC.Prim.># x_aqM 0 of _ { __DEFAULT -> GHC.Types.I# (GHC.Prim.negateInt# x_aqM); 1 -> GHC.Types.I# x_aqM } }}} but the code genrator produces very bad Cmm code, because it pushes heap checks into case alternatives: {{{ {offset cFd: // stack check if ((Sp + -16) < SpLim) goto cFr; else goto cFs; cFr: // not enough place on the stack, call GC R2 = R2; R1 = A.f_closure; call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8; cFs: // scrutinize (x ># 0#) _sEU::I64 = R2; _sEV::I64 = %MO_S_Gt_W64(R2, 0); if (_sEV::I64 != 1) goto cFg; else goto cFo; cFg: // False branch Hp = Hp + 16; if (Hp > HpLim) goto cFy; else goto cFx; cFy: // not enough heap, call GC HpAlloc = 16; I64[Sp - 16] = cFf; R1 = _sEV::I64; I64[Sp - 8] = _sEU::I64; Sp = Sp - 16; call stg_gc_unbx_r1(R1) returns to cFf, args: 8, res: 8, upd: 8; cFf: // re-do the False branch _sEU::I64 = I64[Sp + 8]; Sp = Sp + 16; _sEV::I64 = R1; goto cFg; cFx: // RHS of False branch I64[Hp - 8] = GHC.Types.I#_con_info; I64[Hp] = -_sEU::I64; R1 = Hp - 7; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; cFo: // True branch Hp = Hp + 16; if (Hp > HpLim) goto cFv; else goto cFu; cFv: // not enough heap, call GC HpAlloc = 16; I64[Sp - 16] = cFn; R1 = _sEV::I64; I64[Sp - 8] = _sEU::I64; Sp = Sp - 16; call stg_gc_unbx_r1(R1) returns to cFn, args: 8, res: 8, upd: 8; cFn: // re-do the True branch _sEU::I64 = I64[Sp + 8]; Sp = Sp + 16; _sEV::I64 = R1; goto cFo; cFu: // RHS of True branch I64[Hp - 8] = GHC.Types.I#_con_info; I64[Hp] = _sEU::I64; R1 = Hp - 7; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; } }}} This results in average 2.5% increase in binary size. By contrast, if we use `tagToEnum#` instead of `isTrue#` heap check will be placed before `case` expression and the code will be significantly shorter (this is done by a special case-on-bool optimization in the code generator - see #8317). What we would like to do here is: 1. compile case alternatives without placing heap checks inside them 2. each compiled alternative should return amount of heap it needs to allocate 3. code generator inspects amounts of heap needed by each alternative and either adds heap checks in alternatives or puts a single check before the case expression. Getting this right might be a bit tricky. 1. if all branches allocate some heap then we can just put a common heap check before the case. Note that we must allocate the higgest amount required by any of the alternatives and then alternatives that use less heap must retract the heap pointer accordingly. 2. if we have two alternatives, one of which allocates heap and the other does not, we should place the heap check only in the alternative that allocates the stack. This will solve #1498. 3. it is not clear to me what to do if we have combination of the above (more than one branch that allocates heap and at least one branch that does not). If we place heap check before the `case` expression we lose optimization of recursive functions and face the problem described in #1498. If we push heap checks into branches that allocate heap then we get code duplication, i.e. the problem that we're addressing in this ticket. I guess the only way to make correct decission here is to try different aproaches and measure their performance. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case
-------------------------------------+------------------------------------
Reporter: jstolarek | Owner:
Type: task | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.7
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: None/Unknown | Difficulty: Unknown
Test Case: | Blocked By:
Blocking: 8317 | Related Tickets: #1498
-------------------------------------+------------------------------------
Comment (by Jan Stolarek

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------ Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: 8317 | Related Tickets: #1498 -------------------------------------+------------------------------------ Description changed by jstolarek: Old description:
We would like to have functions that check whether an `Int#` is a valid tag to represent `Bool` (see Note [Optimizing isTrue#] in ghc-prim):
{{{ isTrue# :: Int# -> Bool isTrue# 1# = True isTrue# _ = False
isFalse# :: Int# -> Bool isFalse# 0# = True isFalse# _ = False }}}
We could use them with comparison primops like this:
{{{ f :: Int# -> Int f x | isTrue# (x ># 0#) = I# x | otherwise = -(I# x) }}}
`isTrue#` is optimized away at the Core level:
{{{ A.f = \ (x_aqM :: GHC.Prim.Int#) -> case GHC.Prim.># x_aqM 0 of _ { __DEFAULT -> GHC.Types.I# (GHC.Prim.negateInt# x_aqM); 1 -> GHC.Types.I# x_aqM } }}}
but the code genrator produces very bad Cmm code, because it pushes heap checks into case alternatives:
{{{ {offset cFd: // stack check if ((Sp + -16) < SpLim) goto cFr; else goto cFs; cFr: // not enough place on the stack, call GC R2 = R2; R1 = A.f_closure; call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8; cFs: // scrutinize (x ># 0#) _sEU::I64 = R2; _sEV::I64 = %MO_S_Gt_W64(R2, 0); if (_sEV::I64 != 1) goto cFg; else goto cFo; cFg: // False branch Hp = Hp + 16; if (Hp > HpLim) goto cFy; else goto cFx; cFy: // not enough heap, call GC HpAlloc = 16; I64[Sp - 16] = cFf; R1 = _sEV::I64; I64[Sp - 8] = _sEU::I64; Sp = Sp - 16; call stg_gc_unbx_r1(R1) returns to cFf, args: 8, res: 8, upd: 8; cFf: // re-do the False branch _sEU::I64 = I64[Sp + 8]; Sp = Sp + 16; _sEV::I64 = R1; goto cFg; cFx: // RHS of False branch I64[Hp - 8] = GHC.Types.I#_con_info; I64[Hp] = -_sEU::I64; R1 = Hp - 7; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; cFo: // True branch Hp = Hp + 16; if (Hp > HpLim) goto cFv; else goto cFu; cFv: // not enough heap, call GC HpAlloc = 16; I64[Sp - 16] = cFn; R1 = _sEV::I64; I64[Sp - 8] = _sEU::I64; Sp = Sp - 16; call stg_gc_unbx_r1(R1) returns to cFn, args: 8, res: 8, upd: 8; cFn: // re-do the True branch _sEU::I64 = I64[Sp + 8]; Sp = Sp + 16; _sEV::I64 = R1; goto cFo; cFu: // RHS of True branch I64[Hp - 8] = GHC.Types.I#_con_info; I64[Hp] = _sEU::I64; R1 = Hp - 7; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; } }}}
This results in average 2.5% increase in binary size. By contrast, if we use `tagToEnum#` instead of `isTrue#` heap check will be placed before `case` expression and the code will be significantly shorter (this is done by a special case-on-bool optimization in the code generator - see #8317). What we would like to do here is:
1. compile case alternatives without placing heap checks inside them 2. each compiled alternative should return amount of heap it needs to allocate 3. code generator inspects amounts of heap needed by each alternative and either adds heap checks in alternatives or puts a single check before the case expression.
Getting this right might be a bit tricky. 1. if all branches allocate some heap then we can just put a common heap check before the case. Note that we must allocate the higgest amount required by any of the alternatives and then alternatives that use less heap must retract the heap pointer accordingly. 2. if we have two alternatives, one of which allocates heap and the other does not, we should place the heap check only in the alternative that allocates the stack. This will solve #1498. 3. it is not clear to me what to do if we have combination of the above (more than one branch that allocates heap and at least one branch that does not). If we place heap check before the `case` expression we lose optimization of recursive functions and face the problem described in #1498. If we push heap checks into branches that allocate heap then we get code duplication, i.e. the problem that we're addressing in this ticket. I guess the only way to make correct decission here is to try different aproaches and measure their performance.
New description: We would like to have functions that check whether an `Int#` is a valid tag to represent `Bool` (see Note [Optimizing isTrue#] in ghc-prim): {{{ isTrue# :: Int# -> Bool isTrue# 1# = True isTrue# _ = False isFalse# :: Int# -> Bool isFalse# 0# = True isFalse# _ = False }}} We could use them with comparison primops like this: {{{ f :: Int# -> Int f x | isTrue# (x ># 0#) = I# x | otherwise = -(I# x) }}} `isTrue#` is optimized away at the Core level: {{{ A.f = \ (x_aqM :: GHC.Prim.Int#) -> case GHC.Prim.># x_aqM 0 of _ { __DEFAULT -> GHC.Types.I# (GHC.Prim.negateInt# x_aqM); 1 -> GHC.Types.I# x_aqM } }}} but the code genrator produces very bad Cmm code, because it pushes heap checks into case alternatives: {{{ {offset cFd: // stack check if ((Sp + -16) < SpLim) goto cFr; else goto cFs; cFr: // not enough place on the stack, call GC R2 = R2; R1 = A.f_closure; call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8; cFs: // scrutinize (x ># 0#) _sEU::I64 = R2; _sEV::I64 = %MO_S_Gt_W64(R2, 0); if (_sEV::I64 != 1) goto cFg; else goto cFo; cFg: // False branch Hp = Hp + 16; if (Hp > HpLim) goto cFy; else goto cFx; cFy: // not enough heap, call GC HpAlloc = 16; I64[Sp - 16] = cFf; R1 = _sEV::I64; I64[Sp - 8] = _sEU::I64; Sp = Sp - 16; call stg_gc_unbx_r1(R1) returns to cFf, args: 8, res: 8, upd: 8; cFf: // re-do the False branch _sEU::I64 = I64[Sp + 8]; Sp = Sp + 16; _sEV::I64 = R1; goto cFg; cFx: // RHS of False branch I64[Hp - 8] = GHC.Types.I#_con_info; I64[Hp] = -_sEU::I64; R1 = Hp - 7; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; cFo: // True branch Hp = Hp + 16; if (Hp > HpLim) goto cFv; else goto cFu; cFv: // not enough heap, call GC HpAlloc = 16; I64[Sp - 16] = cFn; R1 = _sEV::I64; I64[Sp - 8] = _sEU::I64; Sp = Sp - 16; call stg_gc_unbx_r1(R1) returns to cFn, args: 8, res: 8, upd: 8; cFn: // re-do the True branch _sEU::I64 = I64[Sp + 8]; Sp = Sp + 16; _sEV::I64 = R1; goto cFo; cFu: // RHS of True branch I64[Hp - 8] = GHC.Types.I#_con_info; I64[Hp] = _sEU::I64; R1 = Hp - 7; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; } }}} This results in average 2.5% increase in binary size. By contrast, if we use `tagToEnum#` instead of `isTrue#` heap check will be placed before `case` expression and the code will be significantly shorter (this is done by a special case-on-bool optimization in the code generator - see #8317). What we would like to do here is: 1. compile case alternatives without placing heap checks inside them 2. each compiled alternative should return amount of heap it needs to allocate 3. code generator inspects amounts of heap needed by each alternative and either adds heap checks in alternatives or puts a single check before the case expression. Getting this right might be a bit tricky. 1. if all branches allocate some heap then we can just put a common heap check before the case. Note that we must allocate the higgest amount required by any of the alternatives and then alternatives that use less heap must retract the heap pointer accordingly. 2. if we have two alternatives, one of which allocates heap and the other does not, we should place the heap check only in the alternative that allocates the stack. This will solve #1498. 3. it is not clear to me what to do if we have combination of the above (more than one branch that allocates heap and at least one branch that does not). If we place heap check before the `case` expression we lose optimization of recursive functions and face the problem described in #1498. If we push heap checks into branches that allocate heap then we get code duplication, i.e. the problem that we're addressing in this ticket. I guess the only way to make correct decission here is to try different aproaches and measure their performance. This ticket is mentioned [http://ghc.haskell.org/trac/ghc/wiki/PrimBool#Implementationdetails on this wiki page] and in the source code in Note [Optimizing isTrue#] in ghc-prim. Once this ticket is resolved we need to update these places accordingly. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): I do not understand the theory here—what is the (not-yet-implemented version of) `isTrue#` supposed to accomplish? If I hand it `12#`, it will dutifully tell me `False`. What exactly have I learned from that? Either I meant for it to return `True` and I just got a wrong answer or I just used a very confusingly-named function to see if something equals `1#`. I could fake C-style booleans if I wanted using `isFalse`, but it would be much clearer to just explicitly compare something to zero. If you wanted to actually get some kind of safety, you'd need something more invasive, with more potential to slow things down, like maybe {{{#!hs isTrue# x | tagToEnum# ((x `orI#` 1#) ==# 1#) = tagToEnum# x | otherwise = error "Oops" }}} I just don't see how the proposed `isTrue#` offers any real advantage over `tagToEnum#`. Using primops is always playing with fire, and something that looks like a safety net but really isn't just invites careless errors. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by carter): @dfeuer, (my interpretation is) this ticket is articulating how we'd *like* to write isTrue# and noting some of the optimization engineering thats needed to support writing things in the desired high level fashion. Your safety concern is at least in some small part spurious, because some of this optimization/engineering is about how to transform high level type safe code into that branch free form as optimizations in Core, STG and CMM. That is, end users should not have to deal with this tomfoolery ever, but we'd still like to give them branch free code when its safe! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by carter): In the mean time, things like `isTrue#` are not meant to be safe wrapped things, but tools that make performance engineering manageable and tractable. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by carter): more concretely, that change in definition has impact on other pieces of code generation that needs to be fixed in order have that nicer / safer definition to work out. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by jstolarek): Replying to [comment:3 dfeuer]:
If I hand it `12#`, it will dutifully tell me `False`. What exactly have I learned from that? That the argument you passed is not a valid tag for `True`.
This ticket is not really about `isTrue#` or `isFalse#` - which are just tools a programmer might want or not want to use - but about fixing the heap checks and thus fixing #8317. If you feel that `isTrue#` and `isFalse#` don't offer you any benefit you can still use `tagToEnum#`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:7 jstolarek]:
Replying to [comment:3 dfeuer]:
If I hand it `12#`, it will dutifully tell me `False`. What exactly have I learned from that? That the argument you passed is not a valid tag for `True`.
This ticket is not really about `isTrue#` or `isFalse#` - which are just tools a programmer might want or not want to use - but about fixing the heap checks and thus fixing #8317. If you feel that `isTrue#` and `isFalse#` don't offer you any benefit you can still use `tagToEnum#`.
I'm not sure where the appropriate place is for this line of discussion, but it seems that in the wild (all over the library source), `isTrue#` is typically used as a function for converting from `Int#`, produced by a comparison operator, to `Bool`, rather than as a validity test for `True`. I have yet to see any explanation of why that is appropriate. Certainly, optimizing heap checks is an entirely different matter, and presumably a good idea regardless. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

it seems that in the wild (all over the library source), `isTrue#` is typically used as a function for converting from `Int#`, produced by a comparison operator, to `Bool`, rather than as a validity test for `True`. I have yet to see any explanation of why that is appropriate. Right, I see what you mean. So the only way the `isTrue#` and `isFalse#` functions are "safe" is because they promise to do what their names imply. You've focused on `isTrue#`, which indeed is identical to `tagToEnum#` but
#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by jstolarek): Replying to [comment:8 dfeuer]: this is definitely not the case with `isFalse#`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: | -------------------------------------+------------------------------------- Changes (by jstolarek): * owner: => jstolarek -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: D343 | -------------------------------------+------------------------------------- Changes (by jstolarek): * differential: => D343 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: Phab:D343 | -------------------------------------+------------------------------------- Changes (by jstolarek): * differential: D343 => Phab:D343 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: Phab:D343 | -------------------------------------+------------------------------------- Comment (by simonpj): Responding to the writeup on Phab:D343. Before running off to make special cases for comparisons, look at the relevant code for `cgCase`: {{{ cgCase scrut bndr alt_type alts = -- the general case do { dflags <- getDynFlags ; up_hp_usg <- getVirtHp -- Upstream heap usage ; let ret_bndrs = chooseReturnBndrs bndr alt_type alts alt_regs = map (idToReg dflags) ret_bndrs ; simple_scrut <- isSimpleScrut scrut alt_type ; let do_gc | not simple_scrut = True | isSingleton alts = False | up_hp_usg > 0 = False | otherwise = True -- cf Note [Compiling case expressions] gc_plan = if do_gc then GcInAlts alt_regs else NoGcInAlts ; mb_cc <- maybeSaveCostCentre simple_scrut ; let sequel = AssignTo alt_regs do_gc{- Note [scrut sequel] -} ; ret_kind <- withSequel sequel (cgExpr scrut) ; restoreCurrentCostCentre mb_cc ; _ <- bindArgsToRegs ret_bndrs ; cgAlts (gc_plan,ret_kind) (NonVoid bndr) alt_type alts } }}} If `do_gc` is true, we put heap checks at the start of each branch. If `do_gc` is false, we take the max of the branches, and do the heap check before the `case`. I'll use a running example like this: {{{ f = \x -> let y = blah in case <scrut> of 0# -> <rhs1> DEFAULT -> <rhs2> }}} Things that affect the `do_gc` decision: * If the scrutinee `<scrut>` requires any non-trivial work, we MUST have `do_gc = True`. For example if `<scrut>` was `(g x)`, then calling `g` might result in lots of allocation, so any heap check done at the start of `f` is irrelevant to the branches. They must do their own checks. This is the `simple_scrut` check. It succeeds on simple finite computations like `x +# 1` or `x` (if `x` is unboxed). The other cases are all for the simple-srut situation: * If there is just one alternative, then it's always good to amalgamate * If there is heap allocation in the code before the case (`up_hp_usg > 0`), then we are going to do a heap-check upstream anyway. In that case, don't do one in the alterantives too. (The single check might allocate too much space, but the alterantives that use less space simply move `Hp` back down again, which only costs one instruction.) * Otherwise, if there no heap alloation upstream, put heap checks in each alternative. The resoning here was that if one alternative needs heap and the other one doesn't we don't want to pay the runtime for the heap check in the case where the heap-free alternative is taken. Now, what is happening in your example is that * There is no upstream heap usage * Both alternatives allocate Result: you get two heap checks instead of one. But if only ''one'' branch allocated, you'd probably ''want'' to have the heap check in that branch! So I think the criterion should be that (assuming no upstream allocation) * If all the branches allocate, do the heap check before the case * Otherwise pay the price of a heap check in each branch Or alterantively (less code size, slightly slower) * If more than one branch allocates, do the heap check before the case * If only one allocates, do it in the brcnch The difficulty here is that it's hard to find out whether the branches allocate without running the code generator on them, and that's now how the current setup is structured. (When you run the code generator on some code, the monad keeps track of how much allocation is done; see `StgCmmMonad.getHeapUsage`.) It might well be possible to move things around a bit, but it would need a little care. But before doing that, the first thing is to decide what the criteria should be. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: Phab:D343 | -------------------------------------+------------------------------------- Comment (by jstolarek): Thanks for detailed explanation. I was a bit concerned about all that extra stuff like saving cost centres in the general case. Don't we need to worry that any of these will impact performance of the compiled code?
The difficulty here is that it's hard to find out whether the branches allocate without running the code generator on them
Yes, I believe this was our conclusion when we discussed that during my internship. I don't see how to use `StgCmmMonad.getHeapUsage` to implement the solution. We compile all the alternatives in one go by calling `cgAlts (gc_plan,ret_kind) (NonVoid bndr) alt_type alts`, whereas to apply any of the criteria that you proposed we would need to compile the alternatives one by one and analyse their heap usage. My understanding is that if we simply call `cgAlts` in a lambda passed to `getHeapUsage` we will only know how much heap total is used by the compiled alternatives but we won't have any detailed knowledge about each of the alternatives. Is that correct?
But before doing that, the first thing is to decide what the criteria should be.
It's a bit hard to tell without doing the actual implementation and measuring the results. I guess there will be some programs that work better with the first criterion and some that work better with the second criterion. That being said, I'd go with the first one. Slightly larger binaries are a small price to pay for possibility of having better performance. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: Phab:D343 | -------------------------------------+------------------------------------- Comment (by simonpj): Well, of course the plumbing would need to change a bit. `cgAlts` would have to return something saying which branches allocated. The difficulty is that at the moment the `gc_plan` flag is passed ''into'' `cgAlts` whereas now we are proposing that the plan will depend on something ''returned'' by `cgAlts`. That might be ok, if we tied a recursive knot, provided `cgAlts` was sufficiently lazy in its "plan" parameter. More than that I cannot say without looking a lot harder at the code, something you can do just as well as I, perhaps better. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: Phab:D343 | -------------------------------------+------------------------------------- Comment (by jstolarek):
Well, of course the plumbing would need to change a bit. (...)
Right. I just wanted to make sure that I understood things correctly. I'll try to figure out how to make that change. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: Phab:D343 | -------------------------------------+------------------------------------- Comment (by jstolarek): Status update: [https://github.com/jstolarek/ghc/blob/T8326-heap-checks-alternative- plan/compiler/codeGen/StgCmmExpr.hs#L398 I tried knot-tying] but it doesn't work - `cgAlts` is strict in `gc_plan` and changing that doesn't look trivial. The only alternative I see at the moment is to: a) compile the alternatives without heap checks; b) examine heap usage of compiled alternatives c) create a GC plan d) add heap checks to compiled alternatives, if necessary That sounds simple but I have no idea how to leverage FCode monadery to add heap checks to compiled `CmmAGraph`. Am I right to think that currently there is no plumbing for compiling more code on top of already existing `CmmAGraph`? So far I was only able to came up with a prototype that [https://github.com/jstolarek/ghc/commit/266b1295abfc807f6aab2d6b3578f8d52e92... #diff-6f97a583ff892976f6b49509aec5ab28R403 implements point a-c above but instead of d) it re-compiles the alternatives from scratch]. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: Phab:D343 | -------------------------------------+------------------------------------- Comment (by jstolarek): BTW. I think it would be a Good Thing to make `FCode` monad an instance of `MonadFix` and replace `StgCmmMonad.fixC` with `mfix`. I think this would make code easier to read. People know what `mfix` is, while implementation of `fixC` might not be immediately obvious. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: Phab:D343 | -------------------------------------+------------------------------------- Comment (by simonpj): I'm sure this is do-able by knot-tying, but it'll need a bit of care. * Each alternative can start by emitting a pure blob of code that is a function of the GC plan * The GC plan is computed from the heap-allocation info from all the alternatives (this is the knot-tied bit) * So it must be possible to run the alternative `FCode` blobs without yet knowing the GC plan. * Currently `cgAlts` is strict in the plan, but I don't think it needs to be. After all, the code we generate for the alternatives does not depend on whether there's a heap check at the beginning. You probably need to allocate a label regardless since that is a monadic operation. So not trivial, but ought to be quite possible. ''Semantically'' the data dependencies are just fine! Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: jstolarek Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: | Blocked By: None/Unknown | Related Tickets: #1498 Test Case: | Blocking: 8317 | Differential Revisions: Phab:D343 | -------------------------------------+------------------------------------- Comment (by jstolarek): Yes, data dependencies are fine. My current patch actually does the things outlined above but without knot-tying. And it's buggy at the moment. I'll try to debug my patch and if that fails I'll try the approach you just described above but I admit I'm not too keen on it. The problem is that making `cgAlts` non-strict in `gc_plan` seems very non-trivial and seems to require *a lot* of changes in the structure of the code. By contrast, my patch is quite well localized and confined to a small area of code.
You probably need to allocate a label regardless since that is a monadic operation.
I don't understand that bit. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: #1498 | Blocking: 8317 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Changes (by jstolarek): * owner: jstolarek => Comment: I'm unable to make further progress on this ticket. Sorry. I'm unassigning this ticket - perhaps someone else can take over. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: #1498 | Blocking: 8317 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Changes (by bgamari): * cc: bgamari (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Changes (by rwbarton): * cc: rwbarton (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by rwbarton): Even if it was right to do the heap checks in the alternatives there's another problem with this Cmm: `_sEV` is being kept live longer than it should be. See #10676. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Changes (by rwbarton): * failure: None/Unknown => Runtime performance bug -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by simonpj): Why this ticket #8326 is blocking #8317? I think precisely this: implementing #8317 on its own causes a 2.5% code size increase. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by rwbarton): I'm not convinced that we shouldn't just always do the heap check outside the case, even when there is only one branch that allocates. In the worst case we do an extra memory read (HpLim, probably in L1 cache), a comparison and a branch (which is highly predictable). However * In simple functions (like the one in #10676) we don't need to allocate stack if we do the heap check up front. However a heap check in an alternative in this case requires allocating a stack frame (not sure whether occurs in all cases, or whether it is avoidable), so then we do have to do a stack check up front instead which is almost as expensive. (The difference is that SpLim is in a register while HpLim must be stored in memory.) In addition the code becomes substantially larger because we have two separate entries to the GC. * In functions which need to allocate stack anyways, putting the heap check outside the case does mean we have to an extra check when we take a branch that does not allocate. On the other hand, we can then use the same GC entry code for the stack check and the heap check, so the code is again much smaller. This may pay for the extra instructions of the heap check, especially if the non-allocating branch(es) are infrequent anyways. At any rate it's simpler than the knot-tying discussed in this ticket, so I will try it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:27 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by simonpj): Good point. Trying that is an excellent idea. Note that in a tree of (primop) conditionals, if even one branch allocates, then we'll do a heap check in all cases. It might be interesting to gather the following statistic in ticky-ticky profiling * Total number of heap checks * Total number of times that `Hp` is wound back to zero. This winding back is done by `adjustHpBackwards`. You can tell if you are winding back to zero because `vHp` is zero. If we wind back to zero, that means that we allocated nothing, so the original heap check was wasted. Then we can see what proportion of heap checks are wasted. I guess the worry is that this might happen a lot in some hot inner loop, but let's see. Thanks for doing this. If it looks reasonable it'd be a much simpler cleaner solution. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:28 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by bgamari): Simon, indeed tying the knot would at very least require that two labels be unconditionally allocated and carried rather deeply into `StgCmmHeap`. Another issue is `heapCheck`, which is itself monadic as it ends up calling `getHeapUsage`. My initial thought was that `maybeAltHeapCheck` could look something like, {{{#!hs maybeAltHeapCheck :: (GcPlan,ReturnKind) -> FCode a -> FCode a maybeAltHeapCheck gc_plan code = do label1 <- allocLabelC label2 <- allocLabelC getHeapUsage $ \hpHw -> do emit $ case gc_plan of (NoGcInAlts, _) -> emptyOL (GcInAlts regs, AssignedDirectly) -> ... (GcInAlts regs, ReturnedTo lref off) -> ... code }}} Unfortunately this becomes quite invasive as you must pass those labels and the `CgInfoDownwards` pretty deeply into `StgCmmHeap`. It may be that a refactoring will help here but it seems a bit messy. In summary, as far as I can see there are several paths through the case alternative heap-check code (the code path column below shows the number of `newLabelC`s appearing in each function in parentheses), ||= `gc_plan` =||= `ret_kind` =||= canned entrypt =||= labels needed =||= code path =|| || `NoGcInAlts` || * || * || 0 || just run the code || || `GcInAlts` || `AssignedDirectly` || no || 1 || `altOrNoEscapeHeapCheck` -> `genericGC` (1) || || `GcInAlts` || `AssignedDirectly` || yes || 2 || `altOrNoEscapeHeapCheck` (2) -> `cannedGCReturnsTo` -> `heapCheck` || || `GcInAlts` || `ReturnedTo` || no || 1 || `altHeapCheckReturnsTo` -> `genericGC` (1) -> `heapCheck` || || `GcInAlts` || `ReturnedTo` || yes || 0 || `altHeapCheckReturnsTo` -> `cannedGCReturnsTo` -> `heapCheck` || -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:29 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by simonpj): Before delving into this, let's try the simpler approach you suggested! simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:30 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by bgamari): Actually, now that I look a bit more carefully `StgCmmMonad.codeOnly` may be exactly what is needed here. I'll need to think a bit more to make sure this breaks all of the potential cycles, but I am optimistic. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:31 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by bgamari): I think the issue here is that while on paper the code we emit is independent of our `GcPlan`, in practice we emit the code in most cases in `heapCheck` and **how** we end up in `heapCheck` is very much dependent on the `GcPlan`. I believe this almost does what we want, {{{#!hs maybeAltHeapCheck :: (GcPlan,ReturnKind) -> FCode a -> FCode a maybeAltHeapCheck gc_plan code = do codeOnly $ case gc_plan of (NoGcInAlts,_) -> code (GcInAlts regs, AssignedDirectly) -> altHeapCheck regs code (GcInAlts regs, ReturnedTo lret off) -> altHeapCheckReturnsTo regs regs lret off code }}} The trouble is you have now thrown away the heap usage information from the code you emitted in the `NoGcInAlts` case. Alternatively, you can lift the code emission out of `heapCheck`, but this breaks its nice interface, {{{#!hs maybeAltHeapCheck :: (GcPlan,ReturnKind) -> FCode a -> FCode a maybeAltHeapCheck gc_plan code = do codeOnly $ case gc_plan of (NoGcInAlts,_) -> return () -- These now only compute the heap usage of 'code' and do not emit it (GcInAlts regs, AssignedDirectly) -> altHeapCheck regs code (GcInAlts regs, ReturnedTo lret off) -> altHeapCheckReturnsTo regs regs lret off code code }}} Anyways, let's see how Reid's approach performs; perhaps this is all irrelevant. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:32 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by simonpj):
Anyways, let's see how Reid's approach performs; perhaps this is all irrelevant
Exactly! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:33 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Changes (by rwbarton): * Attachment "hc-nofib.txt" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by rwbarton): Attached nofib results are for the patch {{{ ; simple_scrut <- isSimpleScrut scrut alt_type ; let do_gc | not simple_scrut = True - | isSingleton alts = False - | up_hp_usg > 0 = False - | otherwise = True + | otherwise = False -- ticket:8326#comment:27 -- cf Note [Compiling case expressions] gc_plan = if do_gc then GcInAlts alt_regs else NoGcInAlts }}} As can be seen from the fact that very few Module Sizes changed, this actually rarely makes a difference. The reason is that there are two special cases of `cgCase` that always use NoGcInAlts. (I don't know why they do so, perhaps an oversight.) * If the case has algebraic alternatives, then either the scrutinee is not simple and we must GcInAlts, or the scrutinee is an application of `tagToEnum#` and the first special case applies and always uses NoGcInAlts. * If the case has primitive alternatives, then when the scrutinee is simply a variable, the second special case applies and always uses NoGcInAlts. So this patch only makes a difference when all of the following hold: * the case has primitive alternatives * the scrutinee is an application of a primop (that does not allocate, so it is simple, but most primops do not) * there is more than one alternative * there is no upstream heap check already * at least one alternative actually allocates (often CPR analysis has moved an allocation outside of the case) The combination of the second and third items is fairly rare, it means you are comparing the result of a primop against a constant. A typical example would be testing whether an Int is even or odd. Basically my conclusions are that * it's acceptable to apply this patch and always allocate outside the case here, since nofib did not find any significant regressions * it may still be worthwhile to try to make better decisions about whether to do heap checks in the alternatives, but then we should also do so in the special cases of `cgCase` -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:34 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by simonpj): Terrific. Could you try the effect of re-enabling the fix in [ticket:8317#comment:2]. The code is still in `Simplify.hs`, just commented out. Look for "Disabled until we fix #8326". If that works (no perf cost, no binary size increase), then the next step is to remove the special case `Note [case on bool]` in `StgCmmExpr`; it will never be used because the `Simplify` change will catch the case first. Might you try those two steps? Thanks! Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:35 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by bgamari): Reid, any progress here? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:36 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by rwbarton): I ran a nofib benchmark with these changes and there was mostly no effect, aside from a few changes in the plus-or-minus 5-10% range that I could not explain. When I find a larger block of free time I'll return to this subject (should be more pleasant now that ghcspeed will benchmark wip branches). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:37 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Revisions: Phab:D343 -------------------------------------+------------------------------------- Comment (by jstolarek): Replying to [comment:37 rwbarton]:
a few changes in the plus-or-minus 5-10% range that I could not explain
Random idea: check ordering of assembly blocks (a.k.a #8082) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:38 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Rev(s): Phab:D343 Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * cc: osa1 (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:39 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Rev(s): Phab:D343 Wiki Page: | -------------------------------------+------------------------------------- Changes (by tjakway): * cc: tjakway (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:40 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Rev(s): Phab:D343 Wiki Page: | -------------------------------------+------------------------------------- Changes (by michalt): * cc: michalt (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:41 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Rev(s): Phab:D343 Wiki Page: | -------------------------------------+------------------------------------- Comment (by michalt): What is the status of this ticket? I've tried the patch suggested in comment:34, but my results of nofib were quite different, with some clear regressions (below I've removed anything where the difference was <2%) {{{ NoFib Results -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- CSD -0.4% 0.0% +7.4% +7.4% 0.0% S -0.5% 0.0% +3.2% +3.3% 0.0% VS -0.5% 0.0% -4.4% -4.4% 0.0% VSM -0.5% 0.0% +8.7% +8.7% 0.0% bspt -0.5% 0.0% 0.003 0.003 +50.0% constraints -0.4% 0.0% +2.5% +2.6% 0.0% cryptarithm1 -0.4% 0.0% +6.0% +6.0% 0.0% exact-reals -1.2% 0.0% -2.7% -2.7% 0.0% fannkuch-redux -0.4% 0.0% -2.5% -2.5% 0.0% fasta -0.4% 0.0% +7.7% +7.6% 0.0% k-nucleotide -0.8% +0.0% +17.9% +18.0% 0.0% lambda -0.4% 0.0% -2.2% -2.1% 0.0% linear -1.2% 0.0% -5.9% -5.9% 0.0% mate -0.4% 0.0% -2.2% -2.2% 0.0% n-body -0.9% 0.0% +3.2% +3.2% 0.0% -------------------------------------------------------------------------------- Min -1.4% -0.2% -5.9% -5.9% 0.0% Max -0.2% +0.0% +17.9% +18.0% +50.0% Geometric Mean -0.7% -0.0% +1.7% +1.6% +0.4% }}} I've tried to have a look into `k-nucleotide`, which slowed down the most. My current understanding is that there's a tight loop within a single function that goes like this: - A: some computation, eventually goes to B - B: case on `andI# variable 127#` (one alternative is the slow path that allocates, the other is fast that doesn't) - C: alternative that does a bit of computation (but no allocation) and jumps back A Now with the change we get a heap check in front of the case, which will now be executed on every iteration and slow everything down. NOTE: I don't have much experience with investigations like this, so the whole analysis might be quite wrong. ;) Please let me know if something seems off. I'll attach the dump of STG/cmm/asm from both versions of `k-nucleotide` (with and without the patch). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:42 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Rev(s): Phab:D343 Wiki Page: | -------------------------------------+------------------------------------- Changes (by michalt): * Attachment "knucleotide-master-and-patch-dumps.tar.bz2" added. STG/cmm/asm dumps of k-nucleotide from the current master and the patched GHC -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Rev(s): Phab:D343 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): What Michael says in comment:42 seems to be exactly what item (3) in the Description is all about. If the hot path does not allocate, then adding an allocation check into the hot path will cost time, eve if it reduces binary size. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:43 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Rev(s): Phab:D343 Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): Replying to [comment:42 michalt]:
What is the status of this ticket?
I've tried the patch suggested in comment:34, but my results of nofib were quite different, with some clear regressions (below I've removed anything where the difference was <2%)
NOTE: I don't have much experience with investigations like this, so the
whole analysis might be quite wrong. ;) Please let me know if something seems off. I'll attach the dump of STG/cmm/asm from both versions of `k-nucleotide` (with and without the patch). When I experimented with the order in which we generate uniques I also got a regression of ~18% for one of the shootout benchmarks, I think it was k-nucleotide but could have been another one. So while I don't doubt that there is a regression for k-nucleotide with this patch it doesn't have to be because the code we generate is worse for the general case. One really has to look at the Asm/Cmm for that.
it is not clear to me what to do if we have combination of the above (more than one branch that allocates heap and at least one branch that does not).
In the long run we should do some static analysis to help us determine the hot code path. Eg distinguish between: * Alternatives leading to recursion * Alternatives being called once. * Bottoming alternatives. There are some ideas and work on that in #14672. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:44 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Rev(s): Phab:D343 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => CodeGen -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:45 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8326: Place heap checks common in case alternatives before the case -------------------------------------+------------------------------------- Reporter: jstolarek | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: 8317 Related Tickets: #1498 | Differential Rev(s): Phab:D343 Wiki Page: | -------------------------------------+------------------------------------- Description changed by simonpj: Old description:
We would like to have functions that check whether an `Int#` is a valid tag to represent `Bool` (see Note [Optimizing isTrue#] in ghc-prim):
{{{ isTrue# :: Int# -> Bool isTrue# 1# = True isTrue# _ = False
isFalse# :: Int# -> Bool isFalse# 0# = True isFalse# _ = False }}}
We could use them with comparison primops like this:
{{{ f :: Int# -> Int f x | isTrue# (x ># 0#) = I# x | otherwise = -(I# x) }}}
`isTrue#` is optimized away at the Core level:
{{{ A.f = \ (x_aqM :: GHC.Prim.Int#) -> case GHC.Prim.># x_aqM 0 of _ { __DEFAULT -> GHC.Types.I# (GHC.Prim.negateInt# x_aqM); 1 -> GHC.Types.I# x_aqM } }}}
but the code genrator produces very bad Cmm code, because it pushes heap checks into case alternatives:
{{{ {offset cFd: // stack check if ((Sp + -16) < SpLim) goto cFr; else goto cFs; cFr: // not enough place on the stack, call GC R2 = R2; R1 = A.f_closure; call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8; cFs: // scrutinize (x ># 0#) _sEU::I64 = R2; _sEV::I64 = %MO_S_Gt_W64(R2, 0); if (_sEV::I64 != 1) goto cFg; else goto cFo; cFg: // False branch Hp = Hp + 16; if (Hp > HpLim) goto cFy; else goto cFx; cFy: // not enough heap, call GC HpAlloc = 16; I64[Sp - 16] = cFf; R1 = _sEV::I64; I64[Sp - 8] = _sEU::I64; Sp = Sp - 16; call stg_gc_unbx_r1(R1) returns to cFf, args: 8, res: 8, upd: 8; cFf: // re-do the False branch _sEU::I64 = I64[Sp + 8]; Sp = Sp + 16; _sEV::I64 = R1; goto cFg; cFx: // RHS of False branch I64[Hp - 8] = GHC.Types.I#_con_info; I64[Hp] = -_sEU::I64; R1 = Hp - 7; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; cFo: // True branch Hp = Hp + 16; if (Hp > HpLim) goto cFv; else goto cFu; cFv: // not enough heap, call GC HpAlloc = 16; I64[Sp - 16] = cFn; R1 = _sEV::I64; I64[Sp - 8] = _sEU::I64; Sp = Sp - 16; call stg_gc_unbx_r1(R1) returns to cFn, args: 8, res: 8, upd: 8; cFn: // re-do the True branch _sEU::I64 = I64[Sp + 8]; Sp = Sp + 16; _sEV::I64 = R1; goto cFo; cFu: // RHS of True branch I64[Hp - 8] = GHC.Types.I#_con_info; I64[Hp] = _sEU::I64; R1 = Hp - 7; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; } }}}
This results in average 2.5% increase in binary size. By contrast, if we use `tagToEnum#` instead of `isTrue#` heap check will be placed before `case` expression and the code will be significantly shorter (this is done by a special case-on-bool optimization in the code generator - see #8317). What we would like to do here is:
1. compile case alternatives without placing heap checks inside them 2. each compiled alternative should return amount of heap it needs to allocate 3. code generator inspects amounts of heap needed by each alternative and either adds heap checks in alternatives or puts a single check before the case expression.
Getting this right might be a bit tricky. 1. if all branches allocate some heap then we can just put a common heap check before the case. Note that we must allocate the higgest amount required by any of the alternatives and then alternatives that use less heap must retract the heap pointer accordingly. 2. if we have two alternatives, one of which allocates heap and the other does not, we should place the heap check only in the alternative that allocates the stack. This will solve #1498. 3. it is not clear to me what to do if we have combination of the above (more than one branch that allocates heap and at least one branch that does not). If we place heap check before the `case` expression we lose optimization of recursive functions and face the problem described in #1498. If we push heap checks into branches that allocate heap then we get code duplication, i.e. the problem that we're addressing in this ticket. I guess the only way to make correct decission here is to try different aproaches and measure their performance.
This ticket is mentioned [http://ghc.haskell.org/trac/ghc/wiki/PrimBool#Implementationdetails on this wiki page] and in the source code in Note [Optimizing isTrue#] in ghc-prim. Once this ticket is resolved we need to update these places accordingly.
New description: We would like to have functions that check whether an `Int#` is a valid tag to represent `Bool` (see Note [Optimizing isTrue#] in ghc-prim): {{{ isTrue# :: Int# -> Bool isTrue# 1# = True isTrue# _ = False isFalse# :: Int# -> Bool isFalse# 0# = True isFalse# _ = False }}} We could use them with comparison primops like this: {{{ f :: Int# -> Int f x | isTrue# (x ># 0#) = I# x | otherwise = -(I# x) }}} `isTrue#` is optimized away at the Core level: {{{ A.f = \ (x_aqM :: GHC.Prim.Int#) -> case GHC.Prim.># x_aqM 0 of _ { __DEFAULT -> GHC.Types.I# (GHC.Prim.negateInt# x_aqM); 1 -> GHC.Types.I# x_aqM } }}} but the code genrator produces very bad Cmm code, because it pushes heap checks into case alternatives: {{{ {offset cFd: // stack check if ((Sp + -16) < SpLim) goto cFr; else goto cFs; cFr: // not enough place on the stack, call GC R2 = R2; R1 = A.f_closure; call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8; cFs: // scrutinize (x ># 0#) _sEU::I64 = R2; _sEV::I64 = %MO_S_Gt_W64(R2, 0); if (_sEV::I64 != 1) goto cFg; else goto cFo; cFg: // False branch Hp = Hp + 16; if (Hp > HpLim) goto cFy; else goto cFx; cFy: // not enough heap, call GC HpAlloc = 16; I64[Sp - 16] = cFf; R1 = _sEV::I64; I64[Sp - 8] = _sEU::I64; Sp = Sp - 16; call stg_gc_unbx_r1(R1) returns to cFf, args: 8, res: 8, upd: 8; cFf: // re-do the False branch _sEU::I64 = I64[Sp + 8]; Sp = Sp + 16; _sEV::I64 = R1; goto cFg; cFx: // RHS of False branch I64[Hp - 8] = GHC.Types.I#_con_info; I64[Hp] = -_sEU::I64; R1 = Hp - 7; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; cFo: // True branch Hp = Hp + 16; if (Hp > HpLim) goto cFv; else goto cFu; cFv: // not enough heap, call GC HpAlloc = 16; I64[Sp - 16] = cFn; R1 = _sEV::I64; I64[Sp - 8] = _sEU::I64; Sp = Sp - 16; call stg_gc_unbx_r1(R1) returns to cFn, args: 8, res: 8, upd: 8; cFn: // re-do the True branch _sEU::I64 = I64[Sp + 8]; Sp = Sp + 16; _sEV::I64 = R1; goto cFo; cFu: // RHS of True branch I64[Hp - 8] = GHC.Types.I#_con_info; I64[Hp] = _sEU::I64; R1 = Hp - 7; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; } }}} This results in average 2.5% increase in binary size. By contrast, if we use `tagToEnum#` instead of `isTrue#` heap check will be placed before `case` expression and the code will be significantly shorter (this is done by a special case-on-bool optimization in the code generator - see #8317). What we would like to do here is: 1. compile case alternatives without placing heap checks inside them 2. each compiled alternative should return amount of heap it needs to allocate 3. code generator inspects amounts of heap needed by each alternative and either adds heap checks in alternatives or puts a single check before the case expression. Getting this right might be a bit tricky. 1. if all branches allocate some heap then we can just put a common heap check before the case. Note that we must allocate the higgest amount required by any of the alternatives and then alternatives that use less heap must retract the heap pointer accordingly. 2. if we have two alternatives, one of which allocates heap and the other does not, we should place the heap check only in the alternative that allocates the stack. This will solve #1498. 3. it is not clear to me what to do if we have combination of the above (more than one branch that allocates heap and at least one branch that does not). If we place heap check before the `case` expression we lose optimization of recursive functions and face the problem described in #1498. If we push heap checks into branches that allocate heap then we get code duplication, i.e. the problem that we're addressing in this ticket. I guess the only way to make correct decission here is to try different aproaches and measure their performance. This ticket is mentioned * [http://ghc.haskell.org/trac/ghc/wiki/PrimBool#Implementationdetails on this wiki page] * in the source code in Note [Optimizing isTrue#] in ghc-prim. * In `Simplify.hs`, `Note [Optimising tagToEnum#]` Once this ticket is resolved we need to update these places accordingly. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8326#comment:46 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC