
#8032: Worker-wrapper transform and NOINLINE trigger bad reboxing behavior --------------------------------------------+------------------------------ Reporter: ezyang | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Changes (by ezyang): * difficulty: => Unknown Old description:
When we worker-wrapper transform functions, we tend to be to eager to unbox all of our arguments and pass them to the worker. This backfires when the boxed version of the argument is needed:
{{{ module Gnam where
data KST = KST { ke :: {-# UNPACK #-} !Int , ka :: {-# UNPACK #-} !Int }
data KStop r = KNeedInput Int !(KC r) data Result r = Result r type KC r = KST -> Int -> Result r
newtype K r a = K { unK :: (KST -> KStop r -> Result r) -> KC r }
skipWhileK :: K () () skipWhileK = K $ \kf kst@KST{ ke = e } i0 -> let loop i rw0 -- Note: removing rw0 argument gets rid of re-boxing behavior | i < e = loop (i + 1) rw0 | otherwise = unK recurse kf kst i in loop i0 (0 :: Int) where -- Note that without NOINLINE, we get an unnecessary eager -- closure allocation even when closure is never used. This -- is unfortunate because recurse is the slow path (e.g., -- called 1/4000 of the time in the real code). {-# NOINLINE recurse #-} recurse = K $ \kf kst i -> kf kst $ KNeedInput i $ unK skipWhileK kf }}}
skipWhileK is an important loop which we would like to avoid performing heap allocation on. However, this code is compiled by HEAD with an allocation which reboxes KST:
{{{ Gnam.$wa :: (Gnam.KST -> Gnam.KStop () -> Gnam.Result ()) -> GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# -> Gnam.Result () [GblId, Arity=4, Caf=NoCafRefs, Str=DmdType
, Unf=OtherCon []] = \r srt:SRT:[] [w_srG ww_srz ww1_srA ww2_srK] let { wild_srB :: Gnam.KST [LclId, Str=DmdType, Unf=OtherCon []] = NO_CCS Gnam.KST! [ww_srz ww1_srA]; } in ... }}} The worker function takes i and the unboxed KST as arguments, as one might hope, but when it discovers that it needs KST on the inside, it has no choice but to reconstruct KST inside. What should be done instead is wild_srB (produced by the case analysis on KST here:
{{{ \r srt:SRT:[] [w_srV w1_srO w2_srS] case w1_srO of _ { // <--- wild_ here please! Gnam.KST ww1_srW ww2_srX -> case w2_srS of _ { GHC.Types.I# ww4_srY -> Gnam.$wa w_srV ww1_srW ww2_srX ww4_srY; }; }; }}}
This is perhaps a tradeoff: passing the evaluated version of things that were unpacked costs an extra argument; however, unboxing things can result in a lot of extra arguments! (And GHC doesn't seem to eliminate unused arguments, even when the rebox is not necessary.)
New description: (Note: I've updated the ticket with a simpler test-case). NOINLINE and the worker-wrapper transform sometimes interact poorly to cause unnecessary extra reboxing. {{{ module Gnam where data D = D Int foo k d@(D e) = let loop i | i < e = loop (i + 1) | otherwise = baz k d i in loop 0 where {-# NOINLINE baz #-} baz k d i = k (d, i) }}} This results in the following STG: {{{ Gnam.$wfoo :: forall t_alo. ((Gnam.D, GHC.Types.Int) -> t_alo) -> GHC.Prim.Int# -> t_alo [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType C(S)L, Unf=OtherCon []] = \r [w_sob ww_so3] let { e_so4 :: GHC.Types.Int [LclId, Unf=OtherCon []] = NO_CCS GHC.Types.I#! [ww_so3]; } in let { wild_so6 :: Gnam.D [LclId, Unf=OtherCon []] = NO_CCS Gnam.D! [e_so4]; } in }}} This worker function needs to box its arguments so that they can be passed to baz. However, the only invocation of wfoo already had these arguments available: {{{ Gnam.foo [InlPrag=INLINE[0]] :: forall t_alo. ((Gnam.D, GHC.Types.Int) -> t_alo) -> Gnam.D -> t_alo [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType C(S)U(U(L)), Unf=OtherCon []] = \r [w_son w1_soh] case w1_soh of _ { Gnam.D ww_sok -> case ww_sok of _ { GHC.Types.I# ww2_soo -> Gnam.$wfoo w_son ww2_soo; }; }; }}} The problem seems to lie in how the worker wrapper transformation operates. Before, the STG is: {{{ Gnam.foo = \ (@ t_alr) (k_aeM [Dmd=Just C(S)] :: (Gnam.D, GHC.Types.Int) -> t_alr) (d_aeN [Dmd=Just U(U(L))] :: Gnam.D) -> case d_aeN of wild_X5 { Gnam.D e_aeO [Dmd=Just U(L)] -> letrec { loop_smj [Occ=LoopBreaker] :: GHC.Types.Int -> t_alr [LclId, Arity=1, Str=DmdType U(L) {aeM->C(S) aeO->U(L)}, Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=1, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [20] 112 0}] loop_smj = \ (i_aeU [Dmd=Just U(L)] :: GHC.Types.Int) -> case i_aeU of wild_alU { GHC.Types.I# x_alW [Dmd=Just L] -> case e_aeO of _ { GHC.Types.I# y_am0 [Dmd=Just L] -> case GHC.Prim.<# x_alW y_am0 of _ { GHC.Types.False -> baz_smn @ t_alr @ Gnam.D @ GHC.Types.Int k_aeM wild_X5 wild_alU; GHC.Types.True -> loop_smj (GHC.Types.I# (GHC.Prim.+# x_alW 1)) } } }; } in loop_smj lvl_smr } }}} Notice that wild_alU is being properly used in the result. After the worker wrapper transformation, foo is now: {{{ Gnam.foo = \ (@ t_alp) (w_sn7 [Dmd=Just C(S)] :: (Gnam.D, GHC.Types.Int) -> t_alp) (w_sn8 [Dmd=Just U(U(L))] :: Gnam.D) -> case w_sn8 of w_sn8 { Gnam.D ww_sna -> case ww_sna of ww_sna { GHC.Types.I# ww_snc -> $wfoo_sng @ t_alp w_sn7 ww_snc } } }}} So it seems that we should also pass along the evaluated variables, in case they are used. There is a tradeoff here, in that we will require more arguments to the function than if we just reconstructed it. However, if we smarten up worker-wrapper so that it drops unused arguments, this could be a win when not all of the fields are used, e.g. if we add another field to D: {{{ Gnam.foo [InlPrag=INLINE[0]] :: forall t_alp. ((Gnam.D, GHC.Types.Int) -> t_alp) -> Gnam.D -> t_alp [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType C(S)U(U(L)L), Unf=OtherCon []] = \r [w_som w1_sof] case w1_sof of _ { Gnam.D ww_soj ww1_soo -> case ww_soj of _ { GHC.Types.I# ww3_son -> Gnam.$wfoo w_som ww3_son ww1_soo; }; }; }}} Now ww1_soo is passed, even though it is dead. I think there is a comment to this effect in the simplifier already. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8032#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler