
#8032: Worker-wrapper transform and NOINLINE trigger bad reboxing behavior --------------------------------------------+------------------------------ Reporter: ezyang | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime performance bug | Unknown/Multiple Test Case: | Difficulty: Unknown Blocking: | Blocked By: | Related Tickets: --------------------------------------------+------------------------------ Description changed by ezyang: Old description:
(Note: I've updated the ticket with a simpler test-case).
NOINLINE and the worker-wrapper transform sometimes interact poorly to cause unnecessary extra reboxing.
{{{ module Gnam where
data D = D Int
foo k d@(D e) = let loop i | i < e = loop (i + 1) | otherwise = baz k d i in loop 0 where {-# NOINLINE baz #-} baz k d i = k (d, i) }}}
This results in the following STG:
{{{ Gnam.$wfoo :: forall t_alo. ((Gnam.D, GHC.Types.Int) -> t_alo) -> GHC.Prim.Int# -> t_alo [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType C(S)L, Unf=OtherCon []] = \r [w_sob ww_so3] let { e_so4 :: GHC.Types.Int [LclId, Unf=OtherCon []] = NO_CCS GHC.Types.I#! [ww_so3]; } in let { wild_so6 :: Gnam.D [LclId, Unf=OtherCon []] = NO_CCS Gnam.D! [e_so4]; } in }}}
This worker function needs to box its arguments so that they can be passed to baz. However, the only invocation of wfoo already had these arguments available:
{{{ Gnam.foo [InlPrag=INLINE[0]] :: forall t_alo. ((Gnam.D, GHC.Types.Int) -> t_alo) -> Gnam.D -> t_alo [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType C(S)U(U(L)), Unf=OtherCon []] = \r [w_son w1_soh] case w1_soh of _ { Gnam.D ww_sok -> case ww_sok of _ { GHC.Types.I# ww2_soo -> Gnam.$wfoo w_son ww2_soo; }; }; }}}
The problem seems to lie in how the worker wrapper transformation operates. Before, the STG is:
{{{ Gnam.foo = \ (@ t_alr) (k_aeM [Dmd=Just C(S)] :: (Gnam.D, GHC.Types.Int) -> t_alr) (d_aeN [Dmd=Just U(U(L))] :: Gnam.D) -> case d_aeN of wild_X5 { Gnam.D e_aeO [Dmd=Just U(L)] -> letrec { loop_smj [Occ=LoopBreaker] :: GHC.Types.Int -> t_alr [LclId, Arity=1, Str=DmdType U(L) {aeM->C(S) aeO->U(L)}, Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=1, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [20] 112 0}] loop_smj = \ (i_aeU [Dmd=Just U(L)] :: GHC.Types.Int) -> case i_aeU of wild_alU { GHC.Types.I# x_alW [Dmd=Just L] -> case e_aeO of _ { GHC.Types.I# y_am0 [Dmd=Just L] -> case GHC.Prim.<# x_alW y_am0 of _ { GHC.Types.False -> baz_smn @ t_alr @ Gnam.D @ GHC.Types.Int k_aeM wild_X5 wild_alU; GHC.Types.True -> loop_smj (GHC.Types.I# (GHC.Prim.+# x_alW 1)) } } }; } in loop_smj lvl_smr } }}}
Notice that wild_alU is being properly used in the result. After the worker wrapper transformation, foo is now:
{{{ Gnam.foo = \ (@ t_alp) (w_sn7 [Dmd=Just C(S)] :: (Gnam.D, GHC.Types.Int) -> t_alp) (w_sn8 [Dmd=Just U(U(L))] :: Gnam.D) -> case w_sn8 of w_sn8 { Gnam.D ww_sna -> case ww_sna of ww_sna { GHC.Types.I# ww_snc -> $wfoo_sng @ t_alp w_sn7 ww_snc } } }}}
So it seems that we should also pass along the evaluated variables, in case they are used. There is a tradeoff here, in that we will require more arguments to the function than if we just reconstructed it. However, if we smarten up worker-wrapper so that it drops unused arguments, this could be a win when not all of the fields are used, e.g. if we add another field to D:
{{{ Gnam.foo [InlPrag=INLINE[0]] :: forall t_alp. ((Gnam.D, GHC.Types.Int) -> t_alp) -> Gnam.D -> t_alp [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType C(S)U(U(L)L), Unf=OtherCon []] = \r [w_som w1_sof] case w1_sof of _ { Gnam.D ww_soj ww1_soo -> case ww_soj of _ { GHC.Types.I# ww3_son -> Gnam.$wfoo w_som ww3_son ww1_soo; }; }; }}}
Now ww1_soo is passed, even though it is dead. I think there is a comment to this effect in the simplifier already.
New description: (Note: I've updated the ticket with a simpler test-case). NOINLINE and the worker-wrapper transform sometimes interact poorly to cause unnecessary extra reboxing. {{{ module Gnam where data D = D Int foo k d@(D e) = let loop i | i < e = loop (i + 1) | otherwise = baz k d i in loop 0 where {-# NOINLINE baz #-} baz k d i = k (d, i) }}} This results in the following STG: {{{ Gnam.$wfoo :: forall t_alo. ((Gnam.D, GHC.Types.Int) -> t_alo) -> GHC.Prim.Int# -> t_alo [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType C(S)L, Unf=OtherCon []] = \r [w_sob ww_so3] let { e_so4 :: GHC.Types.Int [LclId, Unf=OtherCon []] = NO_CCS GHC.Types.I#! [ww_so3]; } in let { wild_so6 :: Gnam.D [LclId, Unf=OtherCon []] = NO_CCS Gnam.D! [e_so4]; } in }}} This worker function needs to box its arguments so that they can be passed to baz. However, the only invocation of wfoo already had these arguments available: {{{ Gnam.foo [InlPrag=INLINE[0]] :: forall t_alo. ((Gnam.D, GHC.Types.Int) -> t_alo) -> Gnam.D -> t_alo [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType C(S)U(U(L)), Unf=OtherCon []] = \r [w_son w1_soh] case w1_soh of _ { Gnam.D ww_sok -> case ww_sok of _ { GHC.Types.I# ww2_soo -> Gnam.$wfoo w_son ww2_soo; }; }; }}} The problem seems to lie in how the worker wrapper transformation operates. Before, the STG is: {{{ Gnam.foo = \ (@ t_alr) (k_aeM [Dmd=Just C(S)] :: (Gnam.D, GHC.Types.Int) -> t_alr) (d_aeN [Dmd=Just U(U(L))] :: Gnam.D) -> case d_aeN of wild_X5 { Gnam.D e_aeO [Dmd=Just U(L)] -> letrec { loop_smj [Occ=LoopBreaker] :: GHC.Types.Int -> t_alr [LclId, Arity=1, Str=DmdType U(L) {aeM->C(S) aeO->U(L)}, Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=1, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [20] 112 0}] loop_smj = \ (i_aeU [Dmd=Just U(L)] :: GHC.Types.Int) -> case i_aeU of wild_alU { GHC.Types.I# x_alW [Dmd=Just L] -> case e_aeO of _ { GHC.Types.I# y_am0 [Dmd=Just L] -> case GHC.Prim.<# x_alW y_am0 of _ { GHC.Types.False -> baz_smn @ t_alr @ Gnam.D @ GHC.Types.Int k_aeM wild_X5 wild_alU; GHC.Types.True -> loop_smj (GHC.Types.I# (GHC.Prim.+# x_alW 1)) } } }; } in loop_smj lvl_smr } }}} Notice that wild_alU is being properly used in the result. After the worker wrapper transformation, foo is now: {{{ Gnam.foo = \ (@ t_alp) (w_sn7 [Dmd=Just C(S)] :: (Gnam.D, GHC.Types.Int) -> t_alp) (w_sn8 [Dmd=Just U(U(L))] :: Gnam.D) -> case w_sn8 of w_sn8 { Gnam.D ww_sna -> case ww_sna of ww_sna { GHC.Types.I# ww_snc -> $wfoo_sng @ t_alp w_sn7 ww_snc } } }}} So it seems that we should also pass along the evaluated variables, in case they are used. There is a tradeoff here, in that we will require more arguments to the function than if we just reconstructed it. However, if we smarten up worker-wrapper so that it drops unused arguments, this could be a win when not all of the fields are used, e.g. if we add another field to D: {{{ Gnam.foo [InlPrag=INLINE[0]] :: forall t_alp. ((Gnam.D, GHC.Types.Int) -> t_alp) -> Gnam.D -> t_alp [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType C(S)U(U(L)L), Unf=OtherCon []] = \r [w_som w1_sof] case w1_sof of _ { Gnam.D ww_soj ww1_soo -> case ww_soj of _ { GHC.Types.I# ww3_son -> Gnam.$wfoo w_som ww3_son ww1_soo; }; }; }}} Now ww1_soo is passed, even though it is only ever used to rebox the value. I think there is a comment to this effect in the simplifier already. Passing boxed value would require exactly the same number of function arguments, but save on a heap allocation! -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8032#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler