Re: [GHC] #6070: Fun with the demand analyser

#6070: Fun with the demand analyser -------------------------------------+------------------------------------ Reporter: simonpj | Owner: simonpj Type: bug | Status: new Priority: normal | Milestone: 7.8.1 Component: Compiler | Version: 7.4.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Description changed by simonpj: Old description:
Max writes: I've been trying to speed up the supercompiler, and in the process I've noticed some infelicities in the demand analyser that might be worth looking at.
== Infelicity 1: analysis does not take into account extra unboxing done by the CPR transformation == An example is: {{{ e :: (Int, Int) -> Int -> (Int, Int) e x y = x `seq` if y > 10 then x else e x (y + 1) }}} Because x is seqed, the occurrence in the "then" branch gets the CPR property so e has the CPR property overall. However, the overall demand on x is S(AA), i.e. the demand analyser believes the x box is used -- if the box were unused it would get the demand U(LL). So the overall demand type is S(AA)U(L)m, and the worker looks like: {{{ Rec { CPR2.$we [Occ=LoopBreaker] :: (GHC.Types.Int, GHC.Types.Int) -> GHC.Prim.Int# -> (# GHC.Types.Int, GHC.Types.Int #) [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType S(AA)L, Unf=OtherCon []] CPR2.$we = \ (w_srv :: (GHC.Types.Int, GHC.Types.Int)) (ww_srt :: GHC.Prim.Int#) -> case GHC.Prim.># ww_srt 10 of _ { GHC.Types.False -> case GHC.Prim.+# ww_srt 1 of sat_ssS { __DEFAULT -> CPR2.$we w_srv sat_ssS }; GHC.Types.True -> case w_srv of _ { (ww2_srA, ww3_srB) -> (# ww2_srA, ww3_srB #) } } end Rec } }}} But if we demand-analysed $we then GHC would say that it had the demand type U(LL)L and unbox the pair argument! It seems silly that the demand analyser outputs code that can be improved further by just running demand analysis again.
Somewhere where this really bit me in practice is in: {{{ d :: M.Map Int Int -> (Int, Int) d m = M.foldrWithKey' (\k v (a, b) -> if k + v > 2 then (a, b) else (b, a)) (0, 1) m }}} Which generates this inner loop: {{{ Rec { CPR2.$wgo2 [Occ=LoopBreaker] :: (GHC.Types.Int, GHC.Types.Int) -> Data.Map.Base.Map GHC.Types.Int GHC.Types.Int -> (# GHC.Types.Int, GHC.Types.Int #) [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType S(AA)S, Unf=OtherCon []] CPR2.$wgo2 = \ (w_srS :: (GHC.Types.Int, GHC.Types.Int)) (w1_srQ :: Data.Map.Base.Map GHC.Types.Int GHC.Types.Int) -> case w1_srQ of _ { Data.Map.Base.Tip -> case w_srS of _ { (ww1_srW, ww2_srX) -> (# ww1_srW, ww2_srX #) }; Data.Map.Base.Bin rb_st1 kx_ss3 x_ssa l_ssk r_ss6 -> case kx_ss3 of _ { GHC.Types.I# x1_ssd -> case CPR2.$wgo2 w_srS r_ss6 of _ { (# ww1_ssi, ww2_ssh #) -> case x_ssa of _ { GHC.Types.I# y_sse -> case GHC.Prim.+# x1_ssd y_sse of sat_st0 { __DEFAULT -> case GHC.Prim.># sat_st0 2 of _ { GHC.Types.False -> let { sat_ssZ :: (GHC.Types.Int, GHC.Types.Int) [LclId] sat_ssZ = (ww2_ssh, ww1_ssi) } in CPR2.$wgo2 sat_ssZ l_ssk; GHC.Types.True -> let { sat_st6 :: (GHC.Types.Int, GHC.Types.Int) [LclId] sat_st6 = (ww1_ssi, ww2_ssh) } in CPR2.$wgo2 sat_st6 l_ssk } } } } } } end Rec } }}} We can save a number of allocations proportional to the size of the Map if the demand analyser didn't have this problem.
I hacked up the analyser to "fix" this by changing the lines at line 473 onward to read: {{{ if isTopLevel top_lvl then fn_ty -- Don't record top level things else case dmd of Box (Eval (Poly Abs)) | DmdType _ _ res <- fn_ty, returnsCPR res -> addVarDmd fn_ty var (Eval (Poly lazyDmd)) _ -> addVarDmd fn_ty var dmd }}} So if we demand a CPRish variable (such as bound by a case scrutinee or a U(LL)-demanded lambda-binder) in the evalDmd then I throw away the Box part of the evalDmd on the basis that after CPRing we won't demand the box at all.
I doubt this hack is the right solution (and I haven't tried it to see how it affects the libraries) but perhaps it gives you some ideas.
== Infelicity 2: a case where demand summarisation hurts us ==
I found practical examples where summarising the demand transfomer of a function as a single strictness signature hurt us. The examples were code like: {{{ h :: (Int, Int) -> Int -> (Int, Int) h x y = if y > 10 then x else h (case h x 0 of (y1, y2) -> (y2, y1)) (y + 1) }}} If, at the innermost use of h, we re-analyse h in the demand C(C(U(LL))) rather than just unleashing the vanilla DmdSig computed from the demand C(C(S)) then we can unbox the pair argument. Currently GHC just gives h the DmdType SU(L) which doesn't eliminate the allocation of the pair at all.
This situation occurs in practice with code like: {{{ c :: M.Map Int Int -> (Int, Int) c m = M.foldrWithKey (\k v (a, b) -> if k + v > 2 then (a, b) else (b, a)) (0, 1) m }}} The difference from my earlier example d is that I'm using the version of `foldrWithKey` that doesn't call `seq`. Current GHC generates this inner loop: {{{ Rec { CPR2.c_go2 [Occ=LoopBreaker] :: (GHC.Types.Int, GHC.Types.Int) -> Data.Map.Base.Map GHC.Types.Int GHC.Types.Int -> (GHC.Types.Int, GHC.Types.Int) [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType U(L*)S, Unf=OtherCon []] CPR2.c_go2 = \ (z_spW :: (GHC.Types.Int, GHC.Types.Int)) (ds_spU :: Data.Map.Base.Map GHC.Types.Int GHC.Types.Int) -> case ds_spU of _ { Data.Map.Base.Tip -> z_spW; Data.Map.Base.Bin rb_sqq kx_sq2 x_sq9 l_sqj r_sq5 -> case kx_sq2 of _ { GHC.Types.I# x1_sqc -> case CPR2.c_go2 z_spW r_sq5 of wild2_sqk { (a_sqh, b_sqg) -> case x_sq9 of _ { GHC.Types.I# y_sqd -> case GHC.Prim.+# x1_sqc y_sqd of sat_sqp { __DEFAULT -> case GHC.Prim.># sat_sqp 2 of _ { GHC.Types.False -> let { sat_sqo :: (GHC.Types.Int, GHC.Types.Int) [LclId] sat_sqo = (b_sqg, a_sqh) } in CPR2.c_go2 sat_sqo l_sqj; GHC.Types.True -> CPR2.c_go2 wild2_sqk l_sqj } } } } } } end Rec } }}} I implemented this but the overhead of reanalysing a function at each occurrence site is prohibitive (with my changes paraffins took 2.5s to compile instead of 0.3s). It does fix the problem though.
A scheme like in "stricterness more relevant" but with let-binding that is polymorphic in the use-site demand might be able to detect the extra strictness here. I think Stefan Holdermanns has a paper describing such a system. But this is anyway much harder to fix than my first infelicity.
New description: Max writes: I've been trying to speed up the supercompiler, and in the process I've noticed some infelicities in the demand analyser that might be worth looking at. One is #5949. The other is: == Infelicity 2: a case where demand summarisation hurts us == I found practical examples where summarising the demand transfomer of a function as a single strictness signature hurt us. The examples were code like: {{{ h :: (Int, Int) -> Int -> (Int, Int) h x y = if y > 10 then x else h (case h x 0 of (y1, y2) -> (y2, y1)) (y + 1) }}} If, at the innermost use of h, we re-analyse h in the demand C(C(U(LL))) rather than just unleashing the vanilla DmdSig computed from the demand C(C(S)) then we can unbox the pair argument. Currently GHC just gives h the DmdType SU(L) which doesn't eliminate the allocation of the pair at all. This situation occurs in practice with code like: {{{ c :: M.Map Int Int -> (Int, Int) c m = M.foldrWithKey (\k v (a, b) -> if k + v > 2 then (a, b) else (b, a)) (0, 1) m }}} The difference from my earlier example d is that I'm using the version of `foldrWithKey` that doesn't call `seq`. Current GHC generates this inner loop: {{{ Rec { CPR2.c_go2 [Occ=LoopBreaker] :: (GHC.Types.Int, GHC.Types.Int) -> Data.Map.Base.Map GHC.Types.Int GHC.Types.Int -> (GHC.Types.Int, GHC.Types.Int) [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType U(L*)S, Unf=OtherCon []] CPR2.c_go2 = \ (z_spW :: (GHC.Types.Int, GHC.Types.Int)) (ds_spU :: Data.Map.Base.Map GHC.Types.Int GHC.Types.Int) -> case ds_spU of _ { Data.Map.Base.Tip -> z_spW; Data.Map.Base.Bin rb_sqq kx_sq2 x_sq9 l_sqj r_sq5 -> case kx_sq2 of _ { GHC.Types.I# x1_sqc -> case CPR2.c_go2 z_spW r_sq5 of wild2_sqk { (a_sqh, b_sqg) -> case x_sq9 of _ { GHC.Types.I# y_sqd -> case GHC.Prim.+# x1_sqc y_sqd of sat_sqp { __DEFAULT -> case GHC.Prim.># sat_sqp 2 of _ { GHC.Types.False -> let { sat_sqo :: (GHC.Types.Int, GHC.Types.Int) [LclId] sat_sqo = (b_sqg, a_sqh) } in CPR2.c_go2 sat_sqo l_sqj; GHC.Types.True -> CPR2.c_go2 wild2_sqk l_sqj } } } } } } end Rec } }}} I implemented this but the overhead of reanalysing a function at each occurrence site is prohibitive (with my changes paraffins took 2.5s to compile instead of 0.3s). It does fix the problem though. A scheme like in "stricterness more relevant" but with let-binding that is polymorphic in the use-site demand might be able to detect the extra strictness here. I think Stefan Holdermanns has a paper describing such a system. But this is anyway much harder to fix than my first infelicity. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/6070#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC