[GHC] #12915: cmmImplementSwitchPlans creates duplicate blocks

#12915: cmmImplementSwitchPlans creates duplicate blocks -------------------------------------+------------------------------------- Reporter: alexbiehl | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Given this c-- code *after* common block elimination and *before* cmmImplementSwitchPlans: {{{#!hs ... cfzc: _sfh3::P64 = R1; _cfB7::I64 = %MO_UU_Conv_W32_W64(I32[I64[_sfh3::P64 - 1] - 4]); switch [0 .. 20] _cfB7::I64 { case 2 : goto cfAR; case 4 : goto cfAV; case 6 : goto cfAZ; default: goto cfAC; } cfAZ: _sfh6::I64 = I64[_sfh3::P64 + 7]; _sfgp::I64 = 1; goto sfgo; cfAV: _sfh5::I64 = I64[_sfh3::P64 + 7]; _sfgp::I64 = 1; goto sfgo; cfAR: _sfh4::I64 = I64[_sfh3::P64 + 7]; _sfgp::I64 = 1; goto sfgo; sfgo: ... }}} Everything is fine, the blocks cfAZ, cfAV and cfAR are all a little different. But after cmmImplementSwitchPlans the code becomes: {{{#!hs ... cfzc: _sfgl::I64 = I64[Sp + 24]; _sfgn::P64 = P64[Sp + 16]; _cfB7::I64 = %MO_UU_Conv_W32_W64(I32[I64[R1 - 1] - 4]); if (_cfB7::I64 < 5) goto ufBe; else goto ufBg; ufBe: if (_cfB7::I64 < 4) goto ufBf; else goto cfAV; ufBf: if (_cfB7::I64 != 2) goto cfAC; else goto cfAR; cfAR: _sfgp::I64 = 1; goto sfgo; cfAV: _sfgp::I64 = 1; goto sfgo; ufBg: if (_cfB7::I64 != 6) goto cfAC; else goto cfAZ; cfAZ: _sfgp::I64 = 1; goto sfgo; sfgo: ... }}} We can now see that cfAZ, cfAV and cfAR are code duplicates! Now, https://github.com/ghc/ghc/blob/master/compiler/cmm/CmmPipeline.hs#L72 states {{{ ... ----------- Eliminate common blocks ------------------------------------- g <- {-# SCC "elimCommonBlocks" #-} condPass Opt_CmmElimCommonBlocks elimCommonBlocks g Opt_D_dump_cmm_cbe "Post common block elimination" -- Any work storing block Labels must be performed _after_ -- elimCommonBlocks g <- {-# SCC "createSwitchPlans" #-} runUniqSM $ cmmImplementSwitchPlans dflags g dump Opt_D_dump_cmm_switch "Post switch plan" g ... }}} For some reason I don't understand cmmImplementSwitchPlans is run *after* elimCommonBlocks. Although it seems elimCommonBlocks can deal with the constructs introduced in cmmImplementSwitchPlans. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12915 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12915: cmmImplementSwitchPlans creates duplicate blocks -------------------------------------+------------------------------------- Reporter: alexbiehl | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): `cmmImplementSwitchPlans` is run after `elimCommonBlocks` in order to avoid making unnecessary decisions in the branching code if the blocks happen to be the same ones any ways. I was not aware that `cmmImplementSwitchPlans` itself could make new blocks that are identical. In your examples, what happend to the assignments ` _sfh6::I64 = I64[_sfh3::P64 + 7];`? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12915#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12915: cmmImplementSwitchPlans creates duplicate blocks -------------------------------------+------------------------------------- Reporter: alexbiehl | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by alexbiehl): Ah, it is not `cmmImplementSwitchPlans` directly! After `cmmSwitchPlans` it is still {{{ cfzc: _sfh3::P64 = R1; _cfB7::I64 = %MO_UU_Conv_W32_W64(I32[I64[_sfh3::P64 - 1] - 4]); if (_cfB7::I64 < 5) goto ufBe; else goto ufBg; ufBe: if (_cfB7::I64 < 4) goto ufBf; else goto cfAV; ufBf: if (_cfB7::I64 != 2) goto cfAC; else goto cfAR; cfAR: _sfh4::I64 = I64[_sfh3::P64 + 7]; _sfgp::I64 = 1; goto sfgo; cfAV: _sfh5::I64 = I64[_sfh3::P64 + 7]; _sfgp::I64 = 1; goto sfgo; ufBg: if (_cfB7::I64 != 6) goto cfAC; else goto cfAZ; cfAZ: _sfh6::I64 = I64[_sfh3::P64 + 7]; _sfgp::I64 = 1; goto sfgo; sfgo: }}} `_sfh4`, `_sfh5`, `_sfh6` are dead. The sinking pass eliminates those and we end up with duplicate blocks! This code is part of an inner loop which just counts the occurrence of certain constructors. Here is the core for the mentioned loop: {{{#!hs letrec { $sloop :: State# RealWorld -> Int# -> (# State# RealWorld, Either Error Int #) $sloop = \ (sc :: State# RealWorld) (sc1 :: Int#) -> case (ww `cast` ...) sc of _ { (# ipv, ipv1 #) -> case ipv1 of _ { Nothing -> (# ipv, Right (I# sc1) #); Just a -> case length $fVectorVectora a of _ { I# y -> case y of _ { __DEFAULT -> (# ipv, empResult2 #); 4# -> case a of _ { Vector dt dt1 dt2 -> let { $wsucc_ :: Int# -> State# RealWorld -> (# State# RealWorld, Either Error Int #) $wsucc_ = \ (ww1 :: Int#) (w1 :: State# RealWorld) -> let { $wsucc_1 :: Int# -> State# RealWorld -> (# State# RealWorld, Either Error Int #) $wsucc_1 = \ (ww2 :: Int#) (w2 :: State# RealWorld) -> case indexArray# dt2 (+# dt ww2) of _ { (# ipv2 #) -> case ipv2 of _ { __DEFAULT -> (# w2, empResult2 #); CV_Int8 dt4 -> case indexArray# dt2 (+# dt (+# ww2 1#)) of _ { (# ipv3 #) -> case ipv3 of _ { __DEFAULT -> (# w2, empResult2 #); CV_Text t -> $sloop w2 (+# sc1 1#) } }; CV_Int16 dt4 -> case indexArray# dt2 (+# dt (+# ww2 1#)) of _ { (# ipv3 #) -> case ipv3 of _ { __DEFAULT -> (# w2, empResult2 #); CV_Text t -> $sloop w2 (+# sc1 1#) } }; CV_Int32 dt4 -> case indexArray# dt2 (+# dt (+# ww2 1#)) of _ { (# ipv3 #) -> case ipv3 of _ { __DEFAULT -> (# w2, empResult2 #); CV_Text t -> $sloop w2 (+# sc1 1#) } } } } } in case indexArray# dt2 (+# dt ww1) of _ { (# ipv2 #) -> case ipv2 of _ { __DEFAULT -> (# w1, empResult2 #); CV_Int8 dt4 -> $wsucc_1 (+# ww1 1#) w1; CV_Int16 dt4 -> $wsucc_1 (+# ww1 1#) w1; CV_Int32 dt4 -> $wsucc_1 (+# ww1 1#) w1 } } } in case indexArray# dt2 dt of _ { (# ipv2 #) -> case ipv2 of _ { __DEFAULT -> (# ipv, empResult2 #); CV_Int8 dt4 -> $wsucc_ 1# ipv; CV_Int16 dt4 -> $wsucc_ 1# ipv; CV_Int32 dt4 -> $wsucc_ 1# ipv } } } } } }}} You can see a minimized example of the original code here: https://gist.github.com/alexbiehl/0a1b5016223e00ae79a1399176e14eef -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12915#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12915: cmmImplementSwitchPlans creates duplicate blocks -------------------------------------+------------------------------------- Reporter: alexbiehl | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Just guessing here, but we should probably not run sinking earlier, before `layoutStack`. We could run `elimCommonBlocks` again later, but that might be expensive and not very often kick in. Maybe with `-O2`? My gut feeling is that this comes up rather rarely. If someone wants to find evidence to the contrary: Add a late `elimCommonBlocks` pass, dump number of blocks before and after, run `nofib` and see how often it fires. Or (easier) report the change in code size. There is probably no obvious way of avoiding to generate these dead assignments in the first place? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12915#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12915: cmmImplementSwitchPlans creates duplicate blocks -------------------------------------+------------------------------------- Reporter: alexbiehl | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by michalt): * cc: michalt (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12915#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12915: cmmImplementSwitchPlans creates duplicate blocks -------------------------------------+------------------------------------- Reporter: alexbiehl | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by michalt): @nomeata: Yes, we definitely don't want to move the sinking pass before stack layout. Stack layout can generate a lot of unnecessary stuff that the sinking pass removes. There's also a `Node [Sinking after stack layout]` that explains why we don't currently run sinking twice (before and after stack layout). As for the comment in code: {{{ -- Any work storing block Labels must be performed _after_ -- elimCommonBlocks }}} IIUC it refers to things like proc point analysis that stores the labels for later use (and since `elimCommonBlock` can remove blocks/labels, it might no longer be safe to use those results). Anyway, I did a quick experiment and modified `CmmPipeline`: - Added a call to `elimCommonBlocks` directly after the sinking pass. - Duplicated the proc point analysis so that the actual splitting uses up to date information. Results: - Binary size: {{{ -1 s.d. ----- -1.0% +1 s.d. ----- -0.8% Average ----- -0.9% }}} - Module sizes: {{{ -1 s.d. ----- -1.8% +1 s.d. ----- +0.6% Average ----- -0.6% }}} - Compile times: {{{ -1 s.d. ----- -1.6% +1 s.d. ----- +3.3% Average ----- +0.8% }}} - Compiler allocations: {{{ -1 s.d. ----- +0.3% +1 s.d. ----- +1.0% Average ----- +0.7% }}} Runtime didn't seem to change (there were some outliers in the 1-2% area, but I think that's just noise). So it seems that there is actually some potential to decrease binary size, but I'm not sure it's worth the compile time overhead. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12915#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12915: cmmImplementSwitchPlans creates duplicate blocks -------------------------------------+------------------------------------- Reporter: alexbiehl | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Thanks for the empirical data! Much appreciated.
I'm not sure it's worth the compile time overhead.
I also doubt it. Shall we close this? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12915#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12915: cmmImplementSwitchPlans creates duplicate blocks -------------------------------------+------------------------------------- Reporter: alexbiehl | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by michalt): I'm also leaning towards closing. Trading ~1% compile time for a similar decrease of binary size doesn't sound like a great deal... @alexbiehl: Is this causing you some problems/performance issues? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12915#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12915: cmmImplementSwitchPlans creates duplicate blocks -------------------------------------+------------------------------------- Reporter: alexbiehl | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hsyl20): Could we allow this additional pass with an optimization flag? It would be a good candidate for a `-O2` (“Apply every non-dangerous optimisation, even if it means significantly longer compile times.” so you get what you ask for) or a potential `-Os`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12915#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12915: cmmImplementSwitchPlans creates duplicate blocks -------------------------------------+------------------------------------- Reporter: alexbiehl | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): I'm thinking the same as hsyl20. The binary size numbers indicate that whatever parts of the standard libraries end up getting linked into all the nofib programs got 1% smaller, which is nothing to sneeze at. Might also be worth checking out whether this is due to a few small gains and whether there might be a cheaper way of getting those gains. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12915#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12915: cmmImplementSwitchPlans creates duplicate blocks -------------------------------------+------------------------------------- Reporter: alexbiehl | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by michalt): Another ticket about this: #9157 (also with a good example where sinking pass could expose some additional optimization opportunities for common block elimination). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12915#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12915: cmmImplementSwitchPlans creates duplicate blocks
-------------------------------------+-------------------------------------
Reporter: alexbiehl | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari
participants (1)
-
GHC