[GHC] #14222: Simple text fusion example results in rather duplicative code

#14222: Simple text fusion example results in rather duplicative code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Consider this program, {{{#!hs module T14221 where import Data.Text as T isNumeric :: Text -> Bool isNumeric t = T.all isNumeric' t && T.any isNumber t where isNumber c = '0' <= c && c <= '9' isNumeric' c = isNumber c || c == 'e' || c == 'E' || c == '.' || c == '-' || c == '+' }}} Compiling with `-O` results in over 1 kilobyte of assembler. After looking at the simplified Core the issue becomes apparent: the case analysis of `isNumeric'` is duplicated six times, {{{#!hs case ww4_X2zB of { __DEFAULT -> GHC.Types.False; '+'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#); '-'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#); '.'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#); 'E'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#); 'e'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#) }; }}} It seems to me that we would ideally try to share this bit. Of course, this may be quite tricky to do in practice. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14222 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14222: Simple text fusion example results in rather duplicative code
-------------------------------------+-------------------------------------
Reporter: bgamari | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by bgamari):
By contrast, a relatively naive translation to C compiled with `gcc -O0`
requires less than half the code,
{{{#!c
#include

#14222: Simple text fusion example results in rather duplicative code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Shouldn’t the common block elimination on the CMM level take care of that kind of duplication? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14222#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

Shouldn’t the common block elimination on the CMM level take care of
#14222: Simple text fusion example results in rather duplicative code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Replying to [comment:2 nomeata]: that kind of duplication? Indeed I would have thought so but that is not what I observe. Initially I thought the problem was that the blocks differed in their source note ticks (which arise through unfoldings since I'm using the DWARF-enabled 8.2.1 binary distribution). I suspect that CBE (and indeed perhaps CSE in Core and STG) should learn to deal properly with such ticks. However, after switching back to a non-DWARF-enabled 8.0.2 build I found that I could still easily spot duplicate blocks. For instance, with the program above I see six blocks of the form, {{{#!asm block_c6K1_info: _c6K1: movq 7(%rbx),%r14 movq 8(%rbp),%rbx addq $16,%rbp jmp $wloop_all_s6CQ_info }}} If these were common'd up then that would also allow for a number of additional blocks which are currently distinct to be themselves common'd up (assuming our CBE pass runs to a fixed point). All-in-all it looks like something is broken in CBE. I've opened #14226 to track this. After this is fixed I should also make sure that CBE isn't defeated by the presence of source notes. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14222#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14222: Simple text fusion example results in rather duplicative code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I agree that CBE should nail it, so let's do #14226. But still it'd be even better to do it in Core. As you'll see in `Note [Combine identical alternatives]` in `CoreUtils` we do some combining of identical alternatives, but only in the very convenient case where one of them is the DEFAULT, which is not the case here. I suppose we'd like {{{ case ww4_X2zB of __DEFAULT -> GHC.Types.False; '+'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#); '-'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#); '.'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#); 'E'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#); 'e'# -> jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#) ===> join j = jump $wloop_all_s2z2 (GHC.Prim.+# ww3_s2z0 2#) in case ww4_X2zB of __DEFAULT -> GHC.Types.False; '+'# -> jump j '-'# -> jump j '.'# -> jump j 'E'# -> jump j 'e'# -> jump j }}} You could imagine trying that. It's a variant of something I've wanted to try for some time to get better CSE. Suppose we have {{{ f (x+y) (g (x+y)) }}} Currently we don't CSE the two `(x+y)` expressions. Suppose we did this: 1. Convert to ANF, by let-binding every sub-expression {{{ f (let a1 = x+y in a1) (let a2 = let a3 = x+y in g a3 in a2) }}} 2. Float out aggressively {{{ let a1 = x+y in let a3 = x+y in let a2 = g a3 in f a1 a2 }}} 3. Do CSE exactly as now {{{ let a1 = x+y in let a3 = a1 in let a2 = g a3 in f a1 a2 }}} 4. Now inline as usual {{{ let a1 = x+1 in f a1 (g a1) }}} which is what we want. I like this 4-step plan because only step 1 is new: the other passes exist already. It's a bit brutal but it would do the job. And (provided we did this for join-points too) it would nail the example nicely. Anyone want to have a go? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14222#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14222: Simple text fusion example results in rather duplicative code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: CSE Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => CSE -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14222#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14222: Simple text fusion example results in rather duplicative code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: CSE Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): I should have been a bit more clear; the duplication that I was pointing out wasn't in the case alternatives (although that duplication is perhaps also worth thinking about). Rather, I was pointing out that the **entire case expression** is duplicated six times at various points in the program. I'll attach the simplified Core in a moment. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14222#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14222: Simple text fusion example results in rather duplicative code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: CSE Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * Attachment "T14221.dump-simpl" added. Simplified Core from example given in ticket description -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14222 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14222: Simple text fusion example results in rather duplicative code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: CSE Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Last night I looked further into why we end up duplicating the expession in question: There are two ways by which we end up duplicating the case analysis. == First Duplication == First, relatively early in simplification we end up rewriting `isNumeric'` to (after inlining `(||)`), {{{#!hs isNumeric'_s2z6 :: Char -> Bool isNumeric'_s2z6 = \ (eta_B1 :: Char) -> case eta_B1 of { GHC.Types.C# c2_a2sP -> case GHC.Prim.leChar# '0'# c2_a2sP of { __DEFAULT -> case c2_a2sP of { __DEFAULT -> GHC.Types.False; '+'# -> GHC.Types.True; '-'# -> GHC.Types.True; '.'# -> GHC.Types.True; 'E'# -> GHC.Types.True; 'e'# -> GHC.Types.True }; 1# -> case GHC.Prim.leChar# c2_a2sP '9'# of { __DEFAULT -> case c2_a2sP of { __DEFAULT -> GHC.Types.False; '+'# -> GHC.Types.True; '-'# -> GHC.Types.True; '.'# -> GHC.Types.True; 'E'# -> GHC.Types.True; 'e'# -> GHC.Types.True }; 1# -> GHC.Types.True } } } }}} Note the two instances of `case c2_a2sP of {...}`; this comes about via unconditional post-inlining since the scrutinee is trivial (being a `Var`). This makes me wonder whether there is any benefit for `postInlineUnconditional` to inline join points. == Second Duplication == Then, later in simplification we duplicate `isNumeric'` three times (resulting in six instances of the `case` analysis). This happens when we start with this Core after worker/wrapper, {{{#!hs join { $w$j_s2Cy :: GHC.Prim.Char# -> Int -> Bool $w$j_s2Cy (ww_s2Cw :: GHC.Prim.Char#) (w_s2Ct :: Int) = let { w_s2Cs :: Char w_s2Cs = GHC.Types.C# ww_s2Cw } in let { x_a2vL :: Char x_a2vL = w_s2Cs } in let { s'_a2vM :: Int s'_a2vM = w_s2Ct } in case x_a2vL of { GHC.Types.C# c2_a2t0 -> join { $j_s2yJ :: Bool $j_s2yJ = case c2_a2t0 of { __DEFAULT -> GHC.Types.False; '+'# -> jump loop_all_a2vd s'_a2vM; '-'# -> jump loop_all_a2vd s'_a2vM; '.'# -> jump loop_all_a2vd s'_a2vM; 'E'# -> jump loop_all_a2vd s'_a2vM; 'e'# -> jump loop_all_a2vd s'_a2vM } } in case GHC.Prim.leChar# '0'# c2_a2t0 of { __DEFAULT -> jump $j_s2yJ; 1# -> case GHC.Prim.leChar# c2_a2t0 '9'# of { __DEFAULT -> jump $j_s2yJ; 1# -> jump loop_all_a2vd s'_a2vM } } } } in join { -- the wrapper for the above worker $j_s2At :: Char -> Int -> Bool $j_s2At (w_s2Cs :: Char) (w_s2Ct :: Int) = case w_s2Cs of ww_s2Cv { GHC.Types.C# ww_s2Cw -> jump $w$j_s2Cy ww_s2Cw w_s2Ct } } in {- ... a large expression with three call-sites of $j_s2At -} }}} In the post-worker-wrapper simplification step we then immediately inline the wrappers into all of the call-sites, {{{ Considering inlining: $j_s2At arg infos [ValueArg, ValueArg] interesting continuation BoringCtxt some_benefit True is exp: True is work-free: True guidance ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=False) ANSWER = YES Inlining done: $j_s2At Inlined fn: \ (w_s2Cs :: GHC.Types.Char) (w_s2Ct :: GHC.Types.Int) -> case w_s2Cs of { GHC.Types.C# ww_s2Cw -> jump $w$j ww_s2Cw w_s2Ct } Cont: ApplyToVal nodup (GHC.Types.C# (GHC.Prim.chr# (GHC.Prim.word2Int# r#))) ApplyToVal nodup (GHC.Types.I# (GHC.Prim.+# ipv 1#)) Stop[BoringCtxt] GHC.Types.Bool }}} and in each case inline the workers soon thereafter, {{{ Considering inlining: $w$j_s2Cy arg infos [NonTrivArg, ValueArg] interesting continuation BoringCtxt some_benefit True is exp: True is work-free: True guidance IF_ARGS [126 120] 190 0 discounted size = -35 ANSWER = YES Inlining done: $w$j Inlined fn: \ (ww_s2Cw :: GHC.Prim.Char#) (w_s2Ct :: GHC.Types.Int) -> join { $j_s2yJ :: GHC.Types.Bool $j_s2yJ = case ww_s2Cw of { __DEFAULT -> GHC.Types.False; '+'# -> case w_s2Ct of { GHC.Types.I# ww_X2Dh -> jump $wloop_all ww_X2Dh }; '-'# -> case w_s2Ct of { GHC.Types.I# ww_X2Dh -> jump $wloop_all ww_X2Dh }; '.'# -> case w_s2Ct of { GHC.Types.I# ww_X2Dh -> jump $wloop_all ww_X2Dh }; 'E'# -> case w_s2Ct of { GHC.Types.I# ww_X2Dh -> jump $wloop_all ww_X2Dh }; 'e'# -> case w_s2Ct of { GHC.Types.I# ww_X2Dh -> jump $wloop_all ww_X2Dh } } } in case GHC.Prim.leChar# '0'# ww_s2Cw of { __DEFAULT -> jump $j_s2yJ; 1# -> case GHC.Prim.leChar# ww_s2Cw '9'# of { __DEFAULT -> jump $j_s2yJ; 1# -> case w_s2Ct of { GHC.Types.I# ww_X2Dk -> jump $wloop_all ww_X2Dk } } } Cont: ApplyToVal nodup ww_s2Cw ApplyToVal nodup w_s2Ct Stop[BoringCtxt] GHC.Types.Bool }}} We consequently end up with two identical and one nearly identical copies of the same rather large block of code. It's not immediately clear to me how we can determine whether this latter duplication is fruitful. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14222#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14222: Simple text fusion example results in rather duplicative code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: CSE Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): simonpj, regarding comment:4: It seems to me like there is a real threat of introducing inappropriate sharing in such a scheme, given that full laziness already tends to produce quite some [[http://www.well-typed.com/blog/2016/09/sharing- conduit/|headaches]]. Do you think CSE is currently cautious enough to avoid these sorts of blowups? Perhaps we would want to be slightly less aggressive in floating out bindings which the compiler introduced during ANF-ization? Of course, this is all hypothetical; I suppose the best way to find out is to try. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14222#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14222: Simple text fusion example results in rather duplicative code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: CSE Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D3990 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * differential: => Phab:D3990 Comment: Phab:D3990 is start on comment:4 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14222#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC