[GHC] #14372: CMM contains a bunch of tail-merging opportunities

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature | Status: new request | Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Compile this code to CMM {{{#!hs data Small = S1 | S2 | S3 | S4 deriving (Show, Enum) data Big = B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 | B10 deriving (Show, Enum) {-# NOINLINE quux #-} quux B1 = 'a' quux B2 = 'b' quux B3 = 'c' quux B4 = 'd' quux B5 = 'e' quux B6 = 'f' quux B7 = 'g' quux B8 = 'h' quux B9 = 'i' quux B10 = 'j' {-# NOINLINE qaax #-} qaax B1 = 'a' qaax B2 = 'b' qaax B3 = 'c' qaax B4 = 'd' qaax B5 = 'e' qaax B7 = 'g' qaax B8 = 'h' qaax B9 = 'i' qaax B10 = 'j' {-# NOINLINE foo #-} foo B1 = S1 foo B2 = S2 foo B3 = S3 foo B4 = S4 {-# NOINLINE bar #-} bar S1 = B1 bar S2 = B2 bar S3 = B3 bar S4 = B4 main = do print $ take 100000 (repeat (foo <$> [B1 .. B4])) print $ take 100000 (repeat (bar <$> [S1 .. S4])) print $ take 100000 (repeat (quux <$> [B1 .. B10])) print $ qaax B1 }}} When `Char` or ''enum-like'' ADT is returned, I see lots of case branches, which only differ in the first instruction. E.g. {{{ c30l: // global R1 = stg_CHARLIKE_closure+1649; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30m: // global R1 = stg_CHARLIKE_closure+1665; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; u30Z: // global if (_c30p::I64 < 9) goto c30n; else goto c30o; c30n: // global R1 = stg_CHARLIKE_closure+1681; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30o: // global R1 = stg_CHARLIKE_closure+1697; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}} It would be nice to factor out the common tails, e.g. by branching to the first tail already emitted. Bonus points for rewriting switch tables to contain the above numbers and compile to a lookup + common code. This is what I am talking about: {{{ c307: // global _s2ON::P64 = R1; _c30j::P64 = _s2ON::P64 & 7; switch [1 .. 7] _c30j::P64 { case 1 : goto c30d; case 2 : goto c30e; case 3 : goto c30f; case 4 : goto c30g; case 5 : goto c30h; ... } ... c30h: // global R1 = stg_CHARLIKE_closure+1617; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30g: // global R1 = stg_CHARLIKE_closure+1601; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30f: // global R1 = stg_CHARLIKE_closure+1585; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30e: // global R1 = stg_CHARLIKE_closure+1569; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30d: // global R1 = stg_CHARLIKE_closure+1553; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}} There should be an array [1553, 1569, 1585, ...] and each case should be the same: {{{ R1 = stg_CHARLIKE_closure; R1 = R1 + array[tag]; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): See also #14226. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

It would be nice to factor out the common tails, e.g. by branching to
#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): the first tail already emitted. Branches can be expensive, so this might not be a clear win. Have you measured the effect? (You can do that without modifying GHC: Just get your hands on the assembly file, make the changes you want to see, and compare.) The other idea, detecting blocks that are equal up-to a constant, sounds fun. But again, the question is: Is it worth it? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Poorly predicted branches can indeed be expensive. However, I think here we are just taking about jumps which, as far as I know, are quite cheap assuming they don't boot you out of $I since they can be predicted perfectly. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Across an entire module I think there are quite a lot of code blocks that are byte-for-byte identical; i.e. no extra branch or "equal up to constant" required. I wonder if we could simply gather stats on the top 100 code-blocks, put them into the RTS, and use them whenever we come across one? I bet that'd be highly effective; but I do not have data to back up my claim. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): I wonder what prior art exists in this area; I'm sure other compilers have considered this in the past. An interesting and loosely related bit of work that I have seen in the past is the work in the LLVM community on a technique they call outlining. See the [[https://www.youtube.com/watch?v=yorld-WSOeU|talk]] for details. This was pursued as a means of reducing code size and is no doubt more sophisticated than what Simon was looking for in comment:4, but I thought I'd leave the reference here nevertheless. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

It would be nice to factor out the common tails, e.g. by branching to
#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): Replying to [comment:2 nomeata]: the first tail already emitted.
Branches can be expensive, so this might not be a clear win. Have you
measured the effect? (You can do that without modifying GHC: Just get your hands on the assembly file, make the changes you want to see, and compare.) These will probably be short, known branches. Since more ''relevant'' code now can reside in the instruction cache, I expect less misses and no prediction failures.
The other idea, detecting blocks that are equal up-to a constant, sounds
fun. But again, the question is: Is it worth it? Well, it makes code straight-line and branchless. Eliminates jump-tables. Shrinks code size. Which one of these is a penalty? ;-) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): `-fcmm-elim-common-blocks` helps a bit, but many equal ones are not caught: These hash the same {{{ hash_block c3dj 170 hash_block c3dp 170 hash_block c3dv 170 hash_block c3dB 170 }}} but are not commoned: {{{ ==================== Post switch plan ==================== {offset c3dj: // global R1 = R1 + 7; call (P64[(old + 8)])(R1) args: 8, res: 0, upd: 8; } ==================== Post switch plan ==================== {offset c3dp: // global R1 = R1 + 7; call (P64[(old + 8)])(R1) args: 8, res: 0, upd: 8; } ==================== Post switch plan ==================== {offset c3dv: // global R1 = R1 + 7; call (P64[(old + 8)])(R1) args: 8, res: 0, upd: 8; } ==================== Post switch plan ==================== {offset c3dB: // global R1 = R1 + 7; call (P64[(old + 8)])(R1) args: 8, res: 0, upd: 8; } }}} Maybe the reason is the "result register" updates are not considered in `CmmCommonBlockElim.hs`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): Replying to [comment:7 heisenbug]:
`-fcmm-elim-common-blocks` helps a bit, but many equal ones are not caught:
Turns out c-b-e is a local optimisation, i.e. per procedure. We need a global one to crack this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): Replying to [comment:8 heisenbug]:
Replying to [comment:7 heisenbug]:
`-fcmm-elim-common-blocks` helps a bit, but many equal ones are not caught:
Turns out c-b-e is a local optimisation, i.e. per procedure. We need a global one to crack this.
Okay I now have the beginnings of a global CMM C-B-E here: https://github.com/ggreif/ghc/tree/wip/global-cmm-cbe Feedback welcome! Some comments (for self) * better use `ST(Refs)` to not mess with GHC's `-j` mode (multi-module compilation) * `CmmProc`s should be transformed. Not yet done. Graph label to branch transformation is bogus, but keeping it as an example for now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): It's hard to review code without a spec, or even some input/output examples. And an overview Note to explain the goal and how the impl works would be useful. Based on a very quick look, it seems that you hope to branch from one `CmmProc` to a label in another. That's a huge thing to do, because it messes up all dataflow analyses, which expect to be able to see all the places that jump to a block (e.g. constant propagation). I'm 99% sure that you can't translate it into LLVM. Before you invest a lot more effort, can we debate the goal? There are others on ghc-devs who know about this back-end stuff. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): Replying to [comment:10 simonpj]:
It's hard to review code without a spec, or even some input/output examples.
And an overview Note to explain the goal and how the impl works would be useful.
Will do. I am in a very early stage :-)
Based on a very quick look, it seems that you hope to branch from one
`CmmProc` to a label in another. That's a huge thing to do, because it messes up all dataflow analyses, which expect to be able to see all the places that jump to a block (e.g. constant propagation). I'm 99% sure that you can't translate it into LLVM. You are correct. We can only branch to labels that we emit. So this probably will be a NCG-only thing (C-backend is off-limits too), unless we find a way to invoke the "outliner" in LLVM specifically. What I have observed that sometimes previously noted labels disappear due to optimisation. So I'll have to move the commoning to the end of the Cmm pipeline.
Before you invest a lot more effort, can we debate the goal? There are
others on ghc-devs who know about this back-end stuff. Sure. First and above all I am trying to learn and understand the inner workings of Cmm. Let me just get this working with bootstrap and we can discuss. I'll also need to gather some statistics. Thanks for the review, Simon! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * cc: nomeata (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by jp.rider63): What's the status of this? I'd be interested in contributing to the implementation if it's helpful. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by heisenbug): Replying to [comment:13 jp.rider63]:
What's the status of this? I'd be interested in contributing to the implementation if it's helpful.
There is a branch in my GHC fork on github: https://github.com/ggreif/ghc/tree/wip/global-cmm-cbe It does not work with `ghc --make` because the assembly files contain local labels. Concatenating all assembly files might work, but I have not tried it yet. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): “This” is ambiguous. The original ticket asks for commoning up “almost equal” locks (by introducing a lookup table). I believe @jp.rider63 refers to that. Later in the discussion, global CBE was brought up, which @heisenbug seems to be working on. I recommend to open a new ticket for global CBE. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by jp.rider63): Yes, I was slightly confused about the CBE discussion, but it seems somewhat related. FYI, this is my [https://github.com/bos/aeson/pull/599 motivating example]. The speedup I thought was happening due to eliminating common tails actually turned out to be from (lack of) inlining on someone else's machine. Either way, I think it makes sense to benchmark performance differences and see if this change is worthwhile. Even if performance is similar, it might be worth implementing to reduce code size. For my use case, one optimization that might help more is creating lookup tables when matching on `Word8`. Maybe I'll create another ticket for that in the future. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

Poorly predicted branches can indeed be expensive. However, I think here we are just taking about jumps which, as far as I know, are quite cheap assuming they don't boot you out of $I since they can be predicted
#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): Replying to [comment:3 bgamari]: perfectly. I've run the code: {{{ {-# NOINLINE f_test #-} f_test :: Int ->Int f_test a = case a of 1 -> 11001 2 -> 11002 3 -> 11003 4 -> 11004 5 -> 11005 6 -> 11006 7 -> 11007 8 -> 11008 main = print . sum . map (f_test) $ (concat . replicate 9000000) [1..45::Int] }}} I did the transformation manually into: {{{ .Lc48I: cmpq $9,%r14 jge .Lc48z .Lu48L: cmpq $1,%r14 jl .Lc48z .Lu48M: movl .Ln48P-8(,%r14,8), %ebx jmp *(%rbp) .Lc48z: movq $-1,%rbx jmp *(%rbp) .section .rodata .Ln48P: .quad 11001 # 11002 .. 11007 .quad 11008 }}} Turning the jumps into a indirect mov instruction.
I wonder what prior art exists in this area; I'm sure other compilers have considered this in the past.
Gcc/clang do the same thing for switch statements. It improved speed, but not by much and depending on the codelayout I did it was possible to get worse performance in specific cases. But thats true for everything that changes the code layout and isn't a major win I assume. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): I think this can be solved in two steps: ---- Change the Cmm generation for case statements: Currently we generate cmm of the sort {{{ ... case 1: goto foo; ... foo: Register = Value return(Register) }}} Instead we could inline this when generating the switch. {{{ #include "Cmm.h" test_entry() { switch [1 .. 8] R2 { case 1 : {R1 = 0; return();} ... case 8 : {R1 = 8; return();} } } }}} From what I've seen this hopefully won't change the generated assembly since GHC does this already when compiling the switch in the cases I looked at. ---- The second step would be to change the Assembly generated to something like the listing in [comment:17 the comment above]. I imagine a way to do that would be to: * Check if * Assignments are into the same register. * The rest of the code is the same * Collect all the constants, put them into a lookup table and generate assembly that uses the lookup table like [comment:17 above]. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by michalt): * cc: michalt (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => CodeGen Old description:
Compile this code to CMM {{{#!hs data Small = S1 | S2 | S3 | S4 deriving (Show, Enum)
data Big = B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 | B10 deriving (Show, Enum)
{-# NOINLINE quux #-} quux B1 = 'a' quux B2 = 'b' quux B3 = 'c' quux B4 = 'd' quux B5 = 'e' quux B6 = 'f' quux B7 = 'g' quux B8 = 'h' quux B9 = 'i' quux B10 = 'j'
{-# NOINLINE qaax #-} qaax B1 = 'a' qaax B2 = 'b' qaax B3 = 'c' qaax B4 = 'd' qaax B5 = 'e'
qaax B7 = 'g' qaax B8 = 'h' qaax B9 = 'i' qaax B10 = 'j'
{-# NOINLINE foo #-} foo B1 = S1 foo B2 = S2 foo B3 = S3 foo B4 = S4
{-# NOINLINE bar #-} bar S1 = B1 bar S2 = B2 bar S3 = B3 bar S4 = B4
main = do print $ take 100000 (repeat (foo <$> [B1 .. B4])) print $ take 100000 (repeat (bar <$> [S1 .. S4])) print $ take 100000 (repeat (quux <$> [B1 .. B10])) print $ qaax B1 }}}
When `Char` or ''enum-like'' ADT is returned, I see lots of case branches, which only differ in the first instruction.
E.g. {{{ c30l: // global R1 = stg_CHARLIKE_closure+1649; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30m: // global R1 = stg_CHARLIKE_closure+1665; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; u30Z: // global if (_c30p::I64 < 9) goto c30n; else goto c30o; c30n: // global R1 = stg_CHARLIKE_closure+1681; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30o: // global R1 = stg_CHARLIKE_closure+1697; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}}
It would be nice to factor out the common tails, e.g. by branching to the first tail already emitted.
Bonus points for rewriting switch tables to contain the above numbers and compile to a lookup + common code.
This is what I am talking about:
{{{ c307: // global _s2ON::P64 = R1; _c30j::P64 = _s2ON::P64 & 7; switch [1 .. 7] _c30j::P64 { case 1 : goto c30d; case 2 : goto c30e; case 3 : goto c30f; case 4 : goto c30g; case 5 : goto c30h; ... }
... c30h: // global R1 = stg_CHARLIKE_closure+1617; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30g: // global R1 = stg_CHARLIKE_closure+1601; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30f: // global R1 = stg_CHARLIKE_closure+1585; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30e: // global R1 = stg_CHARLIKE_closure+1569; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30d: // global R1 = stg_CHARLIKE_closure+1553; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}}
There should be an array [1553, 1569, 1585, ...] and each case should be the same: {{{ R1 = stg_CHARLIKE_closure; R1 = R1 + array[tag]; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}}
New description: Compile this code to CMM {{{#!hs data Small = S1 | S2 | S3 | S4 deriving (Show, Enum) data Big = B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 | B10 deriving (Show, Enum) {-# NOINLINE quux #-} quux B1 = 'a' quux B2 = 'b' quux B3 = 'c' quux B4 = 'd' quux B5 = 'e' quux B6 = 'f' quux B7 = 'g' quux B8 = 'h' quux B9 = 'i' quux B10 = 'j' {-# NOINLINE qaax #-} qaax B1 = 'a' qaax B2 = 'b' qaax B3 = 'c' qaax B4 = 'd' qaax B5 = 'e' qaax B7 = 'g' qaax B8 = 'h' qaax B9 = 'i' qaax B10 = 'j' {-# NOINLINE foo #-} foo B1 = S1 foo B2 = S2 foo B3 = S3 foo B4 = S4 {-# NOINLINE bar #-} bar S1 = B1 bar S2 = B2 bar S3 = B3 bar S4 = B4 main = do print $ take 100000 (repeat (foo <$> [B1 .. B4])) print $ take 100000 (repeat (bar <$> [S1 .. S4])) print $ take 100000 (repeat (quux <$> [B1 .. B10])) print $ qaax B1 }}} When `Char` or ''enum-like'' ADT is returned, I see lots of case branches, which only differ in the first instruction. E.g. {{{ c30l: // global R1 = stg_CHARLIKE_closure+1649; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30m: // global R1 = stg_CHARLIKE_closure+1665; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; u30Z: // global if (_c30p::I64 < 9) goto c30n; else goto c30o; c30n: // global R1 = stg_CHARLIKE_closure+1681; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30o: // global R1 = stg_CHARLIKE_closure+1697; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}} It would be nice to factor out the common tails, e.g. by branching to the first tail already emitted. Bonus points for rewriting switch tables to contain the above numbers and compile to a lookup + common code. This is what I am talking about: {{{ c307: // global _s2ON::P64 = R1; _c30j::P64 = _s2ON::P64 & 7; switch [1 .. 7] _c30j::P64 { case 1 : goto c30d; case 2 : goto c30e; case 3 : goto c30f; case 4 : goto c30g; case 5 : goto c30h; ... } ... c30h: // global R1 = stg_CHARLIKE_closure+1617; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30g: // global R1 = stg_CHARLIKE_closure+1601; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30f: // global R1 = stg_CHARLIKE_closure+1585; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30e: // global R1 = stg_CHARLIKE_closure+1569; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30d: // global R1 = stg_CHARLIKE_closure+1553; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}} There should be an array [1553, 1569, 1585, ...] and each case should be the same: {{{ R1 = stg_CHARLIKE_closure; R1 = R1 + array[tag]; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}} See also #14666 -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14666 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by AndreasK): * cc: AndreasK (added) * related: => #14666 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #14666 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by sjakobi): * cc: sjakobi (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC