
#14372: CMM contains a bunch of tail-merging opportunities -------------------------------------+------------------------------------- Reporter: heisenbug | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.1 Resolution: | Keywords: CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => CodeGen Old description:
Compile this code to CMM {{{#!hs data Small = S1 | S2 | S3 | S4 deriving (Show, Enum)
data Big = B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 | B10 deriving (Show, Enum)
{-# NOINLINE quux #-} quux B1 = 'a' quux B2 = 'b' quux B3 = 'c' quux B4 = 'd' quux B5 = 'e' quux B6 = 'f' quux B7 = 'g' quux B8 = 'h' quux B9 = 'i' quux B10 = 'j'
{-# NOINLINE qaax #-} qaax B1 = 'a' qaax B2 = 'b' qaax B3 = 'c' qaax B4 = 'd' qaax B5 = 'e'
qaax B7 = 'g' qaax B8 = 'h' qaax B9 = 'i' qaax B10 = 'j'
{-# NOINLINE foo #-} foo B1 = S1 foo B2 = S2 foo B3 = S3 foo B4 = S4
{-# NOINLINE bar #-} bar S1 = B1 bar S2 = B2 bar S3 = B3 bar S4 = B4
main = do print $ take 100000 (repeat (foo <$> [B1 .. B4])) print $ take 100000 (repeat (bar <$> [S1 .. S4])) print $ take 100000 (repeat (quux <$> [B1 .. B10])) print $ qaax B1 }}}
When `Char` or ''enum-like'' ADT is returned, I see lots of case branches, which only differ in the first instruction.
E.g. {{{ c30l: // global R1 = stg_CHARLIKE_closure+1649; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30m: // global R1 = stg_CHARLIKE_closure+1665; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; u30Z: // global if (_c30p::I64 < 9) goto c30n; else goto c30o; c30n: // global R1 = stg_CHARLIKE_closure+1681; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30o: // global R1 = stg_CHARLIKE_closure+1697; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}}
It would be nice to factor out the common tails, e.g. by branching to the first tail already emitted.
Bonus points for rewriting switch tables to contain the above numbers and compile to a lookup + common code.
This is what I am talking about:
{{{ c307: // global _s2ON::P64 = R1; _c30j::P64 = _s2ON::P64 & 7; switch [1 .. 7] _c30j::P64 { case 1 : goto c30d; case 2 : goto c30e; case 3 : goto c30f; case 4 : goto c30g; case 5 : goto c30h; ... }
... c30h: // global R1 = stg_CHARLIKE_closure+1617; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30g: // global R1 = stg_CHARLIKE_closure+1601; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30f: // global R1 = stg_CHARLIKE_closure+1585; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30e: // global R1 = stg_CHARLIKE_closure+1569; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30d: // global R1 = stg_CHARLIKE_closure+1553; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}}
There should be an array [1553, 1569, 1585, ...] and each case should be the same: {{{ R1 = stg_CHARLIKE_closure; R1 = R1 + array[tag]; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}}
New description: Compile this code to CMM {{{#!hs data Small = S1 | S2 | S3 | S4 deriving (Show, Enum) data Big = B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 | B10 deriving (Show, Enum) {-# NOINLINE quux #-} quux B1 = 'a' quux B2 = 'b' quux B3 = 'c' quux B4 = 'd' quux B5 = 'e' quux B6 = 'f' quux B7 = 'g' quux B8 = 'h' quux B9 = 'i' quux B10 = 'j' {-# NOINLINE qaax #-} qaax B1 = 'a' qaax B2 = 'b' qaax B3 = 'c' qaax B4 = 'd' qaax B5 = 'e' qaax B7 = 'g' qaax B8 = 'h' qaax B9 = 'i' qaax B10 = 'j' {-# NOINLINE foo #-} foo B1 = S1 foo B2 = S2 foo B3 = S3 foo B4 = S4 {-# NOINLINE bar #-} bar S1 = B1 bar S2 = B2 bar S3 = B3 bar S4 = B4 main = do print $ take 100000 (repeat (foo <$> [B1 .. B4])) print $ take 100000 (repeat (bar <$> [S1 .. S4])) print $ take 100000 (repeat (quux <$> [B1 .. B10])) print $ qaax B1 }}} When `Char` or ''enum-like'' ADT is returned, I see lots of case branches, which only differ in the first instruction. E.g. {{{ c30l: // global R1 = stg_CHARLIKE_closure+1649; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30m: // global R1 = stg_CHARLIKE_closure+1665; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; u30Z: // global if (_c30p::I64 < 9) goto c30n; else goto c30o; c30n: // global R1 = stg_CHARLIKE_closure+1681; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30o: // global R1 = stg_CHARLIKE_closure+1697; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}} It would be nice to factor out the common tails, e.g. by branching to the first tail already emitted. Bonus points for rewriting switch tables to contain the above numbers and compile to a lookup + common code. This is what I am talking about: {{{ c307: // global _s2ON::P64 = R1; _c30j::P64 = _s2ON::P64 & 7; switch [1 .. 7] _c30j::P64 { case 1 : goto c30d; case 2 : goto c30e; case 3 : goto c30f; case 4 : goto c30g; case 5 : goto c30h; ... } ... c30h: // global R1 = stg_CHARLIKE_closure+1617; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30g: // global R1 = stg_CHARLIKE_closure+1601; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30f: // global R1 = stg_CHARLIKE_closure+1585; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30e: // global R1 = stg_CHARLIKE_closure+1569; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; c30d: // global R1 = stg_CHARLIKE_closure+1553; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}} There should be an array [1553, 1569, 1585, ...] and each case should be the same: {{{ R1 = stg_CHARLIKE_closure; R1 = R1 + array[tag]; Sp = Sp + 8; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; }}} See also #14666 -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14372#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler