[GHC] #13569: Optimise code-gen for field-updates of large products

13 Apr 2017

      #13569: Optimise code-gen for field-updates of large products
-------------------------------------+-------------------------------------
           Reporter:  hvr            |             Owner:  (none)
               Type:  task           |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Compiler       |           Version:  8.0.1
  (CodeGen)                          |
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  Other
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 Here's something we talked about last year which I wanted to look into but
 then got distracted (but I'm writing this one up now so it doesn't get
 lost again):

 Consider large field updates such as

 {{{#!hs
 module BigRec where

 data T = C
     { fld01 :: !Int
     , fld02 :: !Int
     , fld03 :: Int
     , fld04 :: Int
     , fld05 :: Int
     , fld06 :: Int
     , fld07 :: Int
     , fld08 :: Int
     , fld09 :: Int
     , fld10 :: Int
     , fld11 :: Int
     , fld12 :: Int
     , fld13 :: Int
     , fld14 :: Int
     , fld15 :: Int
     , fld16 :: Int
     , fld17 :: Int
     , fld18 :: Int
     , fld19 :: Int
     }

 inc01 x = x { fld01 = (fld01 x) + 1 }

 inc0102 x = x { fld01 = (fld01 x) + 1, fld02 = (fld02 x) + 1 }
 }}}

 These get translated into long chains of assignments, which in turn result
 in respectively long chains of register<->memory move instructions being
 generated by the NCG

 {{{#!c
 BigRec.inc0102_entry() //  [R2]
         { [(c1em,
             block_c1em_info:
                 const 0;
                 const 31;),
            (c1ep,
             BigRec.inc0102_info:
                 const 4294967301;
                 const 0;
                 const 15;)]
         }
     {offset
       c1ep:
           if ((Sp + -8) < SpLim) goto c1ez; else goto c1eA;
       c1ez:
           // nop
           R1 = BigRec.inc0102_closure;
           call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0, upd: 8;
       c1eA:
           I64[Sp - 8] = block_c1em_info;
           R1 = R2;
           Sp = Sp - 8;
           if (R1 & 7 != 0) goto c1em; else goto c1en;
       c1en:
           call (I64[R1])(R1) returns to c1em, args: 8, res: 8, upd: 8;
       c1em:
           Hp = Hp + 160;
           if (Hp > I64[BaseReg + 856]) goto c1eD; else goto c1eC;
       c1eD:
           I64[BaseReg + 904] = 160;
           // nop
           call stg_gc_unpt_r1(R1) returns to c1em, args: 8, res: 8, upd:
 8;
       c1eC:
           _s15F::P64 = P64[R1 + 7];
           _s15G::P64 = P64[R1 + 15];
           _s15H::P64 = P64[R1 + 23];
           _s15I::P64 = P64[R1 + 31];
           _s15J::P64 = P64[R1 + 39];
           _s15K::P64 = P64[R1 + 47];
           _s15L::P64 = P64[R1 + 55];
           _s15M::P64 = P64[R1 + 63];
           _s15N::P64 = P64[R1 + 71];
           _s15O::P64 = P64[R1 + 79];
           _s15P::P64 = P64[R1 + 87];
           _s15Q::P64 = P64[R1 + 95];
           _s15R::P64 = P64[R1 + 103];
           _s15S::P64 = P64[R1 + 111];
           _s15T::P64 = P64[R1 + 119];
           _s15U::P64 = P64[R1 + 127];
           _s15V::P64 = P64[R1 + 135];
           _s15X::I64 = I64[R1 + 151] + 1;
           _s15W::I64 = I64[R1 + 143] + 1;
           I64[Hp - 152] = BigRec.C_con_info;
           P64[Hp - 144] = _s15F::P64;
           P64[Hp - 136] = _s15G::P64;
           P64[Hp - 128] = _s15H::P64;
           P64[Hp - 120] = _s15I::P64;
           P64[Hp - 112] = _s15J::P64;
           P64[Hp - 104] = _s15K::P64;
           P64[Hp - 96] = _s15L::P64;
           P64[Hp - 88] = _s15M::P64;
           P64[Hp - 80] = _s15N::P64;
           P64[Hp - 72] = _s15O::P64;
           P64[Hp - 64] = _s15P::P64;
           P64[Hp - 56] = _s15Q::P64;
           P64[Hp - 48] = _s15R::P64;
           P64[Hp - 40] = _s15S::P64;
           P64[Hp - 32] = _s15T::P64;
           P64[Hp - 24] = _s15U::P64;
           P64[Hp - 16] = _s15V::P64;
           I64[Hp - 8] = _s15W::I64;
           I64[Hp] = _s15X::I64;
           R1 = Hp - 151;
 }}}

 This happens in a few places inside GHC's source-tree (where we have
 large-ish products, such as DynFlags where we update single fields), which
 is where I noticed this while looking at generated ASM output.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13569
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler