[GHC] #8321: improve basic block layout on LLVM backend by predicting stack/heap checks

17 Sep 2013

      #8321: improve basic block layout on LLVM backend by predicting stack/heap checks
------------------------------------+-------------------------------------
       Reporter:  rwbarton          |             Owner:
           Type:  feature request   |            Status:  new
       Priority:  normal            |         Milestone:  7.10.1
      Component:  Compiler (LLVM)   |           Version:  7.7
       Keywords:                    |  Operating System:  Unknown/Multiple
   Architecture:  Unknown/Multiple  |   Type of failure:  None/Unknown
     Difficulty:  Unknown           |         Test Case:
     Blocked By:                    |          Blocking:
Related Tickets:                    |
------------------------------------+-------------------------------------
 Currently we don't give the LLVM optimizer any information about which
 branch of an `if` is likely to be taken. As a result, the optimizer is
 likely to produce a basic block layout which is not optimal. Improving the
 layout can improve performance through better instruction cache usage and
 better branch prediction by the hardware.

 We can control LLVM's idea of what is likely with the `llvm.expect`
 intrinsic function. Some obvious branches which we can predict accurately
 are the stack and heap checks that appear near the entry of many
 functions.

 Here's a small example of some Cmm code and the output of the LLVM
 optimizer/compiler.

 {{{
  block_c2Lc_entry() //  [R1, R2]
          { info_tbl: [(c2Lc,
                        label: block_c2Lc_info
                        rep:StackRep [])]
            stack_info: arg_space: 0 updfr_space: Nothing
          }
      {offset
        c2Lc:
            Hp = Hp + 24;
            if (Hp > HpLim) goto c2Lm; else goto c2Ll;
        c2Lm:
            HpAlloc = 24;
            R2 = R2;
            R1 = R1;
            call stg_gc_pp(R2, R1) args: 8, res: 8, upd: 24;
        c2Ll:
            I64[Hp - 16] = :_con_info;
            P64[Hp - 8] = R1;
            P64[Hp] = R2;
            R1 = Hp - 14;
            Sp = Sp + 8;
            call (P64[Sp])(R1) args: 24, res: 0, upd: 24;
      }
  }]
 }}}

 {{{

 00000000000002b8 :
  2b8:   4c 89 e0                mov    %r12,%rax
  2bb:   4c 8d 60 18             lea    0x18(%rax),%r12
  2bf:   4d 3b a5 18 01 00 00    cmp    0x118(%r13),%r12
  2c6:   76 10                   jbe    2d8 
  2c8:   49 c7 85 48 01 00 00    movq   $0x18,0x148(%r13)
  2cf:   18 00 00 00
  2d3:   e9 00 00 00 00          jmpq   2d8 
                         2d4: R_X86_64_PC32
 stg_gc_pp+0xfffffffffffffffc
  2d8:   48 c7 40 08 00 00 00    movq   $0x0,0x8(%rax)
  2df:   00
                         2dc: R_X86_64_32S
 ghczmprim_GHCziTypes_ZC_con_info
  2e0:   48 89 58 10             mov    %rbx,0x10(%rax)
  2e4:   4c 89 70 18             mov    %r14,0x18(%rax)
  2e8:   48 8b 45 08             mov    0x8(%rbp),%rax
  2ec:   48 83 c5 08             add    $0x8,%rbp
  2f0:   49 8d 5c 24 f2          lea    -0xe(%r12),%rbx
  2f5:   ff e0                   jmpq   *%rax
  2f7:   90                      nop
 }}}

 It would likely be better to invert the branch at `2c6` to a `ja`, so that
 the common case is adjacent to the function entry, and when `llvm.expect`
 is used on the condition in the branch, the LLVM optimizer does produce
 this alternate layout.

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8321
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler