[GHC] #15124: Improve block layout for the NCG

5 May 2018

      #15124: Improve block layout for the NCG
-------------------------------------+-------------------------------------
           Reporter:  AndreasK       |             Owner:  (none)
               Type:  task           |            Status:  new
           Priority:  normal         |         Milestone:  8.6.1
          Component:  Compiler       |           Version:  8.2.2
  (NCG)                              |
           Keywords:  CodeGen        |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 The current codegen tries to place two blocks A and B next to each other
 if they are "close".

 = Drawbacks of the current algorithm

 Blocks are only considered close if one jumps to the other via a
 unconditional jump instructions.
 This is a fast and reasonable approximation. But we might be able to do
 better.

 == Calls are not considered even if they always return to the same block.

 For example it's clear that if foo == bar we jump to C here:
 {{{
 A:
         movq $block_C_info,-16(%rbp)
         cmp foo,bar
         jne D
 B:
         jmp *(%rbx) #Returns to C

         ...

 .align 8
         .infotable
 block_C_info:
 C:
         doThings
 }}}

 == Conditional jumps are never considered.

 The following block either jumps directly to c3mQ or returns there from a
 call.
 Yet that is completely ignored by the current algorithm.

 {{{
 _c3mG:
         movq $block_c3mQ_info,-8(%rbp)
         ...
         jne _c3mQ
         jmp *(%rbx) #returns to c3mQ
 }}}

 == Likelyhood of branches is not considered.

 We track information about branches which are rarely taken at the cmm
 level.
 However this information is currently lost during the Cmm -> Asm
 transformation
 which happens before we do code layout.

 This means for Cmm code where one branch is never executed the branch
 that's never
 executed might be the only one considered relevant.

 This happens for example for this stack check:
 {{{
     if ((Sp + -40) >= SpLim) (likely: True) goto c5PS; else goto c5Q3;
 }}}

 = Potential gains

 If nothing else this would allow some Cmm optimizations which are made
 useless by the current code layout algorithm.
 I have hit cases where the block layout algorithm turned optimizations
 into pessimizations of 2-5% by pulling obvious loops apart.

 Given that the scenario that leads to loops being pulled apart doesn't
 seem that unlikely I assume some loops are also pulled apart like this in
 HEAD.

 Cleaning this up ideally would also make it possible to move blocks which
 are not part of loops out of them.

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15124
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler