Re: [GHC] #14672: Make likelyhood of branches/conditions available throughout the compiler.

26 Apr 2018

      #14672: Make likelyhood of branches/conditions available throughout the compiler.
-------------------------------------+-------------------------------------
        Reporter:  AndreasK          |                Owner:  (none)
            Type:  task              |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):  Phab:D4316
       Wiki Page:                    |  Phab:D4324 Phab:D4327
-------------------------------------+-------------------------------------

Comment (by AndreasK):

 I did some work on this for the Cmm/STG stages. But so far the results
 still don't justify the complexity. Although it's pretty clear which
 things should be tried before further judgment can be made.

 I did implement a rudimentary version which detected recursion as likely
 and bottoming branches as unlikely.
 Overall performance was underwhelming. While most nofib programs got
 slightly faster (0-1%) very few got a lot (order of ~5%) slower.

 What I did not do yet is propagte the likelyhood info into the asm passes
 which is probably required. Without this we get some awkward interactions.
 For example we might have code of the sort:

 `if checkError() then {goto panic} else {goto foo};`

 Assuming another block also jumps to `foo` there are two ways to compile
 this block:

 {{{
 check:
     cmp R1, 0
     jnz panic
     jmp foo

 check2:
     cmp R1, 0
     jz foo
     jmp panic
 }}}

 Performance of jumps to panic is essentially meaningless so we want
 variant two which saves an instruction in each loop.

 Blocklayout however turns this into a pessimization.

 We often get something like this:
 {{{
 check:
     cmp R1, 0
     jnz panic
     jmp foo
 bar: #<other block jumping to foo>
     few ins here
     #<shortcut>jmp foo
 foo:
     ins do things
 panic:
     some
     more
     instructions
 }}}
 This seems reasonable. If we are lucky bar is small enough that when we
 jump to foo we won't even miss the cache.

 But if we optimize the check block we suddenly get:
 {{{
 check2:
     cmp R1, 0
     jz foo
     #<shortcut> jmp panic #block layout assumes we will take this path
 panic:
     some
     more
     instructions
 bar: #<other block jumping to foo>
     few ins here
     #<shortcut>jmp foo
 foo:
     ins do things
 }}}
 Clearly we don't want this because it pulls check2 and foo far apart. But
 block layout assumes the last jump is the likely one so tries to place
 `panic` right after `check2`.
 If this is a loop there is a good chance that this causes a cache miss on
 each jump to foo.

 I did play around with the block layout code but without actually having
 likelyhood info I couldn't make it work better than what we have know.
 I do want to give lowering likelyhood info into the asm stage a try sooner
 or (probably) later since I would expect that to lead to much better
 code..

 However for now I ran out of steam before I came up with a good way to
 tackle this.

 Things that need to be still done.
 - Find a good layout algorithm which can make use of partial information.
 There are descriptions for algorithms which work with full information
 about block entry counts. But Implementing one which works well on partial
 information is something I couldn't find anything on yet. Rolling my own
 is not out of the question but seems like something for a few weeks and
 not a few days.
 * Chose a design for lowering likelyhood information to the asm level. How
 isn't yet clear. Annotate the instructions? Add information about blocks
 in a sidechannel? It's also not static since things like the register
 allocator also generate new blocks on the fly. Shortcutting might remove
 blocks and so on.
 * Besides block layout if we lower that information it should also be used
 by the register allocators.

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14672#comment:14
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler