Re: [GHC] #9661: Branchless ==# is compiled to branchy code

3 Oct 2014

      #9661: Branchless ==# is compiled to branchy code
-------------------------------------+-------------------------------------
              Reporter:  dfeuer      |            Owner:
                  Type:  feature     |           Status:  new
  request                            |        Milestone:
              Priority:  normal      |          Version:  7.9
             Component:  Compiler    |         Keywords:
            Resolution:              |     Architecture:  Unknown/Multiple
      Operating System:              |       Difficulty:  Unknown
  Unknown/Multiple                   |       Blocked By:
       Type of failure:  Runtime     |  Related Tickets:
  performance bug                    |
             Test Case:              |
              Blocking:              |
Differential Revisions:              |
-------------------------------------+-------------------------------------
Changes (by rwbarton):

 * type:  bug => feature request

Comment:

 I think GHC is doing the right thing here, or at least not the wrong
 thing.

 It's wrong to think of `(==#)` as "branchless equality". It's just a
 function whose value is `1#` if its arguments are equal, and `0#` if they
 aren't. This fact needs to be visible to GHC in order to optimize code
 like
 {{{#!hs
 f :: Int -> (Int,Int)
 f x = if x == 0        -- defined in terms of ==#
       then (x', 1)     -- here GHC can inline and then constant-fold x' to
 257
       else (x-1, x')
   where x' = 1000 * x + 257
 }}}
 and this is the purpose of the `litEq` rule. As a result, GHC may not
 output assembly containing the number of branches that you intended. This
 is the normal function of an optimizing compiler; if GHC was not free to
 replace your program with an equivalent one, then users would be forced to
 perform all kinds of optimizations by hand to get efficient code.

 I would note that this behavior is not specific to GHC. If you compile
 this C program with gcc (Debian 4.9.1-14) at optimization level `-O1` or
 higher
 {{{#!c
 void g();
 void h();
 void f(int x)
 {
   if ((x == 0) OR (x == 5))
     g();
   else
     h();
 }
 }}}
 you will get identical assembly output (with two branches, not an `or`
 instruction) whether you set `-DOR=|` or `-DOR=||`. So C does not give you
 control over branchlessness either.

 I understand that depending on the characteristics of the input, the
 "branchless" code may be faster (e.g. for the C function above, `x`
 randomly drawn from the set 0 (49%), 5 (49%), 10 (2%)). However, with
 certain other distributions of the input value, the "branchy" version is
 likely to be faster. So there is no right or wrong answer here.

 In general it takes a lot of engineering effort to get a compiler to
 reliably produce two different optimized outputs for two different
 programs that it can see are semantically identical. In this case maybe
 there is an easy solution: add a new `secretlyEquality#` which is the same
 as `(==#)`, but where that fact is hidden from GHC for long enough for
 `secretlyEquality#` to survive the stages that transform `(==#)`
 unscathed. This approach is intrinsically somewhat fragile (for example,
 the LLVM backend may introduce additional branches anyways), but it has
 the merit of ease of implementation.

 In any case, I would like to see some benchmarks showing that there are
 performance gains to be had in real programs from control over
 branchlessness before expending any effort on implementation.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9661#comment:7
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler