Re: [GHC] #9661: Branchless ==# is compiled to branchy code

3 Oct 2014

      ...
I think GHC is doing the right thing here, or at least not the wrong
#9661: Branchless ==# is compiled to branchy code
-------------------------------------+-------------------------------------
              Reporter:  dfeuer      |            Owner:
                  Type:  feature     |           Status:  new
  request                            |        Milestone:
              Priority:  normal      |          Version:  7.9
             Component:  Compiler    |         Keywords:
            Resolution:              |     Architecture:  Unknown/Multiple
      Operating System:              |       Difficulty:  Unknown
  Unknown/Multiple                   |       Blocked By:
       Type of failure:  Runtime     |  Related Tickets:
  performance bug                    |
             Test Case:              |
              Blocking:              |
Differential Revisions:              |
-------------------------------------+-------------------------------------

Comment (by dfeuer):

 Replying to [comment:7 rwbarton]:
 thing.
...
It's wrong to think of `(==#)` as "branchless equality". It's just a
function whose value is `1#` if its arguments are equal, and `0#` if they
 aren't. This fact needs to be visible to GHC in order to optimize code
 like
 ...
...
and this is the purpose of the `litEq` rule.
I wonder if there's another way to accomplish that, but see below.
...
As a result, GHC may not output assembly containing the number of
 branches that you intended. This is the normal function of an optimizing
 compiler; if GHC was not free to replace your program with an equivalent
 one, then users would be forced to perform all kinds of optimizations by
 hand to get efficient code.
...
I understand that depending on the characteristics of the input, the
 "branchless" code may be faster (e.g. for the C function above, `x`
 randomly drawn from the set 0 (49%), 5 (49%), 10 (2%)). However, with
 certain other distributions of the input value, the "branchy" version is
...
In any case, I would like to see some benchmarks showing that there are
With the exception of some bottom-of-the-stack `base` code with module
 dependency nightmares, people who use such primops are usually trying to
 do something specific for performance reasons, and we should be careful
 not to interfere excessively if we can avoid doing so. Your preferred
 `==#` can be implemented in terms of mine, I believe, like this:

 {{{#!hs
 bartonEq x y = case feuerEq x y of
                  1# -> 1#
                  _  -> 0#
 }}}

 Going the other way around, however, is impossible. If you prefer, we can
 give `feuerEq` a different name so current code will continue to work the
 same.

 likely to be faster. So there is no right or wrong answer here.

 Of course.

 performance gains to be had in real programs from control over
 branchlessness before expending any effort on implementation.

 That ship has long since sailed—the massive breaking primBool change was
 undertaken specifically to support this sort of thing.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9661#comment:8
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler