Re: [GHC] #9342: Branchless arithmetic operations

28 Sep 2015

      #9342: Branchless arithmetic operations
-------------------------------------+-------------------------------------
        Reporter:  hvr               |                Owner:
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:  8.0.1
       Component:  Compiler          |              Version:  7.8.3
  (CodeGen)                          |
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
-------------------------------------+-------------------------------------

Comment (by gregorycollins):

 I measured, yes, but not across processors: when I was working on this I
 optimized for a 64-bit i7 (probably Sandy Bridge, IIRC). The version of mask
 you linked to with the funky branchless code was definitely faster on
 that chip than the simpler alternative:

 {{{
 mask# :: Int# -> Int# -> Int#
 mask# !a# !b# = let !(I# z#) = fromEnum (a# ==# b#)
                     !q#      = negateInt# z#
                 in q#
 }}}

 (of course this is from when `==#` returned `Bool` rather than `Int#`).
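
 For reference, a minimal sketch of what that simpler alternative might look
 like on GHC 7.8 or later, where `==#` already returns `Int#` (1# or 0#), so
 the `fromEnum` round trip goes away. The boxed `mask` wrapper and the
 `select` helper are hypothetical names added only to make the example
 runnable; they are not from the linked code, and this says nothing about
 which variant is faster:

 {{{
 {-# LANGUAGE MagicHash #-}
 module Main where

 import Data.Bits (complement, (.&.), (.|.))
 import GHC.Exts (Int (I#), Int#, negateInt#, (==#))

 -- All bits set (-1) when the arguments are equal, 0 otherwise.
 -- Assumes GHC >= 7.8, where (==#) :: Int# -> Int# -> Int#.
 mask# :: Int# -> Int# -> Int#
 mask# a# b# = negateInt# (a# ==# b#)

 -- Hypothetical boxed wrapper, for illustration only.
 mask :: Int -> Int -> Int
 mask (I# a#) (I# b#) = I# (mask# a# b#)

 -- Hypothetical helper: picks x when m is -1 and y when m is 0,
 -- without a conditional at the source level.
 select :: Int -> Int -> Int -> Int
 select m x y = (x .&. m) .|. (y .&. complement m)

 main :: IO ()
 main = do
   print (mask 3 3)                -- -1
   print (mask 3 4)                -- 0
   print (select (mask 3 3) 10 20) -- 10
   print (select (mask 3 4) 10 20) -- 20
 }}}

 Whether the generated code is actually branch-free still depends on the
 backend and target, so the Core/assembly would need checking per platform.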

 The difference was about 15-20% IIRC. Unfortunately I've lost the raw
 numbers, sorry, but as Sven points out they'd be useless anyway for
 determining how good the change is in aggregate. I'm quite willing to
 believe that the branchless code could be a pessimization on ia32.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9342#comment:10
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler