I've gotten substantial speedups (>30%) in the past (e.g. in the hashtables package) by replacing search loops with branchless code, but the technique works best in situations where you have a long run of straight-line integer code for the CPU to chew on; for branchless to be a win you need to feed enough instructions into the pipeline such that the savings of getting 3 instructions per cycle and avoiding paying for branch mispredictions outweighs the savings you would get from not doing the work. I don't think the use case you posted falls into that category.

On Sun, Apr 19, 2015 at 12:44 AM, Joachim Breitner <mail@joachim-breitner.de> wrote:
Dear devs,

in  https://ghc.haskell.org/trac/ghc/ticket/10124 bgamari suggested that
code like

        f :: Int -> Bool
        f a = case a of
                1  -> True
                5  -> True
                8  -> True
                9  -> True
                11 -> True
                19 -> True
                _  -> False
        {-# NOINLINE f #-}

should not compile to a series of conditional jumps, but rather a
branchless code akin to

        let !p = a ==# 1 `orI#` a ==# 5 `orI#` a ==# 8 `orI#` a ==# 9
                         `orI#` a ==# 11 `orI#` a ==# 19
        case p of
          1 -> do_something
          0 -> do_something_else

Subsequently, I refactored the way we produce Cmm code from STG, opening
the possibility to implement this optimization at that stage¹.

But when I then added this implementation, I could not measure any
runtime improvement, see
https://ghc.haskell.org/trac/ghc/ticket/10124#comment:15

So my question to the general audience: Is such branchless code really
better than the current, branching code? Can someone provide us with an
example that shows that it is better? Do I need to produce different
branchless assembly?

If it is not better, I can again refactor the switch generation and
simplify it a bit again.

Hmm, only now I see that rwbarton has replied there. Not sure why I
missed that. Anyways, more voices are welcome :-)

Greetings,
Joachim

¹ This should not preclude an implementation on the Core level, which
SPJ preferred. But I improved other aspects of the code generation as
well, so this is worthwhile on its own.

--
Joachim “nomeata” Breitner
  mail@joachim-breitner.dehttp://www.joachim-breitner.de/
  Jabber: nomeata@joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata@debian.org

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs




--
Gregory Collins <greg@gregorycollins.net>