
the optimization manual ben mentions is
http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-...
it does a very good job laying out the scope of both *generality* and
*impact* for a huge range of systemsy tricks.
there are certain conditions when that transformation can be a win, but
they generally correspond with *inner loop* scenarios where bad branch
prediction heavily impacts performance. One class of algorithms that might
provide some grist for the optimization mill would be those in the hacker's
delight book http://www.hackersdelight.org/
I have written some code that has that bit fiddling vs bad branch
prediction encoding tradeoff before, i'll see if i can dig any of it up if
you'd like!
On Sun, Apr 19, 2015 at 10:41 AM, Ben Gamari
Joachim Breitner
writes: Dear devs,
in https://ghc.haskell.org/trac/ghc/ticket/10124 bgamari suggested that code like
f :: Int -> Bool f a = case a of 1 -> True 5 -> True 8 -> True 9 -> True 11 -> True 19 -> True _ -> False {-# NOINLINE f #-}
should not compile to a series of conditional jumps, but rather a branchless code akin to
let !p = a ==# 1 `orI#` a ==# 5 `orI#` a ==# 8 `orI#` a ==# 9 `orI#` a ==# 11 `orI#` a ==# 19 case p of 1 -> do_something 0 -> do_something_else
Hi Joachim,
Thanks for trying this and I'm sorry to hear that the optimization hasn't yielded the improvement I would have expected. I've been a bit silent on the issue as I've been working on finishing up my PhD. Hopefully in a couple of months after I'm settled in Germany I'll have time to delve into this more deeply.
Subsequently, I refactored the way we produce Cmm code from STG, opening the possibility to implement this optimization at that stage¹.
But when I then added this implementation, I could not measure any runtime improvement, see https://ghc.haskell.org/trac/ghc/ticket/10124#comment:15
So my question to the general audience: Is such branchless code really better than the current, branching code? Can someone provide us with an example that shows that it is better? Do I need to produce different branchless assembly?
Have you had a look at the Intel Optimization manual? I'll admit that I haven't looked myself but I'd imagine they'd probably have a few things to say about this.
Cheers,
- Ben
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs