
I think GHC is doing the right thing here, or at least not the wrong
#9661: Branchless ==# is compiled to branchy code -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: feature | Status: new request | Milestone: Priority: normal | Version: 7.9 Component: Compiler | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:7 rwbarton]: thing.
It's wrong to think of `(==#)` as "branchless equality". It's just a
function whose value is `1#` if its arguments are equal, and `0#` if they aren't. This fact needs to be visible to GHC in order to optimize code like ...
and this is the purpose of the `litEq` rule.
I wonder if there's another way to accomplish that, but see below.
As a result, GHC may not output assembly containing the number of branches that you intended. This is the normal function of an optimizing compiler; if GHC was not free to replace your program with an equivalent one, then users would be forced to perform all kinds of optimizations by hand to get efficient code.
I understand that depending on the characteristics of the input, the "branchless" code may be faster (e.g. for the C function above, `x` randomly drawn from the set 0 (49%), 5 (49%), 10 (2%)). However, with certain other distributions of the input value, the "branchy" version is
In any case, I would like to see some benchmarks showing that there are
With the exception of some bottom-of-the-stack `base` code with module dependency nightmares, people who use such primops are usually trying to do something specific for performance reasons, and we should be careful not to interfere excessively if we can avoid doing so. Your preferred `==#` can be implemented in terms of mine, I believe, like this: {{{#!hs bartonEq x y = case feuerEq x y of 1# -> 1# _ -> 0# }}} Going the other way around, however, is impossible. If you prefer, we can give `feuerEq` a different name so current code will continue to work the same. likely to be faster. So there is no right or wrong answer here. Of course. performance gains to be had in real programs from control over branchlessness before expending any effort on implementation. That ship has long since sailed—the massive breaking primBool change was undertaken specifically to support this sort of thing. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9661#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler