
#9661: Branchless ==# is compiled to branchy code -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: feature | Status: new request | Milestone: Priority: normal | Version: 7.9 Component: Compiler | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Unknown Unknown/Multiple | Blocked By: Type of failure: Runtime | Related Tickets: performance bug | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Changes (by rwbarton): * type: bug => feature request Comment: I think GHC is doing the right thing here, or at least not the wrong thing. It's wrong to think of `(==#)` as "branchless equality". It's just a function whose value is `1#` if its arguments are equal, and `0#` if they aren't. This fact needs to be visible to GHC in order to optimize code like {{{#!hs f :: Int -> (Int,Int) f x = if x == 0 -- defined in terms of ==# then (x', 1) -- here GHC can inline and then constant-fold x' to 257 else (x-1, x') where x' = 1000 * x + 257 }}} and this is the purpose of the `litEq` rule. As a result, GHC may not output assembly containing the number of branches that you intended. This is the normal function of an optimizing compiler; if GHC was not free to replace your program with an equivalent one, then users would be forced to perform all kinds of optimizations by hand to get efficient code. I would note that this behavior is not specific to GHC. If you compile this C program with gcc (Debian 4.9.1-14) at optimization level `-O1` or higher {{{#!c void g(); void h(); void f(int x) { if ((x == 0) OR (x == 5)) g(); else h(); } }}} you will get identical assembly output (with two branches, not an `or` instruction) whether you set `-DOR=|` or `-DOR=||`. So C does not give you control over branchlessness either. I understand that depending on the characteristics of the input, the "branchless" code may be faster (e.g. for the C function above, `x` randomly drawn from the set 0 (49%), 5 (49%), 10 (2%)). However, with certain other distributions of the input value, the "branchy" version is likely to be faster. So there is no right or wrong answer here. In general it takes a lot of engineering effort to get a compiler to reliably produce two different optimized outputs for two different programs that it can see are semantically identical. In this case maybe there is an easy solution: add a new `secretlyEquality#` which is the same as `(==#)`, but where that fact is hidden from GHC for long enough for `secretlyEquality#` to survive the stages that transform `(==#)` unscathed. This approach is intrinsically somewhat fragile (for example, the LLVM backend may introduce additional branches anyways), but it has the merit of ease of implementation. In any case, I would like to see some benchmarks showing that there are performance gains to be had in real programs from control over branchlessness before expending any effort on implementation. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9661#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler