
#9342: Branchless arithmetic operations
-------------------------------------+-------------------------------------
        Reporter:  hvr               |            Owner:
            Type:  feature request   |           Status:  new
        Priority:  normal            |        Milestone:  8.0.1
       Component:  Compiler          |          Version:  7.8.3
  (CodeGen)                          |
      Resolution:                    |         Keywords:
Operating System:  Unknown/Multiple  |     Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |        Test Case:
      Blocked By:                    |         Blocking:
 Related Tickets:                    |  Differential Rev(s):
-------------------------------------+-------------------------------------

Comment (by svenpanne):

 Re #7: Do we have real-world benchmarks for Atom/i3/i7/Xeon in ia32/x64
 modes? ARM? ARM64? With and without branches? Without that, it's unclear
 whether that's "effective use". My point (again) is: Being branchless in
 itself is a non-goal.

 Picking e.g. just
 https://github.com/gregorycollins/hashtables/blob/9e477b825a98e4f574a0e889e5...
 : This is a perfect example of code which will probably make things
 *slower*, depending on the availability of spare registers. It uses a
 handful of intermediate values, while the branching code uses none, and a
 single spill caused by the higher register pressure will be much, much
 more costly than anything else; ia32 especially sucks here. Furthermore,
 if the surrounding code uses shifts and/or complicated addressing modes a
 lot, one will have to wait for the shifting unit to become available.

 None of this is theoretical: we had to revert to the straightforward code
 with branches in Chrome's V8 JavaScript JIT on some platforms/CPUs in
 similar places. Perhaps the code patterns in GHC-generated code are
 different, but the only way to know is to benchmark on a wide variety of
 benchmarks and CPUs. Yes, that's a lot of work and needs some
 infrastructure, but without that, changes like this are just a shot in
 the dark.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9342#comment:8
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
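
To make the register-pressure argument in the comment above concrete, here
is a minimal C sketch of the two styles being compared. It is not the
hashtables code linked in the comment and not anything GHC or V8 emits; the
function names select_branching and select_branchless are made up for
illustration. The branchless variant needs three temporaries (diff, nonzero,
mask) plus a shift, while the branching variant needs none. Which one is
faster on a given CPU is exactly what would have to be measured, and a C
compiler may well turn the first form into a conditional move on its own.

```c
#include <stdint.h>

/* Branching form: one compare and one conditional jump (or whatever the
 * compiler chooses to emit); no intermediate values are kept live. */
uint32_t select_branching(uint32_t a, uint32_t b, uint32_t x, uint32_t y)
{
    return (a == b) ? x : y;
}

/* Branchless form (illustrative sketch): builds an all-ones or all-zeros
 * mask from the comparison and blends x and y with it. */
uint32_t select_branchless(uint32_t a, uint32_t b, uint32_t x, uint32_t y)
{
    uint32_t diff    = a ^ b;                        /* 0 iff a == b */
    uint32_t nonzero = (diff | (0u - diff)) >> 31;   /* 1 iff diff != 0 */
    uint32_t mask    = nonzero - 1u;                 /* ~0 iff a == b, else 0 */
    return (x & mask) | (y & ~mask);
}
```

Both functions return x when a == b and y otherwise; the difference lies
only in the number of live intermediate values and in the use of the shift
unit, which are the two costs the comment points at.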