
#9342: Branchless arithmetic operations
-------------------------------------+-------------------------------------
        Reporter:  hvr               |                Owner:
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:  8.0.1
       Component:  Compiler (CodeGen)|              Version:  7.8.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
-------------------------------------+-------------------------------------

Comment (by gregorycollins):

 I measured, yes, but not across processors: when I was working on this I
 optimized for 64-bit i7 (probably Sandy Bridge IIRC). The version of mask
 you linked to with the funky branchless code was definitely faster on
 that chip vs. the simpler alternative:

 {{{
 mask# :: Int# -> Int# -> Int#
 mask# !a# !b# = let !(I# z#) = fromEnum (a# ==# b#)
                     !q#      = negateInt# z#
                 in q#
 }}}

 (of course this is from when `==#` returned `Bool` rather than `Int#`).

 The difference was about 15-20% IIRC. Unfortunately I've lost the raw
 numbers, sorry, but as Sven points out they'd be useless anyway for
 determining how good the change is in aggregate. Quite willing to believe
 that code could be a pessimization on ia32.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9342#comment:10
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
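
For reference, a minimal sketch of how the "simpler alternative" from the comment could be written on GHC 7.8 or later, where `==#` returns `Int#` directly so the `fromEnum` detour is no longer needed. This is not the "funky" branchless version the comment refers to (that code is not reproduced in this message), and `maskInt` and `selectEq` are hypothetical helper names introduced only for illustration. Whether the generated machine code is actually branch-free depends on the GHC version and backend.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import Data.Bits (complement, (.&.), (.|.))
import GHC.Exts (Int (I#), Int#, negateInt#, (==#))

-- | All ones (-1) when the arguments are equal, 0 otherwise.
-- Assumes GHC >= 7.8, where (==#) yields 1# or 0# as an Int#.
mask# :: Int# -> Int# -> Int#
mask# a# b# = negateInt# (a# ==# b#)
{-# INLINE mask# #-}

-- | Boxed wrapper around 'mask#' (hypothetical helper).
maskInt :: Int -> Int -> Int
maskInt (I# a#) (I# b#) = I# (mask# a# b#)

-- | Return x when a == b and y otherwise, using the mask
-- instead of a conditional (hypothetical helper).
selectEq :: Int -> Int -> Int -> Int -> Int
selectEq a b x y = (x .&. m) .|. (y .&. complement m)
  where
    m = maskInt a b

main :: IO ()
main = do
  print (maskInt 3 3)        -- -1
  print (maskInt 3 4)        -- 0
  print (selectEq 1 1 10 20) -- 10
  print (selectEq 1 2 10 20) -- 20
```

The mask is all ones or all zeros, so `selectEq` picks one of its two values with bitwise operations only. As the comment notes, any speed difference is processor-specific and would have to be measured on the target chip.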