
On Sun, 14 Jan 2024, amindfv--- via Haskell-Cafe wrote:
I have a function that's called many (many) times, which involves a saturating addition (addition without overflow).
In other words, I'd like ((254 :: Word8) + 10) to equal 255 (which is maxBound::Word8), not 8 (which is what you get with overflow).
My current code, without particular regard to performance, is simply:
satAdd :: Word8 -> Word8 -> Word8 satAdd x y = let m = x + y in if m < x then 255 else m
This seems like the type of code somebody's already worked on the performance for, though I can't find much Haskell-specific in searches online.
LLVM is pretty good at optimizing such code snippets. You may first watch
what code it generates. I guess you do not have to apply manual
optimizations. But you might be curious how LLVM translates your code.
The CPU sets a flag when it encounters a carry (aka unsigned overflow).
Then it has an instruction for doing a conditional move. That is, even
without special saturated arithmetic instructions, unsigned saturated
addition can be done in two or three basic instructions.
LLVM also has a saturating add intrinsic:
https://llvm.org/docs/LangRef.html#saturation-arithmetic-intrinsics
x86 provides saturating addition in its SSE and AVX subsystems. They even
allow you to perform up to 64 byte additions in one instruction. I have
looked up that x86 has PADDUSB instruction.
Ok, lets watch LLVM:
$ cat saturated-add.ll
target triple = "x86_64-unknown-linux-gnu"
define i8 @satAdd(i8 %x, i8 %y) {
%m = add i8 %x, %y
%carry = icmp ult i8 %m, %x
%satSum = select i1 %carry, i8 255, i8 %m
ret i8 %satSum
}
$ llc-17