Re: [Haskell-cafe] Fastest saturating add?

14 Jan 2024

      On Sun, 14 Jan 2024, amindfv--- via Haskell-Cafe wrote:
...
I have a function that's called many (many) times, which involves a 
saturating addition (addition without overflow).
In other words, I'd like ((254 :: Word8) + 10) to equal 255 (which is 
maxBound::Word8), not 8 (which is what you get with overflow).
My current code, without particular regard to performance, is simply:
satAdd :: Word8 -> Word8 -> Word8
   satAdd x y =
      let m = x + y
      in if m < x then 255 else m
This seems like the type of code somebody's already worked on the 
performance for, though I can't find much Haskell-specific in searches 
online.
LLVM is pretty good at optimizing such code snippets. You may first watch 
what code it generates. I guess you do not have to apply manual 
optimizations. But you might be curious how LLVM translates your code.

The CPU sets a flag when it encounters a carry (aka unsigned overflow). 
Then it has an instruction for doing a conditional move. That is, even 
without special saturated arithmetic instructions, unsigned saturated 
addition can be done in two or three basic instructions.

LLVM also has a saturating add intrinsic:
    https://llvm.org/docs/LangRef.html#saturation-arithmetic-intrinsics

x86 provides saturating addition in its SSE and AVX subsystems. They even 
allow you to perform up to 64 byte additions in one instruction. I have 
looked up that x86 has PADDUSB instruction.

Ok, lets watch LLVM:

$ cat saturated-add.ll
target triple = "x86_64-unknown-linux-gnu"
define i8 @satAdd(i8 %x, i8 %y) {
   %m = add i8 %x, %y
   %carry = icmp ult i8 %m, %x
   %satSum = select i1 %carry, i8 255, i8 %m
   ret i8 %satSum
}

$ llc-17 

Re: [Haskell-cafe] Fastest saturating add?

Henning Thielemann