
Ian Lynagh wrote:
> Hi all,
>
> I was under the impression that simple code like the below, which swaps the endianness of a block of data, ought to be near C speed:
>
> [...] poke p (shiftL x 24 .|. shiftL (x .&. 0xff00) 8 .|. (shiftR x 8 .&. 0xff00) .|. shiftR x 24) [...]
The problem here is that the shiftL and shiftR operations don't get inlined properly. They get replaced by a call to shift, but that doesn't get inlined. The shift function also wastes some more time by checking the sign of the shift amount. A few well-placed INLINE pragmas in the libraries might help.
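To make the overhead concrete, here is a rough sketch of the kind of dispatch described above. It is not the library's actual source: `genericShift`, `slowShiftL` and `slowShiftR` are hypothetical names, and `unsafeShiftL`/`unsafeShiftR` from Data.Bits (available in newer versions of base) merely stand in for the raw shift instructions.

```haskell
import Data.Bits (unsafeShiftL, unsafeShiftR)
import Data.Word (Word32)

-- Hypothetical stand-in for the class method `shift`: a single entry
-- point that has to inspect the sign of the shift amount on every call.
genericShift :: Word32 -> Int -> Word32
genericShift x n
  | n >= 0    = x `unsafeShiftL` n
  | otherwise = x `unsafeShiftR` negate n

-- Hypothetical wrappers: both directions funnel through genericShift,
-- so each use pays for the sign test unless everything is inlined.
slowShiftL, slowShiftR :: Word32 -> Int -> Word32
slowShiftL x n = genericShift x n
slowShiftR x n = genericShift x (negate n)

main :: IO ()
main = print (slowShiftL 1 4, slowShiftR 256 4)
```

If the wrappers and the generic function are both inlined at a call site with a constant shift amount, the compiler can usually drop the sign test; without inlining, each shift costs a function call plus a branch.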
> Is there anything I can do to get better performance in this sort of code without resorting to calling out to C?
You could import some private GHC modules and use the primop directly:

    import GHC.Prim
    import GHC.Word

    main :: IO ()
    main = do p <- mallocArray 104857600
              foo p 104857600

    shiftL (W32# a) (I# b) = W32# (shiftL# a b)
    shiftR (W32# a) (I# b) = W32# (shiftRL# a b)

Using those instead of the standard ones speeds up the program a lot; be aware however that you shouldn't use negative shift amounts with those (undefined result, no checking).

Cheers,
Wolfgang
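For readers on a newer GHC, the same idea can be sketched without private modules, assuming a version of base that provides `unsafeShiftL`, `unsafeShiftR` and `byteSwap32`. The names `swap32` and `swapBlock` are made up for this illustration, and the small buffer in `main` only stands in for the large block in the original program. Like the primops, the unchecked shifts give an undefined result for out-of-range shift amounts, so this relies on the amounts being constants between 0 and 31.

```haskell
import Data.Bits (unsafeShiftL, unsafeShiftR, (.&.), (.|.))
import Data.Word (Word32, byteSwap32)
import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- Swap the byte order of one 32-bit word using unchecked shifts.
-- Every shift amount is a constant in [0, 31], so no sign or range
-- check is needed.
swap32 :: Word32 -> Word32
swap32 x =  (x `unsafeShiftL` 24)
        .|. ((x .&. 0xff00) `unsafeShiftL` 8)
        .|. ((x `unsafeShiftR` 8) .&. 0xff00)
        .|. (x `unsafeShiftR` 24)

-- Swap every word of an n-element buffer in place.
swapBlock :: Ptr Word32 -> Int -> IO ()
swapBlock p n = go 0
  where
    go i
      | i >= n    = return ()
      | otherwise = do
          x <- peekElemOff p i
          pokeElemOff p i (swap32 x)
          go (i + 1)

main :: IO ()
main = allocaArray 4 $ \p -> do
  let xs = [0x11223344, 0xdeadbeef, 0, 0xff]
  pokeArray p xs
  swapBlock p 4
  ys <- peekArray 4 p
  -- Cross-check the hand-written swap against the library's byteSwap32.
  print (ys == map byteSwap32 xs)
```

Where `byteSwap32` is available it is probably the simpler choice for this particular task; the hand-written version is kept only to mirror the expression from the original question.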