FFI-free NaN checks? (isDoubleNan and friends)

Hi, Recently at a client I was profiling some code and isDoubleNaN lit up. We were checking a lot of doubles for NaN as that's what customer would send in. I went to investigate and I found that FFI is used to achieve this. I was always under the impression that FFI costs a little. I had at the time replaced the code with a hack with great results: ``` isNaN' :: Double -> Bool isNaN' d = d /= d ``` While this worked and provided good speedup in my case, this fails catastrophically if the program is compiled with -ffast-math. This is expected. I have since reverted it. Seeking an alternative solution I have thought about re-implementing the C code with a native Haskell version: after all it just checks a few bits. Apparently unsafeCoerce# and friends were a big no-no but I found https://phabricator.haskell.org/D3358 . I have implemented the code at the bottom of this post. Obviously it's missing endianness (compile-time switch). This seems to be faster for smaller `mkInput` list than Prelude.isNaN but slower slightly on the one below. The `/=` version is the fastest but very fragile. My question to you all is whether implementing a version of this function in Haskell makes sense and if not, why not? The stgDoubleToWord64 is implemented in CMM and I don't know anything about the costs of that. * Is there a cheaper alternative to FFI way? * If yes, does anyone know how to write it such that it compiles to same code but without the call overhead? I must have failed below as it's slower on some inputs. Basically if a faster way exists for isNaN, something I have to do a lot, I'd love to hear about it. I leave you with basic code I managed to come up with. 8.4.x only. ``` {-# LANGUAGE MagicHash #-} {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm #-} module Main (main) where import GHC.Float import GHC.Prim isNaN' :: Double -> Bool isNaN' d = d /= d isNaNBits :: Double -> Bool isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of 1# -> case bits `and#` mantissaMask of 0## -> False _ -> True _ -> False where bits :: Word# bits = stgDoubleToWord64 d expMask, mantissaMask :: Word# expMask = 0x7FF0000000000000## mantissaMask = 0x000FFFFFFFFFFFFF## main :: IO () main = sumFilter isNaN {-isNaN'-} {-isNaNBits-} (mkInput 100000000) `seq` pure () where nan :: Double nan = log (-1) mkInput :: Int -> [Double] mkInput n = take n $ cycle [1, nan] sumFilter :: (Double -> Bool) -> [Double] -> Double sumFilter p = Prelude.sum . Prelude.filter (not . p) ``` -- Mateusz K.

If the FFI version is done with "safe", consider using "unsafe" instead.
There are technical reasons why this is slightly incorrect, but unless
you're fiddling with the CPU's FP control flags they're mostly irrelevant
and you can treat isNaN as pure and non-side-effectful, significantly
reducing the overhead. You may also be able to use "ccall" to take
advantage of C compiler level optimizations, or simply to directly invoke a
CPU-based test with asm(); but you'll need to hide that in a C preprocessor
#define, so that it looks syntactically like a function call to the FFI.
(One of the technical reasons is that various OSes have been known to
introduce bugs in their FP register and state handling across system calls,
in which case the "safe" version may turn "complete FP chaos" into merely
"wrong answer". It's your call whether, or which side, of this bothers you.)
On Mon, Mar 5, 2018 at 4:53 PM, Mateusz Kowalczyk
Hi,
Recently at a client I was profiling some code and isDoubleNaN lit up. We were checking a lot of doubles for NaN as that's what customer would send in.
I went to investigate and I found that FFI is used to achieve this. I was always under the impression that FFI costs a little. I had at the time replaced the code with a hack with great results:
``` isNaN' :: Double -> Bool isNaN' d = d /= d ```
While this worked and provided good speedup in my case, this fails catastrophically if the program is compiled with -ffast-math. This is expected. I have since reverted it. Seeking an alternative solution I have thought about re-implementing the C code with a native Haskell version: after all it just checks a few bits. Apparently unsafeCoerce# and friends were a big no-no but I found https://phabricator.haskell.org/D3358 . I have implemented the code at the bottom of this post. Obviously it's missing endianness (compile-time switch).
This seems to be faster for smaller `mkInput` list than Prelude.isNaN but slower slightly on the one below. The `/=` version is the fastest but very fragile.
My question to you all is whether implementing a version of this function in Haskell makes sense and if not, why not? The stgDoubleToWord64 is implemented in CMM and I don't know anything about the costs of that.
* Is there a cheaper alternative to FFI way? * If yes, does anyone know how to write it such that it compiles to same code but without the call overhead? I must have failed below as it's slower on some inputs.
Basically if a faster way exists for isNaN, something I have to do a lot, I'd love to hear about it.
I leave you with basic code I managed to come up with. 8.4.x only.
``` {-# LANGUAGE MagicHash #-} {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm #-} module Main (main) where
import GHC.Float import GHC.Prim
isNaN' :: Double -> Bool isNaN' d = d /= d
isNaNBits :: Double -> Bool isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of 1# -> case bits `and#` mantissaMask of 0## -> False _ -> True _ -> False where bits :: Word# bits = stgDoubleToWord64 d
expMask, mantissaMask :: Word# expMask = 0x7FF0000000000000## mantissaMask = 0x000FFFFFFFFFFFFF##
main :: IO () main = sumFilter isNaN {-isNaN'-} {-isNaNBits-} (mkInput 100000000) `seq` pure () where nan :: Double nan = log (-1)
mkInput :: Int -> [Double] mkInput n = take n $ cycle [1, nan]
sumFilter :: (Double -> Bool) -> [Double] -> Double sumFilter p = Prelude.sum . Prelude.filter (not . p) ```
-- Mateusz K. _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

On 03/05/2018 10:23 PM, Brandon Allbery wrote:
If the FFI version is done with "safe", consider using "unsafe" instead. There are technical reasons why this is slightly incorrect, but unless you're fiddling with the CPU's FP control flags they're mostly irrelevant and you can treat isNaN as pure and non-side-effectful, significantly reducing the overhead. You may also be able to use "ccall" to take advantage of C compiler level optimizations, or simply to directly invoke a CPU-based test with asm(); but you'll need to hide that in a C preprocessor #define, so that it looks syntactically like a function call to the FFI.
(One of the technical reasons is that various OSes have been known to introduce bugs in their FP register and state handling across system calls, in which case the "safe" version may turn "complete FP chaos" into merely "wrong answer". It's your call whether, or which side, of this bothers you.)
Perhaps I was a little unclear. The FFI-using isDoubleNaN is something GHC does! ``` libraries/base/GHC/Float.hs:foreign import ccall unsafe "isDoubleNaN" isDoubleNaN :: Double -> Int ``` ``` HsInt isDoubleNaN(HsDouble d) { union stg_ieee754_dbl u; u.d = d; return ( u.ieee.exponent == 2047 /* 2^11 - 1 */ && /* Is the exponent all ones? */ (u.ieee.mantissa0 != 0 || u.ieee.mantissa1 != 0) /* and the mantissa non-zero? */ ); } ``` My question is whether it could do better by not doing FFI and instead computing natively and if not, why not?
On Mon, Mar 5, 2018 at 4:53 PM, Mateusz Kowalczyk
wrote: Hi,
Recently at a client I was profiling some code and isDoubleNaN lit up. We were checking a lot of doubles for NaN as that's what customer would send in.
I went to investigate and I found that FFI is used to achieve this. I was always under the impression that FFI costs a little. I had at the time replaced the code with a hack with great results:
``` isNaN' :: Double -> Bool isNaN' d = d /= d ```
While this worked and provided good speedup in my case, this fails catastrophically if the program is compiled with -ffast-math. This is expected. I have since reverted it. Seeking an alternative solution I have thought about re-implementing the C code with a native Haskell version: after all it just checks a few bits. Apparently unsafeCoerce# and friends were a big no-no but I found https://phabricator.haskell.org/D3358 . I have implemented the code at the bottom of this post. Obviously it's missing endianness (compile-time switch).
This seems to be faster for smaller `mkInput` list than Prelude.isNaN but slower slightly on the one below. The `/=` version is the fastest but very fragile.
My question to you all is whether implementing a version of this function in Haskell makes sense and if not, why not? The stgDoubleToWord64 is implemented in CMM and I don't know anything about the costs of that.
* Is there a cheaper alternative to FFI way? * If yes, does anyone know how to write it such that it compiles to same code but without the call overhead? I must have failed below as it's slower on some inputs.
Basically if a faster way exists for isNaN, something I have to do a lot, I'd love to hear about it.
I leave you with basic code I managed to come up with. 8.4.x only.
``` {-# LANGUAGE MagicHash #-} {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm #-} module Main (main) where
import GHC.Float import GHC.Prim
isNaN' :: Double -> Bool isNaN' d = d /= d
isNaNBits :: Double -> Bool isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of 1# -> case bits `and#` mantissaMask of 0## -> False _ -> True _ -> False where bits :: Word# bits = stgDoubleToWord64 d
expMask, mantissaMask :: Word# expMask = 0x7FF0000000000000## mantissaMask = 0x000FFFFFFFFFFFFF##
main :: IO () main = sumFilter isNaN {-isNaN'-} {-isNaNBits-} (mkInput 100000000) `seq` pure () where nan :: Double nan = log (-1)
mkInput :: Int -> [Double] mkInput n = take n $ cycle [1, nan]
sumFilter :: (Double -> Bool) -> [Double] -> Double sumFilter p = Prelude.sum . Prelude.filter (not . p) ```
-- Mateusz K. _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-- Mateusz K.

I'd in general expect good C code to optimize a little better; and in
particular, decomposing an IEEE float is almost certainly more expensive in
Haskell than in C, because unions let you cheat. (And I recall looking at
the implementation of decodeFloat once; it's significantly longer than that
C.) But I have to wonder if that code would be better done with something
more native; the implementation may be a portable default, and you might be
able to find something x86-specific that is faster.
On Tue, Mar 6, 2018 at 5:35 AM, Mateusz Kowalczyk
If the FFI version is done with "safe", consider using "unsafe" instead. There are technical reasons why this is slightly incorrect, but unless you're fiddling with the CPU's FP control flags they're mostly irrelevant and you can treat isNaN as pure and non-side-effectful, significantly reducing the overhead. You may also be able to use "ccall" to take advantage of C compiler level optimizations, or simply to directly invoke a CPU-based test with asm(); but you'll need to hide that in a C
On 03/05/2018 10:23 PM, Brandon Allbery wrote: preprocessor
#define, so that it looks syntactically like a function call to the FFI.
(One of the technical reasons is that various OSes have been known to introduce bugs in their FP register and state handling across system calls, in which case the "safe" version may turn "complete FP chaos" into merely "wrong answer". It's your call whether, or which side, of this bothers you.)
Perhaps I was a little unclear. The FFI-using isDoubleNaN is something GHC does!
``` libraries/base/GHC/Float.hs:foreign import ccall unsafe "isDoubleNaN" isDoubleNaN :: Double -> Int ``` ``` HsInt isDoubleNaN(HsDouble d) { union stg_ieee754_dbl u;
u.d = d;
return ( u.ieee.exponent == 2047 /* 2^11 - 1 */ && /* Is the exponent all ones? */ (u.ieee.mantissa0 != 0 || u.ieee.mantissa1 != 0) /* and the mantissa non-zero? */ ); } ```
My question is whether it could do better by not doing FFI and instead computing natively and if not, why not?
On Mon, Mar 5, 2018 at 4:53 PM, Mateusz Kowalczyk < fuuzetsu@fuuzetsu.co.uk> wrote:
Hi,
Recently at a client I was profiling some code and isDoubleNaN lit up. We were checking a lot of doubles for NaN as that's what customer would send in.
I went to investigate and I found that FFI is used to achieve this. I was always under the impression that FFI costs a little. I had at the time replaced the code with a hack with great results:
``` isNaN' :: Double -> Bool isNaN' d = d /= d ```
While this worked and provided good speedup in my case, this fails catastrophically if the program is compiled with -ffast-math. This is expected. I have since reverted it. Seeking an alternative solution I have thought about re-implementing the C code with a native Haskell version: after all it just checks a few bits. Apparently unsafeCoerce# and friends were a big no-no but I found https://phabricator.haskell.org/D3358 . I have implemented the code at the bottom of this post. Obviously it's missing endianness (compile-time switch).
This seems to be faster for smaller `mkInput` list than Prelude.isNaN but slower slightly on the one below. The `/=` version is the fastest but very fragile.
My question to you all is whether implementing a version of this function in Haskell makes sense and if not, why not? The stgDoubleToWord64 is implemented in CMM and I don't know anything about the costs of that.
* Is there a cheaper alternative to FFI way? * If yes, does anyone know how to write it such that it compiles to same code but without the call overhead? I must have failed below as it's slower on some inputs.
Basically if a faster way exists for isNaN, something I have to do a lot, I'd love to hear about it.
I leave you with basic code I managed to come up with. 8.4.x only.
``` {-# LANGUAGE MagicHash #-} {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm #-} module Main (main) where
import GHC.Float import GHC.Prim
isNaN' :: Double -> Bool isNaN' d = d /= d
isNaNBits :: Double -> Bool isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of 1# -> case bits `and#` mantissaMask of 0## -> False _ -> True _ -> False where bits :: Word# bits = stgDoubleToWord64 d
expMask, mantissaMask :: Word# expMask = 0x7FF0000000000000## mantissaMask = 0x000FFFFFFFFFFFFF##
main :: IO () main = sumFilter isNaN {-isNaN'-} {-isNaNBits-} (mkInput 100000000) `seq` pure () where nan :: Double nan = log (-1)
mkInput :: Int -> [Double] mkInput n = take n $ cycle [1, nan]
sumFilter :: (Double -> Bool) -> [Double] -> Double sumFilter p = Prelude.sum . Prelude.filter (not . p) ```
-- Mateusz K. _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-- Mateusz K.
-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

On 03/06/2018 10:43 AM, Brandon Allbery wrote:
I'd in general expect good C code to optimize a little better; and in particular, decomposing an IEEE float is almost certainly more expensive in Haskell than in C, because unions let you cheat. (And I recall looking at the implementation of decodeFloat once; it's significantly longer than that C.) But I have to wonder if that code would be better done with something more native; the implementation may be a portable default, and you might be able to find something x86-specific that is faster.
There's a https://c9x.me/x86/html/file_module_x86_id_316.html that the ‘d /= d’ way compiles to. I suppose maybe I could just keep using that and fall back onto isDoubleNaN if __FAST_MATH__ is set…
On Tue, Mar 6, 2018 at 5:35 AM, Mateusz Kowalczyk
wrote: If the FFI version is done with "safe", consider using "unsafe" instead. There are technical reasons why this is slightly incorrect, but unless you're fiddling with the CPU's FP control flags they're mostly irrelevant and you can treat isNaN as pure and non-side-effectful, significantly reducing the overhead. You may also be able to use "ccall" to take advantage of C compiler level optimizations, or simply to directly invoke a CPU-based test with asm(); but you'll need to hide that in a C
On 03/05/2018 10:23 PM, Brandon Allbery wrote: preprocessor
#define, so that it looks syntactically like a function call to the FFI.
(One of the technical reasons is that various OSes have been known to introduce bugs in their FP register and state handling across system calls, in which case the "safe" version may turn "complete FP chaos" into merely "wrong answer". It's your call whether, or which side, of this bothers you.)
Perhaps I was a little unclear. The FFI-using isDoubleNaN is something GHC does!
``` libraries/base/GHC/Float.hs:foreign import ccall unsafe "isDoubleNaN" isDoubleNaN :: Double -> Int ``` ``` HsInt isDoubleNaN(HsDouble d) { union stg_ieee754_dbl u;
u.d = d;
return ( u.ieee.exponent == 2047 /* 2^11 - 1 */ && /* Is the exponent all ones? */ (u.ieee.mantissa0 != 0 || u.ieee.mantissa1 != 0) /* and the mantissa non-zero? */ ); } ```
My question is whether it could do better by not doing FFI and instead computing natively and if not, why not?
On Mon, Mar 5, 2018 at 4:53 PM, Mateusz Kowalczyk < fuuzetsu@fuuzetsu.co.uk> wrote:
Hi,
Recently at a client I was profiling some code and isDoubleNaN lit up. We were checking a lot of doubles for NaN as that's what customer would send in.
I went to investigate and I found that FFI is used to achieve this. I was always under the impression that FFI costs a little. I had at the time replaced the code with a hack with great results:
``` isNaN' :: Double -> Bool isNaN' d = d /= d ```
While this worked and provided good speedup in my case, this fails catastrophically if the program is compiled with -ffast-math. This is expected. I have since reverted it. Seeking an alternative solution I have thought about re-implementing the C code with a native Haskell version: after all it just checks a few bits. Apparently unsafeCoerce# and friends were a big no-no but I found https://phabricator.haskell.org/D3358 . I have implemented the code at the bottom of this post. Obviously it's missing endianness (compile-time switch).
This seems to be faster for smaller `mkInput` list than Prelude.isNaN but slower slightly on the one below. The `/=` version is the fastest but very fragile.
My question to you all is whether implementing a version of this function in Haskell makes sense and if not, why not? The stgDoubleToWord64 is implemented in CMM and I don't know anything about the costs of that.
* Is there a cheaper alternative to FFI way? * If yes, does anyone know how to write it such that it compiles to same code but without the call overhead? I must have failed below as it's slower on some inputs.
Basically if a faster way exists for isNaN, something I have to do a lot, I'd love to hear about it.
I leave you with basic code I managed to come up with. 8.4.x only.
``` {-# LANGUAGE MagicHash #-} {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm #-} module Main (main) where
import GHC.Float import GHC.Prim
isNaN' :: Double -> Bool isNaN' d = d /= d
isNaNBits :: Double -> Bool isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of 1# -> case bits `and#` mantissaMask of 0## -> False _ -> True _ -> False where bits :: Word# bits = stgDoubleToWord64 d
expMask, mantissaMask :: Word# expMask = 0x7FF0000000000000## mantissaMask = 0x000FFFFFFFFFFFFF##
main :: IO () main = sumFilter isNaN {-isNaN'-} {-isNaNBits-} (mkInput 100000000) `seq` pure () where nan :: Double nan = log (-1)
mkInput :: Int -> [Double] mkInput n = take n $ cycle [1, nan]
sumFilter :: (Double -> Bool) -> [Double] -> Double sumFilter p = Prelude.sum . Prelude.filter (not . p) ```
-- Mateusz K. _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-- Mateusz K.
-- Mateusz K.

Hi, You can try with foreign primops, it should be faster than the FFI: In IsDoubleNanPrim.s: .global isDoubleNan_prim isDoubleNan_prim: xor %rbx,%rbx ucomisd %xmm1, %xmm1 lahf testb $68, %ah jnp .Lout mov $1, %rbx .Lout: jmp * (%rbp) In IsDoubleNan.hs: {-# LANGUAGE GHCForeignImportPrim #-} {-# LANGUAGE MagicHash #-} {-# LANGUAGE UnliftedFFITypes #-} module Main where import GHC.Base foreign import prim "isDoubleNan_prim" isDoubleNan_prim :: Double# -> Int# isDoubleNan :: Double -> Bool isDoubleNan (D# d#) = case isDoubleNan_prim d# of 0# -> False _ -> True main :: IO () main = do let testNaN x = putStrLn $ "Testing " ++ show x ++ ": " ++ show (isDoubleNan x) testNaN 10.3 testNaN (0/0) Compile with: ghc -Wall -O IsDoubleNan.hs IsDoubleNanPrim.s I haven't benchmarked this but I would be interested to see the comparison with the other versions on your benchmarks! Cheers, Sylvain On 05/03/2018 22:53, Mateusz Kowalczyk wrote:
Hi,
Recently at a client I was profiling some code and isDoubleNaN lit up. We were checking a lot of doubles for NaN as that's what customer would send in.
I went to investigate and I found that FFI is used to achieve this. I was always under the impression that FFI costs a little. I had at the time replaced the code with a hack with great results:
``` isNaN' :: Double -> Bool isNaN' d = d /= d ```
While this worked and provided good speedup in my case, this fails catastrophically if the program is compiled with -ffast-math. This is expected. I have since reverted it. Seeking an alternative solution I have thought about re-implementing the C code with a native Haskell version: after all it just checks a few bits. Apparently unsafeCoerce# and friends were a big no-no but I found https://phabricator.haskell.org/D3358 . I have implemented the code at the bottom of this post. Obviously it's missing endianness (compile-time switch).
This seems to be faster for smaller `mkInput` list than Prelude.isNaN but slower slightly on the one below. The `/=` version is the fastest but very fragile.
My question to you all is whether implementing a version of this function in Haskell makes sense and if not, why not? The stgDoubleToWord64 is implemented in CMM and I don't know anything about the costs of that.
* Is there a cheaper alternative to FFI way? * If yes, does anyone know how to write it such that it compiles to same code but without the call overhead? I must have failed below as it's slower on some inputs.
Basically if a faster way exists for isNaN, something I have to do a lot, I'd love to hear about it.
I leave you with basic code I managed to come up with. 8.4.x only.
``` {-# LANGUAGE MagicHash #-} {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-stg -ddump-to-file -ddump-asm #-} module Main (main) where
import GHC.Float import GHC.Prim
isNaN' :: Double -> Bool isNaN' d = d /= d
isNaNBits :: Double -> Bool isNaNBits (D# d) = case (bits `and#` expMask) `eqWord#` expMask of 1# -> case bits `and#` mantissaMask of 0## -> False _ -> True _ -> False where bits :: Word# bits = stgDoubleToWord64 d
expMask, mantissaMask :: Word# expMask = 0x7FF0000000000000## mantissaMask = 0x000FFFFFFFFFFFFF##
main :: IO () main = sumFilter isNaN {-isNaN'-} {-isNaNBits-} (mkInput 100000000) `seq` pure () where nan :: Double nan = log (-1)
mkInput :: Int -> [Double] mkInput n = take n $ cycle [1, nan]
sumFilter :: (Double -> Bool) -> [Double] -> Double sumFilter p = Prelude.sum . Prelude.filter (not . p) ```
participants (3)
-
Brandon Allbery
-
Mateusz Kowalczyk
-
Sylvain Henry