FFI: Overhead of foreign unsafe imports

Hello,

When I was calling C code from Python, the overhead that Python added to each call was significant. To simplify, say I have two procedures f and g on the C side, and I call them one after the other from Python. I am better off factoring them: adding a wrapper procedure h on the C side that calls both, and then calling h from Python, because that halves the overhead.

Instead of

    SwitchToC -> call f -> SwitchToPython -> SwitchToC -> call g -> SwitchToPython

the factored version gives

    SwitchToC -> call f -> call g -> SwitchToPython

which produces the same result but performs differently, because each switch has a cost. This is painful, because if I later have to call f and j (another function), I have to write another wrapper.

Now, in the Haskell world, suppose my functions have been imported using *unsafe*:

    foreign import ccall unsafe "f" f :: Thing -> Foo -> IO ()
    foreign import ccall unsafe "g" g :: Stuff -> Bar -> IO ()
    foreign import ccall unsafe "h" h :: Thing -> Foo -> Stuff -> Bar -> IO ()

Are

    doStuff = f x y >> g z w

and

    doStuff = h x y z w

equivalent, or is there an overhead (e.g. due to the IO monad, or due to the way the FFI makes the calls) when compiled with GHC (with optimizations if need be)?

On Sun, Feb 26, 2012 at 10:36 AM, Yves Parès wrote:
In Haskell world, now, given that my functions f and g would have been imported using unsafe:

    foreign import ccall unsafe "f" f :: Thing -> Foo -> IO ()
    foreign import ccall unsafe "g" g :: Stuff -> Bar -> IO ()
    foreign import ccall unsafe "h" h :: Thing -> Foo -> Stuff -> Bar -> IO ()

Are doStuff = f x y >> g z w and doStuff = h x y z w equivalent, or is there an overhead (e.g. due to IO monad, or due to the way the FFI does the calls) when compiled (if need be with optimizations) with GHC?

I would expect the second doStuff to be more efficient, but you shouldn't care what I expect!

To check what is going on, you can use a tool like ghc-core (available on Hackage) to see what code GHC is generating. To see the difference in performance, you can use a micro-benchmarking tool like criterion (also on Hackage) to quantify it.

Once you have a conclusion based on some good evidence, please report back here :)

I hope that helps,
Jason
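
The criterion suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: the thread's f, g and h are not available, so the libm functions sin and cos stand in for two cheap C procedures, and the names c_sin, c_cos, oneCall and twoCalls are made up for this example. It compares one FFI crossing against two in a row, so the difference between the two results includes the cost of cos itself as well as the extra crossing. A faithful test of f >> g versus h would also need a C wrapper h compiled and linked in.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Criterion.Main (bench, defaultMain, whnfIO)
import Foreign.C.Types (CDouble (..))

-- Hypothetical stand-ins for the thread's f and g: two cheap libm
-- functions, imported in IO so each call is a sequenced side effect.
foreign import ccall unsafe "math.h sin" c_sin :: CDouble -> IO CDouble
foreign import ccall unsafe "math.h cos" c_cos :: CDouble -> IO CDouble

-- One FFI crossing.
oneCall :: CDouble -> IO CDouble
oneCall x = c_sin x

-- Two FFI crossings in a row, the same shape as `f x y >> g z w`.
twoCalls :: CDouble -> IO CDouble
twoCalls x = c_sin x >> c_cos x

main :: IO ()
main = defaultMain
  [ bench "one unsafe call"  (whnfIO (oneCall 0.5))
  , bench "two unsafe calls" (whnfIO (twoCalls 0.5))
  ]
```

This assumes the criterion package from Hackage is installed; building with ghc -O2 keeps the measurement closer to what optimized code would do.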

On Sun, Feb 26, 2012 at 1:36 PM, Yves Parès wrote:
In Haskell world, now, given that my functions f and g would have been imported using *unsafe*:

    foreign import ccall unsafe "f" f :: Thing -> Foo -> IO ()
    foreign import ccall unsafe "g" g :: Stuff -> Bar -> IO ()
    foreign import ccall unsafe "h" h :: Thing -> Foo -> Stuff -> Bar -> IO ()

Are doStuff = f x y >> g z w and doStuff = h x y z w equivalent, or is there an overhead (e.g. due to IO monad, or due to the way the FFI does the calls) when compiled (if need be with optimizations) with GHC?

Each unsafe FFI call should be pretty fast: I have measured it at about
15ns on my computer (~30-50 clock cycles). Assuming the C implementations
of (f; g) and h take about the same time in C, the first version of doStuff
would likely be a bit slower than the second because of the one additional
FFI call; I would expect it to take ~15ns more on my computer. From what I
have seen in my code, GHC optimizes away the IO monad when compiling with
the -O2 flag, so the impact of the IO monad on performance should usually
be negligible or close to zero.

Keep in mind that an unsafe FFI call blocks the capability that the
calling Haskell thread is running on until the call returns (more here:
http://blog.melding-monads.com/category/haskell/). So, if the call takes a
long time to execute, you might want to use the safe version instead. I
have seen the safe version take about 150ns in my tests.

As Jason said, you could write a micro-benchmark to see the difference,
for example:

Haskell code:

    import Control.Monad (forM_)
    import Foreign.C.Types (CInt)
    import Data.Time.Clock (diffUTCTime, getCurrentTime)
    import Foreign.C

    -- The same C function imported twice, once safe and once unsafe.
    foreign import ccall safe "print"
        printsafe :: CInt -> IO ()
    foreign import ccall unsafe "print"
        printunsafe :: CInt -> IO ()

    main = do
        let l = 50000
        a <- getCurrentTime
        -- l calls through the safe import
        forM_ [1..l] $ \x -> printsafe x
        b <- getCurrentTime
        -- l calls through the unsafe import
        forM_ [1..l] $ \x -> printunsafe x
        c <- getCurrentTime
        -- average wall-clock time per call for each variant
        print $ "safe call average overhead " ++ show ((diffUTCTime b a)/fromIntegral l)
        print $ "unsafe call average overhead " ++ show ((diffUTCTime c b)/fromIntegral l)

C code:
#include

participants (3)
- Jason Dagit
- Sanket Agrawal
- Yves Parès