On Sun, Feb 26, 2012 at 1:36 PM, Yves Parès <yves.pares@gmail.com> wrote:
Hello,
When I was using C code from Python, the overhead Python added to each C call was significant.
To simplify, say I have two procedures f and g on the C side, and I call them one after the other from Python. I'm better off factoring: adding a wrapper procedure h on the C side that calls them both, and then calling h from Python, because then I will have half as much overhead:

Instead of SwitchToC -> call f -> SwitchToPython -> SwitchToC -> call g -> SwitchToPython,
the factorization leads to SwitchToC -> call f -> call g -> SwitchToPython,
which gives the same result yet performs differently, because each switch has a cost.

This is painful, because if another time I have to call f and j (another function), then I have to make another wrapper.

Now, in the Haskell world, suppose my functions f and g have been imported using unsafe:

foreign import ccall unsafe "f" f :: Thing -> Foo -> IO ()
foreign import ccall unsafe "g" g :: Stuff -> Bar -> IO ()
foreign import ccall unsafe "h" h :: Thing -> Foo -> Stuff -> Bar -> IO ()

Are
doStuff = f x y >> g z w
and
doStuff = h x y z w
equivalent, or is there an overhead (e.g. due to IO monad, or due to the way the FFI does the calls) when compiled (if need be with optimizations) with GHC?

Each unsafe FFI call should be pretty fast: I have measured it at about 15ns on my computer (~30-50 clock cycles). Assuming the C implementations of (f;g) and h take about the same time in C, the first version of doStuff would likely be a bit slower than the second because of the one additional FFI call; I would expect it to take ~15ns more on my computer. From what I have seen in my code, GHC optimizes away the IO monad when compiling with the -O2 flag, so the impact of the IO monad on performance should usually be negligible.
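
For concreteness, the C-side wrapper h from the question could be as small as the sketch below. The typedefs and the call counters are placeholders of mine (the real Thing, Foo, Stuff and Bar types are not shown in this thread), so treat it as an illustration rather than the actual code:

```c
/* Hypothetical stand-ins for the argument types in the question;
   the real definitions are not shown in the thread. */
typedef int Thing;
typedef int Foo;
typedef int Stuff;
typedef int Bar;

/* Call counters, present only so the effect of each call is observable. */
static int f_calls = 0;
static int g_calls = 0;

/* Placeholder bodies: the real f and g would do actual work. */
void f(Thing x, Foo y)  { (void)x; (void)y; f_calls++; }
void g(Stuff z, Bar w)  { (void)z; (void)w; g_calls++; }

/* Wrapper: a single foreign call from Haskell runs both f and g,
   so the per-call FFI cost is paid once instead of twice. */
void h(Thing x, Foo y, Stuff z, Bar w)
{
    f(x, y);
    g(z, w);
}
```

The saving is roughly one unsafe call per doStuff, so it only matters when f and g themselves are very cheap.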

Keep in mind that an unsafe FFI call blocks the GHC capability it runs on until the call returns, so other Haskell threads scheduled on that capability cannot run in the meantime (more here: http://blog.melding-monads.com/category/haskell/). So, if the call takes a long time to execute, you might want to use the safe version instead. I have seen the safe version take about 150ns in my tests.

As Jason said, you could write a micro-benchmark to see the difference. For example:

Haskell code:

{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Monad (forM_)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import Foreign.C.Types (CInt)

-- Both imports bind the same no-op C function; only the safety level differs.
foreign import ccall safe "print"
  printsafe :: CInt -> IO ()
foreign import ccall unsafe "print"
  printunsafe :: CInt -> IO ()

main :: IO ()
main = do
  let l = 50000 :: CInt  -- number of calls per variant
  a <- getCurrentTime
  forM_ [1..l] $ \x -> printsafe x
  b <- getCurrentTime
  forM_ [1..l] $ \x -> printunsafe x
  c <- getCurrentTime

  -- putStrLn rather than print, so the messages are not shown in quotes
  putStrLn $ "safe call average overhead " ++ show (diffUTCTime b a / fromIntegral l)
  putStrLn $ "unsafe call average overhead " ++ show (diffUTCTime c b / fromIntegral l)


C code:

/* No-op: the benchmark is meant to measure only the cost of the FFI call. */
void print(int x){
  (void)x;  /* argument intentionally unused */
}