Foreign function performance: monadic vs pure

Consider two versions of sin wrapped:

    foreign import ccall "math.h sin" c_sin_m :: CDouble -> IO CDouble

and

    foreign import ccall "math.h sin" c_sin :: CDouble -> CDouble

One can invoke them so:

    mapM c_sin_m [1..n]
    mapM (return . c_sin) [1..n]

On my computer with n = 10^7 the first version never finishes, whereas the second one calculates the result within seconds.

To give you my context, I need to call a random variable generator multiple times, so that it must return IO a.

Any explanation for this behavior?

On Mon, 2011-04-11 at 12:09 +0000, Serguei Son wrote:
> Consider two versions of sin wrapped:
>     foreign import ccall "math.h sin" c_sin_m :: CDouble -> IO CDouble
> and
>     foreign import ccall "math.h sin" c_sin :: CDouble -> CDouble
> One can invoke them so:
>     mapM c_sin_m [1..n]
>     mapM (return . c_sin) [1..n]
> On my computer with n = 10^7 the first version never finishes, whereas the second one calculates the result within seconds.
> To give you my context, I need to call a random variable generator multiple times, so that it must return IO a.
> Any explanation for this behavior?

Simple (but possibly wrong): the first one is always evaluated (as it might have side effects), while the second one is left in unevaluated form (return does not force its argument). Timings for n = 2^14:

    mapM c_sin_m [1..n]                             1.087 s
    mapM (return . c_sin) [1..n]                    0.021 s
    mapM (\x -> return $! c_sin x) [1..n]           1.160 s
    return $ map c_sin [1..n]                       0.006 s
    mapM (const (return undefined)) [1..n]          0.011 s

I.e.:
 - c_sin_m has forced evaluation, so 10^7 times you save the Haskell context (the import is not marked as unsafe) and call the function;
 - return . c_sin does not force evaluation, so 10^7 times you only wrap an unevaluated value into IO.

To compare:

    foreign import ccall unsafe "math.h sin" c_sin_um :: CDouble -> IO CDouble
    foreign import ccall unsafe "math.h sin" c_sin_u :: CDouble -> CDouble

    main = mapM c_sin_um [1..n]                     0.028 s
    main = mapM (\x -> return $! c_sin_u) [1..n]    0.012 s
    main = mapM (return . c_sin_u) [1..n]           0.023 s

I.e. the difference comes from Haskell's laziness and from the cost of making sure that the function may safely call back into Haskell (which sin does not do).

Regards
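
For anyone who wants to reproduce the comparison, here is a minimal sketch of a complete program, assuming GHC and a libm that provides sin. The timed helper and the labels are made up for illustration, and it measures CPU time only, so treat the numbers as rough. The results are deliberately not forced, so the lazy variants mostly measure the cost of wrapping thunks.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CDouble (..))
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Safe imports (what you get when no safety level is written).
foreign import ccall safe   "math.h sin" c_sin_m  :: CDouble -> IO CDouble
foreign import ccall safe   "math.h sin" c_sin    :: CDouble -> CDouble
-- Unsafe imports: cheaper calls, but the C code must not call back into Haskell.
foreign import ccall unsafe "math.h sin" c_sin_um :: CDouble -> IO CDouble
foreign import ccall unsafe "math.h sin" c_sin_u  :: CDouble -> CDouble

-- Hypothetical helper: run an action and print the CPU time it took.
-- The action's result is discarded without being evaluated.
timed :: String -> IO a -> IO ()
timed label act = do
  start <- getCPUTime
  _     <- act
  end   <- getCPUTime
  let secs = fromIntegral (end - start) / 1e12 :: Double
  printf "%-42s %8.3f s\n" label secs

main :: IO ()
main = do
  let n  = 2 ^ (14 :: Int) :: Int
      xs = map fromIntegral [1 .. n] :: [CDouble]
  timed "mapM c_sin_m"                      (mapM c_sin_m xs)
  timed "mapM (return . c_sin)"             (mapM (return . c_sin) xs)
  timed "mapM (\\x -> return $! c_sin x)"   (mapM (\x -> return $! c_sin x) xs)
  timed "mapM c_sin_um"                     (mapM c_sin_um xs)
  timed "mapM (\\x -> return $! c_sin_u x)" (mapM (\x -> return $! c_sin_u x) xs)
  timed "mapM (return . c_sin_u)"           (mapM (return . c_sin_u) xs)
```

Compile with optimisation (e.g. ghc -O2) before reading anything into the output; the absolute numbers will differ from the ones quoted above.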

Felipe Almeida Lessa
On Mon, Apr 11, 2011 at 10:14 AM, Maciej Marcin Piechotka wrote:
> main = mapM (\x -> return $! c_sin_u) [1..n]    0.012 s

This should be

    main = mapM (\x -> return $! c_sin_u x) [1..n]

Serguei Son
Felipe Almeida Lessa writes:
> On Mon, Apr 11, 2011 at 10:14 AM, Maciej Marcin Piechotka wrote:
>> main = mapM (\x -> return $! c_sin_u) [1..n]    0.012 s
> This should be
> main = mapM (\x -> return $! c_sin_u x) [1..n]

So if I must use a safe function returning IO a, there is no way to improve its performance? To give you a benchmark, calling gsl_ran_ugaussian a million times in pure C takes only a second or two on my system.

Serguei Son
Also, please note that I can force the evaluation of c_sin, e.g.

    mapM (return . c_sin) [1..n] >>= (print . foldl' (+) 0)

And it will still execute reasonably fast.
Please disregard my previous post. I actually meant

    let lst = map c_sin [1..n]
    print $ foldl' (+) 0 lst

This executes in 0.2 s for n = 10^7. c_sin is safe, as is c_sin_m. The only difference is CDouble -> CDouble vs CDouble -> IO CDouble.

On Mon, Apr 11, 2011 at 3:55 PM, Serguei Son wrote:
> So if I must use a safe function returning IO a, there is no way to improve its performance? To give you a benchmark, calling gsl_ran_ugaussian a million times in pure C takes only a second or two on my system.

In the C version, are you also producing a linked list containing all
of the values? Because that's what mapM does. Your test is mostly
measuring the cost of allocating and filling ~3 million machine words
on the heap. Try mapM_ instead.
G
--
Gregory Collins
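
To make the mapM_ suggestion concrete, here is a rough sketch, assuming the same safe c_sin_m import as in the original post. sumSinIO is a hypothetical name; it shows one way to still use the results (a strict running sum via foldM) without building the whole result list in memory.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Monad (foldM)
import Foreign.C.Types (CDouble (..))

foreign import ccall "math.h sin" c_sin_m :: CDouble -> IO CDouble

-- Hypothetical helper: sum the results of the IO-returning call.
-- The accumulator is forced at every step, so no list of results
-- and no chain of thunks is retained.
sumSinIO :: Int -> IO CDouble
sumSinIO n = foldM step 0 [1 .. n]
  where
    step acc i = do
      y <- c_sin_m (fromIntegral i)
      return $! acc + y

main :: IO ()
main = do
  let n = 1000000 :: Int
  -- Run the calls for their effects only; each result is dropped.
  mapM_ c_sin_m (map fromIntegral [1 .. n])
  -- Run the calls again and keep only the running total.
  total <- sumSinIO n
  print total
```

This does not make the individual safe calls any cheaper; it only removes the allocation of the result list from the measurement.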

On Mon, Apr 11, 2011 at 8:09 AM, Serguei Son wrote:
> Consider two versions of sin wrapped: foreign import ccall "math.h sin" c_sin_m :: CDouble -> IO CDouble

Marking this call as unsafe (i.e. foreign import ccall unsafe "math.h sin") can improve performance dramatically. If the FFI call is quick, then I believe this is the recommended approach. If you really need the imported function to be thread safe, then perhaps you should move more of the calculation into C, so that you make fewer, coarser-grained FFI calls.

It is remarkably easy to get the meanings of safe and unsafe confused, and I can't even see the word "unsafe" in the current FFI user's guide!

http://www.haskell.org/ghc/docs/7.0.3/html/users_guide/ffi-ghc.html

Anthony
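
As an illustration of the unsafe route for a generator that has to live in IO, here is a minimal sketch. It uses the C standard library's rand and srand purely as a stand-in for gsl_ran_ugaussian (an assumption made so the example links without GSL); meanOfDraws is a hypothetical name. An unsafe import is only appropriate when the C function returns quickly and never calls back into Haskell.

```haskell
{-# LANGUAGE BangPatterns, ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CInt (..), CUInt (..))

-- Stand-ins for a real generator: both have side effects, so both live in IO.
foreign import ccall unsafe "stdlib.h srand" c_srand :: CUInt -> IO ()
foreign import ccall unsafe "stdlib.h rand"  c_rand  :: IO CInt

-- Hypothetical helper: draw n values and return their mean.
-- The loop keeps a strict counter and accumulator, so nothing is
-- stored besides the running sum.
meanOfDraws :: Int -> IO Double
meanOfDraws n = go 0 0
  where
    go :: Int -> Double -> IO Double
    go !i !acc
      | i >= n    = return (acc / fromIntegral (max 1 n))
      | otherwise = do
          r <- c_rand
          go (i + 1) (acc + fromIntegral r)

main :: IO ()
main = do
  c_srand 42
  m <- meanOfDraws 1000000
  print m
```

If the generator really must be imported as safe, the same loop shape still applies; only the per-call overhead changes.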
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (5): Anthony Cowley, Felipe Almeida Lessa, Gregory Collins, Maciej Marcin Piechotka, Serguei Son