
Brian Hulley wrote:
With -O2 enabled, __ccall_GC duma_vertex3f is indeed called directly instead of vertex3f, from a different module, so that proves that different monads can indeed be used to wrap IO operations without any performance penalty at all.
However I've just discovered there *is* a penalty for converting between callback functions that return a different monad from the IO monad. For example, if I have a RenderM monad that allows primitives to be drawn to the screen, and a callback: newtype RenderM a = RenderM (IO a) deriving (Functor, Monad, MonadIO) type RenderCallback = Int -> Int -> RenderM () where the intention is that the callback will take the width and height of the window and return a RenderM action, the problem is that because the FFI does not allow RenderM to appear in a foreign type, the actual render function has to be converted into a function which returns an IO action instead of a RenderM action eg by: type RenderCallbackIO = Int -> Int -> IO () dropRenderM :: RenderCallback -> RenderCallbackIO dropRenderM f x y = let RenderM io = f x y in io foreign import ccall duma_onRender :: FunPtr RenderCallbackIO -> IO () foreign import ccall "wrapper" mkRenderCallbackIO :: RenderCallbackIO -> IO (FunPtr RenderCallbackIO) onRender :: RenderCallback -> IO () onRender f = mkRenderCallbackIO (dropRenderM f) >>= duma_onRender With -O2 optimization, GHC does not seem to be able to optimize out the call to dropRenderM even though this just changes the return value of f from RenderM (IO a) to IO a, so RenderM is not transparent after all: Duma.onRender = \ (f :: Duma.RenderCallback) (eta :: GHC.Prim.State# GHC.Prim.RealWorld) -> case (# GHC.Prim.State# GHC.Prim.RealWorld, () #) Duma.mkRenderCallbackIO (Duma.dropRenderM f) eta of wild { (# new_s, a86 #) -> case (# GHC.Prim.State# GHC.Prim.RealWorld, () #) a86 of ds { GHC.Ptr.FunPtr ds1 -> case (# GHC.Prim.State# GHC.Prim.RealWorld, () #) {__ccall_GC duma_onRender GHC.Prim.Addr# -> GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld #)} ds1 new_s of wild1 { (# ds2 #) -> (# ds2, GHC.Base.() #) } } } I must admit I'm not at all clear how to read the -ddump-simpl output so I may have got this wrong, but since Duma.dropRenderM is mentioned, I think this means this has not been optimized out. Therefore there does seem to be an overhead for using different monads at the moment (?) Regards, Brian.