Re: Possible runtime overhead of wrapping the IO monad?

30 Mar 2006

      Brian Hulley wrote:
...
With -O2 enabled, __ccall_GC duma_vertex3f is indeed called directly
instead of vertex3f, from a different module, so that proves that
different monads can indeed be used to wrap IO operations without any
performance penalty at all.
However I've just discovered there *is* a penalty for converting between 
callback functions that return a different monad from the IO monad. For 
example, if I have a RenderM monad that allows primitives to be drawn to the 
screen, and a callback:

       newtype RenderM a = RenderM (IO a) deriving (Functor, Monad, MonadIO)

       type RenderCallback = Int -> Int -> RenderM ()

where the intention is that the callback will take the width and height of 
the window and return a RenderM action, the problem is that because the FFI 
does not allow RenderM to appear in a foreign type, the actual render 
function has to be converted into a function which returns an IO action 
instead of a RenderM action eg by:

       type RenderCallbackIO = Int -> Int -> IO ()

       dropRenderM :: RenderCallback -> RenderCallbackIO
       dropRenderM f x y = let RenderM io = f x y in io

       foreign import ccall duma_onRender :: FunPtr RenderCallbackIO -> IO 
()

       foreign import ccall "wrapper" mkRenderCallbackIO
            :: RenderCallbackIO -> IO (FunPtr RenderCallbackIO)

       onRender :: RenderCallback -> IO ()
       onRender f = mkRenderCallbackIO (dropRenderM f) >>= duma_onRender

With -O2 optimization, GHC does not seem to be able to optimize out the call 
to dropRenderM even though this just changes the return value of f from 
RenderM (IO a) to IO a, so RenderM is not transparent after all:

Duma.onRender = \ (f :: Duma.RenderCallback)
    (eta :: GHC.Prim.State# GHC.Prim.RealWorld) ->
    case (# GHC.Prim.State# GHC.Prim.RealWorld, () #) 
Duma.mkRenderCallbackIO
              (Duma.dropRenderM f) eta
    of wild { (# new_s, a86 #) ->
    case (# GHC.Prim.State# GHC.Prim.RealWorld, () #) a86
    of ds { GHC.Ptr.FunPtr ds1 ->
    case (# GHC.Prim.State# GHC.Prim.RealWorld,
     () #) {__ccall_GC duma_onRender GHC.Prim.Addr#
     -> GHC.Prim.State# GHC.Prim.RealWorld
     -> (# GHC.Prim.State# GHC.Prim.RealWorld #)}
      ds1 new_s
    of wild1 { (# ds2 #) ->
    (# ds2, GHC.Base.() #)
    }
    }
    }

I must admit I'm not at all clear how to read the -ddump-simpl output so I 
may have got this wrong, but since Duma.dropRenderM is mentioned, I think 
this means this has not been optimized out.

Therefore there does seem to be an overhead for using different monads at 
the moment (?)

Regards, Brian.