Possible runtime overhead of wrapping the IO monad?

Hi, I'm designing an API for a simple graphics window, and am trying to make the correct usage of the API functions explicit and visible to the type system by using different monads which ultimately just wrap the IO monad. For example, in a callback for rendering stuff to the screen, only operations in the RenderM monad are allowed, and when specifying vertex info for a primitive, only VertexM operations are allowed. However I'm wondering if I can rely on all this monad stuff being optimized out at compile time. A sample monad is below: newtype VertexM a = VertexM (IO a) instance Monad VertexM where VertexM x >>= fry = VertexM $ do ax <- x let VertexM y = fry ax y return x = VertexM $ return x instance MonadIO VertexM where liftIO = VertexM The monad doesn't do anything interesting apart from allowing the type checker to reject programs that don't use the API the way it was intended (all these things you have to keep in your head in C programs), but I don't want to use it if I'm going to get a performance hit. Also, in: foreign import ccall duma_vertex3f :: Float -> Float -> Float -> IO () vertex3f :: Float -> Float -> Float -> VertexM () vertex3f x y z = liftIO $ duma_vertex3f x y z is there a penalty involved in calling vertex3f (from another module) or will the worker/wrapper optimization ensure that machine code in the other module just calls duma_vertex3f directly since the liftIO operation is just an irrelevance at the machine code level? So far I've just been using ghc --make and not bothering about what kind of code is generated. Is there a flag I can use to get ghc to output the stg code (or something higher level than just x86 machine code itself) so I can look at the output to see what optimizations are being done? Thanks, Brian.

Brian Hulley wrote:
Hi, I'm designing an API for a simple graphics window, and am trying to make the correct usage of the API functions explicit and visible to the type system by using different monads which ultimately just wrap the IO monad. For example, in a callback for rendering stuff to the screen, only operations in the RenderM monad are allowed, and when specifying vertex info for a primitive, only VertexM operations are allowed.
However I'm wondering if I can rely on all this monad stuff being optimized out at compile time. A sample monad is below:
newtype VertexM a = VertexM (IO a)
instance Monad VertexM where VertexM x >>= fry = VertexM $ do ax <- x let VertexM y = fry ax y
return x = VertexM $ return x
instance MonadIO VertexM where liftIO = VertexM
There should be no overhead for a newtype. The above can be shortened to one line: newtype VertexM a = VertexM (IO a) deriving (Functor,Monad,MonadIO) (Needs ghc -fglasgow-exts, I expect)
Also, in:
foreign import ccall duma_vertex3f :: Float -> Float -> Float -> IO ()
vertex3f :: Float -> Float -> Float -> VertexM () vertex3f x y z = liftIO $ duma_vertex3f x y z
is there a penalty involved in calling vertex3f (from another module) or will the worker/wrapper optimization ensure that machine code in the other module just calls duma_vertex3f directly since the liftIO operation is just an irrelevance at the machine code level?
I doubt there is a penalty.
So far I've just been using ghc --make and not bothering about what kind of code is generated. Is there a flag I can use to get ghc to output the stg code (or something higher level than just x86 machine code itself) so I can look at the output to see what optimizations are being done?
Thanks, Brian.
Yes, there are several ghc options: -ddump-<insert keyword> is documented in http://www.haskell.org/ghc/docs/latest/html/users_guide/options-debugging.ht... In particular -ddump-simpl has been helpful for some people, and you want -ddump-stg, perhaps.

Chris Kuklewicz wrote:
Brian Hulley wrote:
Hi, I'm designing an API for a simple graphics window, and am trying to [snip]
There should be no overhead for a newtype. The above can be shortened to one line:
newtype VertexM a = VertexM (IO a) deriving (Functor,Monad,MonadIO)
Thanks - that certainly saves a lot of typing! :-)
(Needs ghc -fglasgow-exts, I expect)
Also, in:
foreign import ccall duma_vertex3f :: Float -> Float -> Float -> IO ()
vertex3f :: Float -> Float -> Float -> VertexM () vertex3f x y z = liftIO $ duma_vertex3f x y z
is there a penalty involved in calling vertex3f (from another module) or will the worker/wrapper optimization ensure that machine code in the other module just calls duma_vertex3f directly since the liftIO operation is just an irrelevance at the machine code level?
I doubt there is a penalty. [snip] In particular -ddump-simpl has been helpful for some people, and you want -ddump-stg, perhaps.
I've just now tried compiling with -ddump-stg but this is difficult (for me) to understand so I tried with -ddump-simpl as you suggested, and compared the outputs when compiling with and without the -O2 optimization flag. With -O2 enabled, __ccall_GC duma_vertex3f is indeed called directly instead of vertex3f, from a different module, so that proves that different monads can indeed be used to wrap IO operations without any performance penalty at all. What a great language and compiler!!! :-) One little thing I wondered about when looking at the -ddump-simpl output: for each C API function, there is a warning of the form: warning: implicit declaration of function `duma_vertex3f' Do you know what this means? It doesn't seem to matter as everything compiles and runs ok but it would be interesting to know. Thanks, Brian.

Brian Hulley wrote:
With -O2 enabled, __ccall_GC duma_vertex3f is indeed called directly instead of vertex3f, from a different module, so that proves that different monads can indeed be used to wrap IO operations without any performance penalty at all.
However I've just discovered there *is* a penalty for converting between callback functions that return a different monad from the IO monad. For example, if I have a RenderM monad that allows primitives to be drawn to the screen, and a callback: newtype RenderM a = RenderM (IO a) deriving (Functor, Monad, MonadIO) type RenderCallback = Int -> Int -> RenderM () where the intention is that the callback will take the width and height of the window and return a RenderM action, the problem is that because the FFI does not allow RenderM to appear in a foreign type, the actual render function has to be converted into a function which returns an IO action instead of a RenderM action eg by: type RenderCallbackIO = Int -> Int -> IO () dropRenderM :: RenderCallback -> RenderCallbackIO dropRenderM f x y = let RenderM io = f x y in io foreign import ccall duma_onRender :: FunPtr RenderCallbackIO -> IO () foreign import ccall "wrapper" mkRenderCallbackIO :: RenderCallbackIO -> IO (FunPtr RenderCallbackIO) onRender :: RenderCallback -> IO () onRender f = mkRenderCallbackIO (dropRenderM f) >>= duma_onRender With -O2 optimization, GHC does not seem to be able to optimize out the call to dropRenderM even though this just changes the return value of f from RenderM (IO a) to IO a, so RenderM is not transparent after all: Duma.onRender = \ (f :: Duma.RenderCallback) (eta :: GHC.Prim.State# GHC.Prim.RealWorld) -> case (# GHC.Prim.State# GHC.Prim.RealWorld, () #) Duma.mkRenderCallbackIO (Duma.dropRenderM f) eta of wild { (# new_s, a86 #) -> case (# GHC.Prim.State# GHC.Prim.RealWorld, () #) a86 of ds { GHC.Ptr.FunPtr ds1 -> case (# GHC.Prim.State# GHC.Prim.RealWorld, () #) {__ccall_GC duma_onRender GHC.Prim.Addr# -> GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld #)} ds1 new_s of wild1 { (# ds2 #) -> (# ds2, GHC.Base.() #) } } } I must admit I'm not at all clear how to read the -ddump-simpl output so I may have got this wrong, but since Duma.dropRenderM is mentioned, I think this means this has not been optimized out. Therefore there does seem to be an overhead for using different monads at the moment (?) Regards, Brian.

On Thu, Mar 30, 2006 at 03:50:06AM +0100, Brian Hulley wrote:
where the intention is that the callback will take the width and height of the window and return a RenderM action, the problem is that because the FFI does not allow RenderM to appear in a foreign type.
it should, the types in foreign declarations should "see through" newtypes. John -- John Meacham - ⑆repetae.net⑆john⑈

John Meacham wrote:
On Thu, Mar 30, 2006 at 03:50:06AM +0100, Brian Hulley wrote:
where the intention is that the callback will take the width and height of the window and return a RenderM action, the problem is that because the FFI does not allow RenderM to appear in a foreign type.
it should, the types in foreign declarations should "see through" newtypes.
Unfortunately GHC does not seem to support this: foreign import ccall duma_clear :: Word.Word32 -> RenderM () Unacceptable result type in foreign declaration: RenderM () When checking declaration: foreign import ccall safe "static &duma_clear" duma_clear :: GHC.Word.Word32 -> RenderM () even though the FFI spec agrees with you (Section 3.2): The argument types ati produced by fatype must be marshallable foreign types; that is, each ati is either (1) a basic foreign type or (2) a type synonym or renamed datatype of a marshallable foreign type. Moreover, the result type rt produced by frtype must be a marshallable foreign result type; that is, it is either a marshallable foreign type, ... Thus leading to all the messy fiddling about I have to do to use RenderM instead of IO in my imported functions and callbacks. A tool like greencard may solve some of these problems but the problem I would then have is how to incorporate it in my very flaky development environment (I'm just using the plain text editor in Visual C++ with a shortcut bound to a batch file which calls ghc so the build process is all held together by bits of string and sellotape therefore I don't want to complicate it further if at all possible...) By the way, I think I may have been wrong about GHC not optimizing the call to dropRenderM out of the onRender function in my last post, because when I look at the output using -dddump-cmm (instead of -simpl) dropRenderM is not mentioned in the code for onRender (although I'm not an expert at reading cmm output either and it is mentioned in some other places....) Regards, Brian.

On Thursday 30 March 2006 14:13, Brian Hulley wrote:
John Meacham wrote:
On Thu, Mar 30, 2006 at 03:50:06AM +0100, Brian Hulley wrote:
where the intention is that the callback will take the width and height of the window and return a RenderM action, the problem is that because the FFI does not allow RenderM to appear in a foreign type.
it should, the types in foreign declarations should "see through" newtypes.
Unfortunately GHC does not seem to support this:
foreign import ccall duma_clear :: Word.Word32 -> RenderM ()
Unacceptable result type in foreign declaration: RenderM () When checking declaration: foreign import ccall safe "static &duma_clear" duma_clear :: GHC.Word.Word32 -> RenderM ()
even though the FFI spec agrees with you (Section 3.2):
The argument types ati produced by fatype must be marshallable foreign types; that is, each ati is either (1) a basic foreign type or (2) a type synonym or renamed datatype of a marshallable foreign type. Moreover, the result type rt produced by frtype must be a marshallable foreign result type; that is, it is either a marshallable foreign type, ...
Note that the addendum continues "...that is, it is either a marshallable foreign type, the type (), or a type matching Prelude.IO t, where t is a marshallable foreign type or ()." Nowhere is it written that 'IO t' (in itself) is a marshallable type whenever 't' is a marshallable type. [I am not sure what it means exactly if a type "matches" 'Prelude.IO t'.] Thus, GHC does nothing wrong, according to the addendum. That doesn't mean allowing IO-equivalent newtypes wouldn't be a good idea. It is just not written in the addendum. Ben

Benjamin Franksen wrote:
[snip] Thus, GHC does nothing wrong, according to the addendum. That doesn't mean allowing IO-equivalent newtypes wouldn't be a good idea. It is just not written in the addendum.
Apologies for not reading the addendum properly and slighting the good character of GHC... :-) I've submitted a feature request that explains in more detail why I think it would be a good idea which can be read at http://hackage.haskell.org/trac/ghc/ticket/736 Regards, Brian.

Brian I've committed a fix for this. By which I mean that you don't need to write dropRenderM. You can just use RenderM as if it were IO. The change won't be in 6.4.2, but it's in the HEAD and will be in 6.6 Simon | -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users- | bounces@haskell.org] On Behalf Of Brian Hulley | Sent: 30 March 2006 03:50 | To: glasgow-haskell-users@haskell.org | Subject: Re: Possible runtime overhead of wrapping the IO monad? | | Brian Hulley wrote: | > With -O2 enabled, __ccall_GC duma_vertex3f is indeed called directly | > instead of vertex3f, from a different module, so that proves that | > different monads can indeed be used to wrap IO operations without any | > performance penalty at all. | | However I've just discovered there *is* a penalty for converting between | callback functions that return a different monad from the IO monad. For | example, if I have a RenderM monad that allows primitives to be drawn to the | screen, and a callback: | | newtype RenderM a = RenderM (IO a) deriving (Functor, Monad, MonadIO) | | type RenderCallback = Int -> Int -> RenderM () | | where the intention is that the callback will take the width and height of | the window and return a RenderM action, the problem is that because the FFI | does not allow RenderM to appear in a foreign type, the actual render | function has to be converted into a function which returns an IO action | instead of a RenderM action eg by: | | type RenderCallbackIO = Int -> Int -> IO () | | dropRenderM :: RenderCallback -> RenderCallbackIO | dropRenderM f x y = let RenderM io = f x y in io | | foreign import ccall duma_onRender :: FunPtr RenderCallbackIO -> IO | () | | foreign import ccall "wrapper" mkRenderCallbackIO | :: RenderCallbackIO -> IO (FunPtr RenderCallbackIO) | | onRender :: RenderCallback -> IO () | onRender f = mkRenderCallbackIO (dropRenderM f) >>= duma_onRender | | With -O2 optimization, GHC does not seem to be able to optimize out the call | to dropRenderM even though this just changes the return value of f from | RenderM (IO a) to IO a, so RenderM is not transparent after all: | | Duma.onRender = \ (f :: Duma.RenderCallback) | (eta :: GHC.Prim.State# GHC.Prim.RealWorld) -> | case (# GHC.Prim.State# GHC.Prim.RealWorld, () #) | Duma.mkRenderCallbackIO | (Duma.dropRenderM f) eta | of wild { (# new_s, a86 #) -> | case (# GHC.Prim.State# GHC.Prim.RealWorld, () #) a86 | of ds { GHC.Ptr.FunPtr ds1 -> | case (# GHC.Prim.State# GHC.Prim.RealWorld, | () #) {__ccall_GC duma_onRender GHC.Prim.Addr# | -> GHC.Prim.State# GHC.Prim.RealWorld | -> (# GHC.Prim.State# GHC.Prim.RealWorld #)} | ds1 new_s | of wild1 { (# ds2 #) -> | (# ds2, GHC.Base.() #) | } | } | } | | I must admit I'm not at all clear how to read the -ddump-simpl output so I | may have got this wrong, but since Duma.dropRenderM is mentioned, I think | this means this has not been optimized out. | | Therefore there does seem to be an overhead for using different monads at | the moment (?) | | Regards, Brian. | | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

On Mon, Mar 27, 2006 at 08:14:40PM +0100, Brian Hulley wrote:
However I'm wondering if I can rely on all this monad stuff being optimized out at compile time. A sample monad is below:
newtype VertexM a = VertexM (IO a)
in GHC you can actually guarentee there is no overhead with the newtype deriving feature.
newtype VertexM a = VertexM (IO a) deriving(Monad,Functor,MonadIO)
now it will use the exact same dictionaries as the IO monad. the newtype deriving feature is very very useful for this sort of thing, or when you need to make an alternate type that has almost all the qualities of another one, you can newtype-derive all the same bits and just provide the instance for the different one. John -- John Meacham - ⑆repetae.net⑆john⑈
participants (5)
-
Benjamin Franksen
-
Brian Hulley
-
Chris Kuklewicz
-
John Meacham
-
Simon Peyton-Jones