Ridiculously slow FFI, or cairo binding?

Hello,

I've got two very simple programs that draw a very simple picture using cairo, making a couple hundred thousand cairo calls. One program is in C++. The other is in Haskell and uses the cairo library bindings.

The C++ program completes in a fraction of a second; the Haskell program takes about 7-8 seconds to run. They produce exactly the same output.

What could be at fault here? Why are the cairo bindings so slow? (I suppose there isn't too much cairo-specific stuff here, perhaps it's a general FFI question?)

#include "cairo.h"

int main() {
    cairo_surface_t *surface = cairo_image_surface_create(CAIRO_FORMAT_ARGB32, 1024, 768);
    cairo_t *cr = cairo_create(surface);
    cairo_set_source_rgb(cr, 0, 255, 0);
    for(int x = 0; x < 1024; x += 2)
        for(int y = 0; y < 768; y += 2) {
            cairo_rectangle(cr, x, y, 1, 1);
            cairo_fill(cr);
        }
    cairo_surface_write_to_png(surface, "picture.png");
    return 0;
}

module Main where

import qualified Graphics.Rendering.Cairo as C
import Control.Monad

main = C.withImageSurface C.FormatARGB32 1024 768 $ \s -> do
  C.renderWith s $ do
    C.setSourceRGBA 0 255 0 255
    forM_ [0,2..1024] $ \x -> do
      forM_ [0,2..768] $ \y -> do
        C.rectangle x y 1 1
        C.fill
  C.surfaceWriteToPNG s "picture.png"

--
Eugene Kirpichov
Principal Engineer, Mirantis Inc. http://www.mirantis.com/
Editor, http://fprog.ru/

I forgot to specify my environment.
Windows Server 2008 R2 x64, ghc 7.0.3.
However, I observed the same speed differences on a 64-bit Ubuntu with ghc
6.12 - I profiled my application with cairo-trace, and cairo-perf-trace
drew in a fraction of a second the picture that my Haskell program spent a
dozen seconds drawing.

On Wednesday 02 November 2011, 10:19:08, Eugene Kirpichov wrote:
I forgot to specify my environment.
Windows Server 2008 R2 x64, ghc 7.0.3.
However, I observed the same speed differences on a 64-bit Ubuntu with ghc 6.12 - I profiled my application with cairo-trace, and cairo-perf-trace drew in a fraction of a second the picture that my Haskell program spent a dozen seconds drawing.
Just FYI,

$ uname -a
Linux linux-v7dw.site 2.6.37.6-0.7-desktop #1 SMP PREEMPT 2011-07-21 02:17:24 +0200 x86_64 x86_64 x86_64 GNU/Linux
$ g++ -O3 -o csurf surf.cc -I/usr/include/cairo -lcairo
$ time ./csurf

real 0m0.126s
user 0m0.119s
sys 0m0.006s
$ ghc-7.0.4 -O2 hssurf.hs
[1 of 1] Compiling Main ( hssurf.hs, hssurf.o )
Linking hssurf ...
$ time ./hssurf

real 0m5.857s
user 0m5.840s
sys 0m0.011s
$ ghc -O2 hssurf.hs -o hssurf2
[1 of 1] Compiling Main ( hssurf.hs, hssurf.o )
Linking hssurf2 ...
$ time ./hssurf2

real 0m0.355s
user 0m0.350s
sys 0m0.005s

(fromRational . toRational) is still slow, but nowhere near as slow as it used to be.

On 02/11/11 09:17, Eugene Kirpichov wrote:
Hello,
I've got two very simple programs that draw a very simple picture using cairo, doing a couple hundred thousand of cairo calls. One program is in C++. The other is in Haskell and uses the cairo library bindings.
The C++ program completes in a fraction of a second, the Haskell program takes about 7-8 seconds to run. They produce exactly the same output.
What could be at fault here? Why are the cairo bindings working so slow? (I suppose there isn't too much cairo-specific stuff here, perhaps it's a general FFI question?)
I filed a bug report about this some months ago, having noticed similar slowness:

gtk2hs ticket #1228 "cairo performance is very bad"
http://hackage.haskell.org/trac/gtk2hs/ticket/1228

My conclusion was that it isn't FFI being slow, but some other reason, possibly too much redirection / high level fanciness in the implementation of cairo bindings that the compiler can't see through to optimize aggressively, or possibly some Double / CDouble / realToFrac rubbishness.

Claude

Hi Claude,
I suspected that the issue could be about unsafe foreign imports - all
imports in the cairo bindings are "safe".
I compiled myself a version of cairo bindings with the "rectangle" and
"fill" functions marked as unsafe.
Unfortunately that didn't help the case at all, even though the core
changed FFI calls from "__pkg_ccall_GC" to "__pkg_ccall". The performance
stayed the same; the overhead is elsewhere.
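
For readers unfamiliar with the distinction being tested here, the following is a minimal sketch of a safe and an unsafe foreign import side by side. It uses sin from the C math library as a stand-in so that it builds without cairo; the names sinSafe and sinUnsafe are made up for illustration and have nothing to do with the cairo bindings.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CDouble)

-- "safe" is the default when no annotation is given. The runtime does
-- extra bookkeeping so that the C code may call back into Haskell.
foreign import ccall safe "math.h sin" sinSafe :: CDouble -> CDouble

-- "unsafe" skips that bookkeeping. It is only appropriate for short
-- calls that never re-enter Haskell.
foreign import ccall unsafe "math.h sin" sinUnsafe :: CDouble -> CDouble

main :: IO ()
main = do
  print (sinSafe 0)
  print (sinUnsafe 0 == sinSafe 0)
```

The per-call saving from unsafe is real but small, which is consistent with the observation above that switching rectangle and fill to unsafe did not change the running time.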
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

On 11/02/2011 09:51 AM, Eugene Kirpichov wrote:
Hi Claude,
I suspected that the issue could be about unsafe foreign imports - all imports in the cairo bindings are "safe". I compiled myself a version of cairo bindings with the "rectangle" and "fill" functions marked as unsafe.
Unfortunately that didn't help the case at all, even though the core changed FFI calls from "__pkg_ccall_GC" to "__pkg_ccall". The performance stayed the same; the overhead is elsewhere.
doing a ltrace, i think the reason is pretty obvious, there's a lot of GMP calls:

__gmpz_init(0x7f5043171730, 1, 0x7f5043171750, 0x7f5043171740, 0x7f50431d2508) = 0x7f50431d2530
__gmpz_mul(0x7f5043171730, 0x7f5043171750, 0x7f5043171740, 0x7f50431d2538, 0x7f50431d2508) = 1
__gmpz_init(0x7f5043171728, 1, 0x7f5043171748, 0x7f5043171738, 0x7f50431d2538) = 0x7f50431d2568
__gmpz_mul(0x7f5043171728, 0x7f5043171748, 0x7f5043171738, 0x7f50431d2570, 0x7f50431d2538) = 1
__gmpn_gcd_1(0x7f50431d2580, 1, 1, 1, 1) = 1
<repeated thousands of times>

before each cairo call.

just to make sure, the C version doesn't exhibit this behavior.

-- Vincent

Oh. This is pretty crazy, I wonder what they're doing with GMP so much...
I modified the Haskell program to use cairo directly, even with safe calls,
and it now takes the same time as the C program.
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import qualified Graphics.Rendering.Cairo as C
import Control.Monad
import Foreign
import Foreign.C.Types
import Foreign.C.String

foreign import ccall "cairo.h cairo_image_surface_create"
  cairo_image_surface_create :: CInt -> CInt -> CInt -> IO (Ptr ())
foreign import ccall "cairo.h cairo_create"
  cairo_create :: Ptr () -> IO (Ptr ())
foreign import ccall "cairo.h cairo_set_source_rgb"
  cairo_set_source_rgb :: Ptr () -> CDouble -> CDouble -> CDouble -> IO ()
foreign import ccall "cairo.h cairo_rectangle"
  cairo_rectangle :: Ptr () -> CDouble -> CDouble -> CDouble -> CDouble -> IO ()
foreign import ccall "cairo.h cairo_fill"
  cairo_fill :: Ptr () -> IO ()
foreign import ccall "cairo.h cairo_surface_write_to_png"
  cairo_surface_write_to_png :: Ptr () -> CString -> IO ()

main = do
  s <- cairo_image_surface_create 0 1024 768
  cr <- cairo_create s
  cairo_set_source_rgb cr 0 255 0
  forM_ [0,2..1024] $ \x -> do
    forM_ [0,2..768] $ \y -> do
      cairo_rectangle cr x y 1 1
      cairo_fill cr
  pic <- newCString "picture.png"
  cairo_surface_write_to_png s pic

+gtk2hs-users

On 11/02/2011 10:10 AM, Eugene Kirpichov wrote:
Oh. This is pretty crazy, I wonder what they're doing with GMP so much...
I modified the Haskell program to use cairo directly, even with safe calls, and it now takes the same time as the C program.
yep, i ended up doing the exact same thing for testing,

foreign import ccall "cairo_rectangle" my_rectangle :: CI.Cairo -> CDouble -> CDouble -> CDouble -> CDouble -> IO ()

and just replacing the rectangle call makes almost all the difference for me (almost as fast as C)

-- Vincent

Any idea how to debug why all the GMP calls?
I'm looking at even the auto-generated source for cairo bindings, but I
don't see anything at all that could lead to *thousands* of them.

On Wed, Nov 2, 2011 at 8:15 AM, Eugene Kirpichov
Any idea how to debug why all the GMP calls? I'm looking at even the auto-generated source for cairo bindings, but I don't see anything at all that could lead to *thousands* of them.
Found them. Look at the Types module and you'll see

cFloatConv :: (RealFloat a, RealFloat b) => a -> b
cFloatConv = realToFrac

This function (or its cousins peekFloatConv, withFloatConv...) are used *everywhere*.

Looking at this module with ghc-core we see that GHC compiled a generic version of cFloatConv:

Graphics.Rendering.Cairo.Types.$wcFloatConv
  :: forall a_a3TN b_a3TO.
     (RealFloat a_a3TN, RealFrac b_a3TO) =>
     a_a3TN -> b_a3TO
[GblId,
 Arity=3,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=3, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [3 3 0] 12 0}]
Graphics.Rendering.Cairo.Types.$wcFloatConv =
  \ (@ a_a3TN)
    (@ b_a3TO)
    (w_s5zg :: RealFloat a_a3TN)
    (ww_s5zj :: RealFrac b_a3TO)
    (w1_s5zA :: a_a3TN) ->
    fromRational
      @ b_a3TO
      ($p2RealFrac @ b_a3TO ww_s5zj)
      (toRational
         @ a_a3TN
         ($p1RealFrac
            @ a_a3TN ($p1RealFloat @ a_a3TN w_s5zg))
         w1_s5zA)

Note that this is basically cFloatConv = fromRational . toRational.

*However*, GHC also compiled a Double -> Double specialization:

Graphics.Rendering.Cairo.Types.cFloatConv1
  :: Double -> Double
[GblId,
 Arity=1,
 Unf=Unf{Src=InlineStable, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=ALWAYS_IF(unsat_ok=True,boring_ok=False)
         Tmpl= \ (eta_B1 [Occ=Once!] :: Double) ->
                 case eta_B1 of _ { D# ww_a5v3 [Occ=Once] ->
                 case $w$ctoRational ww_a5v3
                 of _ { (# ww2_a5v8 [Occ=Once], ww3_a5v9 [Occ=Once] #) ->
                 $wfromRat ww2_a5v8 ww3_a5v9
                 }
                 }}]
Graphics.Rendering.Cairo.Types.cFloatConv1 =
  \ (eta_B1 :: Double) ->
    case eta_B1 of _ { D# ww_a5v3 ->
    case $w$ctoRational ww_a5v3
    of _ { (# ww2_a5v8, ww3_a5v9 #) ->
    $wfromRat ww2_a5v8 ww3_a5v9
    }
    }

...which is also equivalent to fromRational . toRational however with the type class inlined! Oh, god...

Cheers,

-- Felipe.
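
To make the cost concrete: the Haskell report's default definition of realToFrac is fromRational . toRational, and a Rational is a pair of arbitrary-precision Integers, which GHC's default integer library of that era implemented on top of GMP. That would account for the __gmpz_* calls seen in the ltrace output. The sketch below spells out the round trip for a single Double and prints the intermediate fraction; the name viaRational is made up for illustration.

```haskell
module Main where

import Data.Ratio (numerator, denominator)
import Foreign.C.Types (CDouble)

-- Hypothetical name: the generic conversion route, written out by hand.
viaRational :: Double -> CDouble
viaRational = fromRational . toRational

main :: IO ()
main = do
  let r = toRational (0.1 :: Double)
  -- Both components of the Rational are arbitrary-precision Integers.
  print (numerator r, denominator r)
  -- The round trip yields the same value as realToFrac, just expensively.
  print (viaRational 0.1 == realToFrac (0.1 :: Double))
```

Every coordinate passed to rectangle takes this detour when the conversion is not replaced by a primitive, so four conversions per rectangle over a couple hundred thousand rectangles add up quickly.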

+gtk2hs-devel

Yay!!!
I made a small change in Types.chs and got my original cairo-binding-based
program to be just as blazing fast. The only problem I have with this is
that I used multiparameter type classes.
Dear gtk2hs team! Is it possible to incorporate my changes? I'm pretty sure
people will be happy with an order-of-magnitude speedup. Probably the stuff
could be wrapped in #define's for those who aren't using GHC and can't use
multiparameter type classes?
I am pretty sure I could have done the same with rewrite rules, but I tried
for a while and to no avail.
FAILED SOLUTION: rewrite rules
cFloatConv :: (RealFloat a, RealFloat b) => a -> b
cFloatConv = realToFrac
{-# NOINLINE cFloatConv #-}
{-# RULES "cFloatConv/float2Double" cFloatConv = float2Double #-}
{-# RULES "cFloatConv/double2Float" cFloatConv = double2Float #-}
{-# RULES "cFloatConv/self" cFloatConv = id #-}
For some reason, the rules don't fire. Anyone got an idea why?
SUCCEEDED SOLUTION: multiparameter type classes
I rewrote cFloatConv like this:

import GHC.Float

class (RealFloat a, RealFloat b) => CFloatConv a b where
  cFloatConv :: a -> b
  cFloatConv = realToFrac

instance CFloatConv Double Double where cFloatConv = id
instance CFloatConv Double CDouble
instance CFloatConv CDouble Double
instance CFloatConv Float Float where cFloatConv = id
instance CFloatConv Float Double where cFloatConv = float2Double
instance CFloatConv Double Float where cFloatConv = double2Float

and replaced a couple of constraints in functions below by usage of
CFloatConv.
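
A note on using such a class: with two class parameters and no functional dependency, both the argument type and the result type have to be known at each call site before an instance can be selected. The following is a minimal, self-contained sketch of that, with a reduced set of instances. The helpers toC and fromC are hypothetical and are not part of the cairo bindings.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}  -- precaution only; may not be required
module Main where

import Foreign.C.Types (CDouble)
import GHC.Float (float2Double)

-- Same shape as the class above, with fewer instances.
class (RealFloat a, RealFloat b) => CFloatConv a b where
  cFloatConv :: a -> b
  cFloatConv = realToFrac   -- generic fallback

instance CFloatConv Double Double where cFloatConv = id
instance CFloatConv Float Double where cFloatConv = float2Double
instance CFloatConv Double CDouble   -- uses the realToFrac default
instance CFloatConv CDouble Double   -- uses the realToFrac default

-- Hypothetical helpers: the signatures fix both type parameters.
toC :: Double -> CDouble
toC = cFloatConv

fromC :: CDouble -> Double
fromC = cFloatConv

main :: IO ()
main = do
  print (toC 1.5)
  print (fromC (toC 0.25))
  -- Without a helper, both types are annotated at the call site.
  print (cFloatConv (2.5 :: Float) :: Double)
```

The trade-off is that a pair of types without an instance becomes a compile-time error instead of silently falling back to the slow generic path.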

Sorry for re-sending, my previous attempt got ignored by gtk2hs-devel
mailing list as I wasn't subscribed. Now I am.

On Wed, Nov 2, 2011 at 9:14 AM, Eugene Kirpichov
Yay!!! I made a small change in Types.chs and got my original cairo-binding-based program to be just as blazing fast. The only problem I have with this is that I used multiparameter type classes.
Nice! Looking forward to it being included in cairo codebase and released on Hackage. =D

Cheers,

-- Felipe.

Thanks a lot for this. I've been developing a Graphic Adventure IDE in
Haskell that I'm about to release. It uses Cairo to draw game-state
diagrams and this will surely solve my speed issues.

Hi Eugene,

did you try using the SPECIALIZE pragma? It is part of the Haskell 98 and Haskell 2010 specifications.

On 02.11.2011, at 12:14, Eugene Kirpichov wrote:
FAILED SOLUTION: rewrite rules

cFloatConv :: (RealFloat a, RealFloat b) => a -> b
cFloatConv = realToFrac
{-# NOINLINE cFloatConv #-}
{-# RULES "cFloatConv/float2Double" cFloatConv = float2Double #-}
{-# RULES "cFloatConv/double2Float" cFloatConv = double2Float #-}
{-# RULES "cFloatConv/self" cFloatConv = id #-}

See [1] in GHC User Guide.

cFloatConv :: (RealFloat a, RealFloat b) => a -> b
cFloatConv = realToFrac -- or try fromRational . toRational

{-# SPECIALIZE cFloatConv :: Float -> Double #-}
{-# SPECIALIZE cFloatConv :: Double -> Float #-}

I did not try to compile or even benchmark this code. But I think it might help in your case.

Cheers,
Jean

[1]: http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#special...

Hi,
No, I didn't, as I read in the GHC docs that it is deprecated in favor of
the RULES pragma (I wanted to replace specifically with float2Double and
double2Float).

On Wed, Nov 2, 2011 at 11:24 AM, Jean-Marie Gaillourdet
Hi Eugene,
did you try using the SPECIALIZE pragma? It is part of the Haskell 98 and Haskell 2010 specifications.
I don't think it's going to make any difference, as the core already has a specialized poor version. See my first e-mail.

-- Felipe.

Heh.
Guess what!
A simple {-# INLINE cFloatConv #-} helped to the same extent!
Axel, I think this change should be pretty easy to incorporate, and it
probably makes sense to inline all other functions in Types.chs too.
Would you like me to send the trivial darcs patch, or will the gtk2hs team
take care of this?
-- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/
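
For readers following along, here is a minimal sketch of the INLINE fix described in the message above, assuming cFloatConv keeps the realToFrac definition quoted earlier in the thread. The wrapper toCRect is hypothetical: it only stands in for a binding function that marshals Double arguments to C doubles and is not part of gtk2hs. One plausible reading is that inlining exposes realToFrac at concrete types at each call site, where GHC can optimise it; the thread itself does not confirm the exact mechanism.

```haskell
module Main where

import Foreign.C.Types (CDouble)

-- Generic conversion helper with the definition quoted in the thread.
-- The INLINE pragma asks GHC to copy the body into each call site,
-- so the conversion is compiled at the concrete types used there.
cFloatConv :: (RealFloat a, RealFloat b) => a -> b
cFloatConv = realToFrac
{-# INLINE cFloatConv #-}

-- Hypothetical wrapper (not from gtk2hs): converts rectangle
-- coordinates to the C representation before an FFI call would be made.
toCRect :: Double -> Double -> Double -> Double
        -> (CDouble, CDouble, CDouble, CDouble)
toCRect x y w h = (cFloatConv x, cFloatConv y, cFloatConv w, cFloatConv h)

main :: IO ()
main = print (toCRect 0 2 1 1)
```

Whether the pragma helps in a given program is best checked by comparing timings or the generated Core with and without it.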

On 11/2/11 9:24 AM, Jean-Marie Gaillourdet wrote:
Hi Eugene,
did you try using the SPECIALIZE pragma? It is part of the Haskell 98 and Haskell 2010 specifications.
The problem with SPECIALIZE is that you still have to give a parametric definition for the function, whereas the whole point of specializing realToFrac is to use non-parametric definitions like the built-in machine ops.
-- Live well, ~wren
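
The distinction drawn above can be illustrated with a small sketch. The names convGeneric and convPrim are made up for this example. The first function is parametric, so a SPECIALIZE pragma only produces a Float -> Double copy of the same generic body; the second is written directly in terms of the primitive float2Double from GHC.Float.

```haskell
module Main where

import GHC.Float (float2Double)

-- Parametric definition: the specialised copy still goes through
-- toRational and fromRational, only with the dictionaries fixed.
convGeneric :: (Real a, Fractional b) => a -> b
convGeneric = fromRational . toRational
{-# SPECIALIZE convGeneric :: Float -> Double #-}

-- Non-parametric definition: uses the primitive conversion directly.
convPrim :: Float -> Double
convPrim = float2Double

main :: IO ()
main = do
  print (convGeneric (1.5 :: Float) :: Double)
  print (convPrim 1.5)
```

Both functions agree on this input; the difference is in how the conversion is carried out, not in the result.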

On 11/2/11 7:14 AM, Eugene Kirpichov wrote:
I rewrote cFloatConv like this:
import GHC.Float

class (RealFloat a, RealFloat b) => CFloatConv a b where
  cFloatConv :: a -> b
  cFloatConv = realToFrac

instance CFloatConv Double Double where cFloatConv = id
instance CFloatConv Double CDouble
instance CFloatConv CDouble Double
instance CFloatConv Float Float where cFloatConv = id
instance CFloatConv Float Double where cFloatConv = float2Double
instance CFloatConv Double Float where cFloatConv = double2Float
If you're going the MPTC route, I suggest you use logfloat:Data.Number.RealToFrac[1]. I don't have the CDouble and CFloat instances, but I could add them. The instances themselves are only moderately more clever than yours (namely, they use CPP for portability to non-GHC compilers), but I think it's good for people to rally around one implementation of the solution instead of having a bunch of copies of the same thing, each poorly maintained because the effort is spread so thin.
[1] http://hackage.haskell.org/packages/archive/logfloat/0.12.1/doc/html/Data-Nu...
-- Live well, ~wren
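
As a usage note for the multi-parameter class quoted above, the sketch below reproduces a subset of its instances as a standalone module and shows two call sites. It assumes the MultiParamTypeClasses extension is enabled. Because the class has no functional dependency, both the argument and the result type have to be fixed at each call, which is why the calls in main carry annotations.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
module Main where

import Foreign.C.Types (CDouble)
import GHC.Float (double2Float, float2Double)

-- Conversion class from the thread; the default method falls back
-- to the generic realToFrac.
class (RealFloat a, RealFloat b) => CFloatConv a b where
  cFloatConv :: a -> b
  cFloatConv = realToFrac

-- Instances that use the primitive conversions.
instance CFloatConv Float Double where cFloatConv = float2Double
instance CFloatConv Double Float where cFloatConv = double2Float

-- Instance that relies on the default method.
instance CFloatConv Double CDouble

main :: IO ()
main = do
  -- Both source and target types are given explicitly.
  print (cFloatConv (1.5 :: Float) :: Double)
  print (cFloatConv (2.5 :: Double) :: CDouble)
```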

Thanks! I'll definitely consider your library in the future, but for now,
as we can see, there's no need to rewrite cFloatConv at all:
{-# INLINE #-} suffices :)
-- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/
participants (8)
- Claude Heiland-Allen
- Daniel Fischer
- Eugene Kirpichov
- Felipe Almeida Lessa
- Ivan Perez
- Jean-Marie Gaillourdet
- Vincent Hanquez
- wren ng thornton