
Hey, thanks, Daniel.

I hadn't come across rewrite rules yet. They definitely look like something worth learning, though I'm not sure I'm prepared to start making custom versions of OpenGL.Raw... It looks like I managed to put that battle off for another day, however.

I did look at how realToFrac is implemented and (as you mention) it does the fromRational . toRational transform pair suggested in a number of sources, including Real World Haskell. Looking at what toRational is doing, creating a ratio of integers out of a float, it seems like a crazy amount of effort to go through just to convert floating point numbers.

Looking at the RealFloat class rather than Real and Fractional, it seems like this is a much more efficient way to go:

floatToFloat :: (RealFloat a, RealFloat b) => a -> b
floatToFloat = (uncurry encodeFloat) . decodeFloat

I substituted this in for realToFrac and I'm back to close to my original performance. Playing with a few test cases in ghci, it looks numerically equivalent to realToFrac.

This raises the question, though: am I doing something dangerous here? Why isn't this the standard approach?

If I understand what's happening, decodeFloat and encodeFloat are breaking the floating point numbers up into their constituent parts, presumably by bit masking the raw binary. That would explain the performance improvement. I suppose there is some implementation dependence here, but as long as the encode and decode are implemented as a matched set then I think I'm good.

Cheers--
Greg

On Sep 15, 2010, at 1:56 AM, Daniel Fischer wrote:
On Wednesday 15 September 2010 02:51:01, Greg wrote:
First, to anyone who recognizes me by name, thanks to the help I've been getting here I've managed to put together a fairly complex set of code files that all compile together nicely, run, and do exactly what I wanted them to do. Success!
The trouble is that my implementation is dog slow.
Fortunately, this isn't the first time I've been in over my head, and I started by putting up some simpler scaffolding, which runs much more quickly. Working backwards, it looks like the real bottleneck is in the data types I've created, the type variables I've introduced, and the conversion code I needed to insert to make it all happy.
I'm not sure it helps, but I've attached a trimmed down version of the relevant code. What should be happening is my pair is being converted to the canonical form for Coord2D which is Cartesian2D and then converted again to Vertex2. There shouldn't be any change made to the values, they're only being handed from one container to another in this case (Polar coordinates would require computation, but I've stripped that out for the time being). However, those handoffs require calls to realToFrac to make the type system happy, and that has to be what is eating up all my CPU.
Not all, but probably a big chunk of it. The problem is that the default implementation of realToFrac is
realToFrac = fromRational . toRational
a) with that implementation, realToFrac :: Double -> Double is not the identity (doesn't respect NaNs)
b) it's slow, there are no special operations to convert Double, Float etc. from/to Rational.
For a lot of types, GHC provides rewrite rules (you need to compile with optimisations to have them fire) which give faster versions (with somewhat different behaviour, e.g. realToFrac :: Double -> Double is rewritten to id, realToFrac between Float and Double uses primitive widening/narrowing ops, for several newtype wrappers around Float/Double there are rules too).
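To make the two conversions discussed in this thread concrete, here is a minimal sketch that can be loaded into ghci. The name viaRational is made up for illustration (it just restates the default definition so that no rewrite rule can replace it), and floatToFloat is the function from Greg's message. One caveat worth knowing: the base documentation says the result of decodeFloat is unspecified for NaN and infinite inputs, so floatToFloat should only be relied on for ordinary finite values.

```haskell
-- Sketch only. viaRational is a hypothetical name that spells out the
-- default definition of realToFrac, so GHC's rewrite rules for realToFrac
-- cannot replace it.
viaRational :: (Real a, Fractional b) => a -> b
viaRational = fromRational . toRational

-- The decodeFloat/encodeFloat conversion from the message above.
-- It assumes both types use the same floatRadix (true for Float and Double).
-- decodeFloat is unspecified for NaN and infinities, so treat the result
-- for those inputs as unreliable.
floatToFloat :: (RealFloat a, RealFloat b) => a -> b
floatToFloat = uncurry encodeFloat . decodeFloat

-- A NaN to experiment with.
nan :: Double
nan = 0 / 0

-- Point a) above: a round trip through Rational cannot return a NaN,
-- because toRational always yields an ordinary ratio of integers.
nanSurvivesRational :: Bool
nanSurvivesRational = isNaN (viaRational nan :: Double)
```

In ghci, nanSurvivesRational evaluates to False, while for finite inputs such as 0.25 :: Float both functions agree when widening to Double.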
I think there are probably 4 calls to realToFrac. If I walk through the code, the result, given the pair p, should be:

Vertex2 (realToFrac (realToFrac (fst p))) (realToFrac (realToFrac (snd p)))
I'd like to maintain type independence if possible, but I expect most uses of this code to feed Doubles in for processing and probably feed GLclampf (Floats, I believe)
newtype wrapper around CFloat, which is a newtype wrapper around Float
Unfortunately, there are no rewrite rules in the module where it is defined, nor, apparently, in any other module that has access to the constructor. And the constructor is not accessible from any of the exposed modules, so as far as I know, you can't provide your own rewrite rules.
to the OpenGL layer. If there's a way to do so, I wouldn't mind optimizing for that particular set of types. I've tried GLdouble, and it doesn't really improve things with the current code.
Is there a way to short circuit those realToFrac calls if we know the input and output are the same type? Is there a way to merge the nested calls?
You can try rewrite rules
{-# RULES
"realToFrac2/realToFrac" realToFrac . realToFrac = realToFrac
"realToFrac/id" realToFrac = id
  #-}
but I'm afraid the second won't work at all; then you'd have to specify all interesting cases yourself (there are rules for the cases Double -> Double and Float -> Float in GHC.Float, rules for converting from/to CFloat and CDouble in Foreign.C.Types, so those should be fine too):

"realToFrac/GLclampf->GLclampf" realToFrac = id :: GLclampf -> GLclampf

and whatever else you need. Whether the first one will help (or even work), I don't know either; you have to try.
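As an illustration of what type-specific rules can look like when the constructor is in scope, here is a small sketch. Clamp is a hypothetical stand-in for a wrapper like GLclampf (it is not the OpenGL type), and doubleToClamp and toClamp are made-up helper names. The rule syntax follows the pattern of the rules in GHC.Float; rules only fire when compiling with optimisations, and whether they fire in a given program is something to check rather than assume.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import GHC.Float (double2Float)

-- Hypothetical wrapper standing in for a type such as GLclampf.
-- The constructor is visible here, which is what makes the rules possible.
newtype Clamp = Clamp Float
  deriving (Eq, Ord, Show, Num, Real, Fractional)

-- Direct narrowing conversion, bypassing Rational entirely.
-- Note: unlike fromRational . toRational, this passes NaN through as NaN.
doubleToClamp :: Double -> Clamp
doubleToClamp = Clamp . double2Float

{-# RULES
"realToFrac/Double->Clamp" realToFrac = doubleToClamp
"realToFrac/Clamp->Clamp"  realToFrac = id :: Clamp -> Clamp
  #-}

-- Code that is written against realToFrac; with -O the first rule
-- is a candidate to replace the call here.
toClamp :: Double -> Clamp
toClamp = realToFrac
```

Compiling with -O and -ddump-rule-firings is one way to see whether the rules are actually applied.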
Any other thoughts on what I can do here? The slowdown between the two implementations is at least 20x, which seems like a steep penalty to pay.
In case of emergency, put the needed rewrite rules into the source of OpenGLRaw yourself.
And while I'm at it, is turning on FlexibleInstances the only way to create an instance for (a,a)?
Yes. Haskell98 doesn't allow such instance declarations, so you need the extension.
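For reference, a minimal sketch of such an instance. The class and type below are simplified, hypothetical versions of the Coord2D and Cartesian2D names mentioned earlier in the thread, not the code from the attachment. The instance head (a, a) repeats a type variable, which is what Haskell98 rejects and FlexibleInstances permits.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical, simplified canonical form.
data Cartesian2D a = Cartesian2D a a
  deriving (Eq, Show)

-- Hypothetical, simplified class: anything convertible to Cartesian form.
class Coord2D c where
  toCartesian :: c -> Cartesian2D Double

-- The head (a, a) uses the same type variable twice, so this instance
-- needs FlexibleInstances.
instance Real a => Coord2D (a, a) where
  toCartesian (x, y) = Cartesian2D (realToFrac x) (realToFrac y)
```

A call such as toCartesian (1 :: Int, 2 :: Int) then yields Cartesian2D 1.0 2.0. Both components need the same type, so unannotated numeric literals may need a type signature.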