
On Wednesday 15 September 2010 02:51:01, Greg wrote:
First, to anyone who recognizes me by name, thanks to the help I've been getting here I've managed to put together a fairly complex set of code files that all compile together nicely, run, and do exactly what I wanted them to do. Success!
The trouble is that my implementation is dog slow.
Fortunately, this isn't the first time I've been in over my head, and I started by putting up some simpler scaffolding, which runs much more quickly. Working backwards, it looks like the real bottleneck is in the data types I've created, the type variables I've introduced, and the conversion code I needed to insert to make it all happy.
I'm not sure it helps, but I've attached a trimmed down version of the relevant code. What should be happening is my pair is being converted to the canonical form for Coord2D which is Cartesian2D and then converted again to Vertex2. There shouldn't be any change made to the values, they're only being handed from one container to another in this case (Polar coordinates would require computation, but I've stripped that out for the time being). However, those handoffs require calls to realToFrac to make the type system happy, and that has to be what is eating up all my CPU.
Not all, but probably a big chunk of it. The problem is that the default implementation of realToFrac is

    realToFrac = fromRational . toRational

a) With that implementation, realToFrac :: Double -> Double is not the identity (it doesn't respect NaNs).
b) It's slow; there are no special operations to convert Double, Float etc. from/to Rational.

For a lot of types, GHC provides rewrite rules (you need to compile with optimisations to have them fire) which give faster versions (with somewhat different behaviour, e.g. realToFrac :: Double -> Double is rewritten to id, realToFrac between Float and Double uses primitive widening/narrowing ops, and for several newtype wrappers around Float/Double there are rules too).
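To make point a) concrete, here is a minimal sketch. The default definition is written out under a hypothetical name, slowRealToFrac, so that none of GHC's rewrite rules for realToFrac can apply to it; roundTrip and nan are likewise names made up for this illustration.

```haskell
-- The default definition of realToFrac, spelled out under a hypothetical
-- name so that GHC's rewrite rules for realToFrac cannot fire on it.
slowRealToFrac :: (Real a, Fractional b) => a -> b
slowRealToFrac = fromRational . toRational

-- A NaN to feed through the conversion.
nan :: Double
nan = 0 / 0

-- Double -> Double through Rational. Ordinary values survive the round
-- trip, but NaN does not, because Rational has no way to represent it.
roundTrip :: Double -> Double
roundTrip = slowRealToFrac
```

Loaded into GHCi, isNaN nan is True while isNaN (roundTrip nan) is False, so the round trip through Rational is not the identity on Double.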
I think there are probably 4 calls to realToFrac. If I walk through the code, the result, given the pair p, should be: Vertex2 (realToFrac (realToFrac (fst p))) (realToFrac (realToFrac (snd p)))
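The attached code is not reproduced here, so the following is only a guess at its shape. Cartesian2D and Vertex2 are hypothetical stand-ins defined locally (not the OpenGL type), plain functions replace the type class, and Double is assumed as the intermediate type. It shows where the two conversions per component come from, next to a version with one.

```haskell
-- Hypothetical stand-ins for the poster's Cartesian2D and for OpenGL's
-- Vertex2; the real definitions are not shown in the thread.
data Cartesian2D a = Cartesian2D a a deriving (Eq, Show)
data Vertex2 a = Vertex2 a a deriving (Eq, Show)

-- Step 1: pair -> canonical form (one realToFrac per component).
pairToCartesian :: (Real a, Fractional b) => (a, a) -> Cartesian2D b
pairToCartesian (x, y) = Cartesian2D (realToFrac x) (realToFrac y)

-- Step 2: canonical form -> vertex (a second realToFrac per component).
cartesianToVertex :: (Real a, Fractional b) => Cartesian2D a -> Vertex2 b
cartesianToVertex (Cartesian2D x y) = Vertex2 (realToFrac x) (realToFrac y)

-- Both steps chained, with Double assumed as the intermediate type.
-- Expands to Vertex2 (realToFrac (realToFrac x)) (realToFrac (realToFrac y)).
viaCartesian :: (Real a, Fractional b) => (a, a) -> Vertex2 b
viaCartesian p = cartesianToVertex (asDouble (pairToCartesian p))
  where
    asDouble :: Cartesian2D Double -> Cartesian2D Double
    asDouble = id

-- One conversion per component, skipping the intermediate container.
direct :: (Real a, Fractional b) => (a, a) -> Vertex2 b
direct (x, y) = Vertex2 (realToFrac x) (realToFrac y)
```

For inputs that are already Doubles, viaCartesian and direct should agree; for other input types the detour through Double may round differently.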
I'd like to maintain type independence if possible, but I expect most uses of this code to feed Doubles in for processing and probably feed GLclampf (Floats, I believe)
A newtype wrapper around CFloat, which is itself a newtype wrapper around Float. Unfortunately, there are no rewrite rules in the module where it is defined, nor, apparently, in any other module that has access to the constructor. And the constructor is not accessible from any of the exposed modules, so as far as I know, you can't provide your own rewrite rules.
to the OpenGL layer. If there's a way to do so, I wouldn't mind optimizing for that particular set of types. I've tried GLdouble, and it doesn't really improve things with the current code.
Is there a way to short circuit those realToFrac calls if we know the input and output are the same type? Is there a way to merge the nested calls?
You can try rewrite rules

    {-# RULES
    "realToFrac2/realToFrac" realToFrac . realToFrac = realToFrac
    "realToFrac/id" realToFrac = id
      #-}

but I'm afraid the second won't work at all; then you'd have to specify all interesting cases yourself (there are rules for the cases Double -> Double and Float -> Float in GHC.Float, and rules for converting from/to CFloat and CDouble in Foreign.C.Types, so those should be fine too):

    "realToFrac/GLclampf->GLclampf" realToFrac = id :: GLclampf -> GLclampf

and whatever else you need. Whether the first one will help (or even work), I don't know either; you'll have to try.
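For comparison, this is roughly what type-specific rules look like when the constructor is in scope. It is a sketch only: Clamp is a hypothetical newtype standing in for something like GLclampf, and doubleToClamp, toClamp and sameClamp are names invented for the example. double2Float is the primitive narrowing conversion exported by GHC.Float.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import GHC.Float (double2Float)

-- Hypothetical stand-in for a type like GLclampf whose constructor
-- is visible in this module.
newtype Clamp = Clamp Float
  deriving (Eq, Ord, Show, Num, Fractional, Real)

-- Direct conversion that avoids the detour through Rational.
doubleToClamp :: Double -> Clamp
doubleToClamp = Clamp . double2Float

-- Type-specific rules in the style of those in GHC.Float.
-- They only fire when compiling with optimisations.
{-# RULES
"realToFrac/Double->Clamp" realToFrac = doubleToClamp
"realToFrac/Clamp->Clamp"  realToFrac = id :: Clamp -> Clamp
  #-}

-- Call sites stay generic; with -O the rules above should rewrite them.
toClamp :: Double -> Clamp
toClamp = realToFrac

sameClamp :: Clamp -> Clamp
sameClamp = realToFrac
```

Compiling with -O -ddump-rule-firings shows whether the rules actually fired at toClamp and sameClamp. Without optimisation the code still works, just through the slow default path.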
Any other thoughts on what I can do here? The slowdown between the two implementations is at least 20x, which seems like a steep penalty to pay.
In case of emergency, put the needed rewrite rules into the source of OpenGLRaw yourself.
And while I'm at it, is turning on FlexibleInstances the only way to create an instance for (a,a)?
Yes. Haskell98 doesn't allow such instance declarations, so you need the extension.
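A minimal sketch of such an instance, assuming a hypothetical single-method class in place of the real Coord2D (whose definition is not shown in the thread):

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical class standing in for the poster's Coord2D.
class Coord2D c where
  toPair :: c -> (Double, Double)

-- The instance head (a, a) repeats the type variable a. Haskell 98 only
-- allows heads of the form T a b with distinct variables, hence the
-- FlexibleInstances pragma above.
instance Real a => Coord2D (a, a) where
  toPair (x, y) = (realToFrac x, realToFrac y)
```

One consequence of this instance head is that both components must already be known to have the same type before the instance can be selected, so bare numeric literals usually need an annotation, e.g. toPair (1 :: Int, 2 :: Int).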