
On Wednesday 15 September 2010 20:50:13, Greg wrote:
I hadn't come across rewrite rules yet. They definitely look like something worth learning,
Absolutely. GHC's optimiser is good, but there are a lot of cases where you need to push it via rewrite rules if you write polymorphic code or if you want to eliminate intermediate data structures (e.g. list fusion).
though I'm not sure I'm prepared to start making custom versions of OpenGL.Raw...
Yes, if you can work around the issues without that, it's better to leave it in peace :) Though you might ask the maintainer for rewrite rules.
It looks like I managed to put that battle off for another day, however. I did look at how realToFrac is implemented and (as you mention) it does the fromRational . toRational transform pair suggested in a number of sources, including Real World Haskell. Looking at what toRational is doing, creating a ratio of integers out of a float it seems like a crazy amount of effort to go through just to convert floating point numbers.
I just did some benchmarking. I benchmarked foldl' (+) 0 [convert (1 / intToDoub k) | k <- [1 .. 100000]] where intToDoub :: Int -> Double intToDoub = fromIntegral for several functions convert :: Double -> Float (actually, the type has been (RealFloat a, RealFloat b) => a -> b, but it was used at a = Double, b = Float). Everything was compiled with -O2, so the rewrite rules fired, in particular intToDoub was replaced by a primop (int2Double#), so that one's ultra cheap. For convert = realToFrac (hence by the rewrite rules the primop double2Float# was used), I got pretty good times, mean was 6.76 ms. For convert = floatToFloat from below, the times were not too bad, with a mean of 26.3 ms. A factor of roughly four for this benchmark (the factor for the conversion itself will be higher, but not exorbitantly) means it's usable in many situations, but not in performance critical situations where the conversion takes a significant amount of the running time. If you're converting to draw stuff with OpenGL (or some other graphics library), the conversion will take only a relatively small part of the time, so it's fine. For convert = fromRational . toRational (so no rewrite rules), the times were rather appalling: mean was 3.34 seconds. A factor of nearly 500 versus double2Float#. toRational is bad. Looking at instance Real Double where toRational x = (m%1)*(b%1)^^n where (m,n) = decodeFloat x b = floatRadix x (same for Float), and the implementations of (^^) and (^), {-# SPECIALISE (^) :: Integer -> Integer -> Integer, Integer -> Int -> Integer, Int -> Int -> Int #-} (^) :: (Num a, Integral b) => a -> b -> a x0 ^ y0 | y0 < 0 = error "Negative exponent" | y0 == 0 = 1 | otherwise = f x0 y0 where -- f : x0 ^ y0 = x ^ y f x y | even y = f (x * x) (y `quot` 2) | y == 1 = x | otherwise = g (x * x) ((y - 1) `quot` 2) x -- g : x0 ^ y0 = (x ^ y) * z g x y z | even y = g (x * x) (y `quot` 2) z | y == 1 = x * z | otherwise = g (x * x) ((y - 1) `quot` 2) (x * z) -- | raise a number to an integral power {-# SPECIALISE (^^) :: Rational -> Int -> Rational #-} (^^) :: (Fractional a, Integral b) => a -> b -> a x ^^ n = if n >= 0 then x^n else recip (x^(negate n)) together with the multiplication and recip for Rationals, I have to say ouch! There's no special implementation and rewrite rule for powers of Rationals, so on each multiplication in (^), the gcd of numerator and denominator is calculated, *although as powers of the original numerator and denominator they are guaranteed to be coprime*. Considering how slow a division of Integers is, awwwwwww noooooo. So let's look at a better implementation of toRational: toRat :: RealFloat a => a -> Rational toRat x = case decodeFloat x of (m,e) -> case floatRadix x of b -> if e < 0 then (m % (b^(negate e))) else (m * b^e) :% 1 (inlined a better implementation of powers for Rationals). Benchmarking convert = fromRational . toRat show a significant improvement, the mean dropped to 2.75 seconds. Still appalling, but it's a nice improvement and I don't see any quick opportunities to improve that conversion. So let's come to the last, fromRational. That's a compicated function, and unfortunately it has to be and I've no good idea to improve it. fromRational is really evil (in terms of clock cycles). Replacing fromRational with a dummy that just forces the evaluation of its argument and returns NaN, ±Infinity, or 0 for all real Rational values, dummy . toRational had a mean of 623.5 ms and dummy . toRat had a mean of 200.7 ms. So toRat is a jolly good improvement over toRational, but it's still awfully slow. And since fromRational takes much much longer anyway, it's a not too impressive gain for realToFrac.
Looking at the RealFloat class rather that Real and Fractional, it seems like this is a much more efficient way to go:
floatToFloat :: (RealFloat a, RealFloat b) => a -> b floatToFloat = (uncurry encodeFloat) . decodeFloat
Yes, that's much more efficient, as witnessed by the benchmark results. But.
I substituted this in for realToFrac and I'm back to close to my original performance. Playing with a few test cases in ghci, it looks numerically equivalent to realToFrac.
This begs the question though--
No. Sorry, but I can't bear that misuse: http://en.wikipedia.org/wiki/Begging_the_question It raises/demands/invites the question, but it doesn't beg it.
am I doing something dangerous here?
Yes and no.
Why isn't this the standard approach?
Because it will wreak unspeakable havoc when someone creates a RealFloat instance with a floatRadix > 2. A floatRadix of 10 or some power of 2 (16, 256?) could be even reasonable. But for conversions between RealFloat types with the same floatRadix, it's sort of okay, only it clobbers NaNs and (for some conversions) Infinities. However, realToFrac does that too.
If I understand what's happening, decodeFloat and encodeFloat are breaking the floating point numbers up into their constituent parts-- presumably by bit masking the raw binary.
Probably.
That would explain the performance improvement. I suppose there is some implementation dependence here, but as long as the encode and decode are implemented as a matched set then I think I'm good.
Not entirely matched, for Float -> Float and Double -> Double, NaN -> -Infinity, maybe denormalized values break too. ±Infinity is mapped to a finite value at Float -> Double But since toRational uses decodeFloat and fromRational uses encodeFloat, floatToFloat is in that respect no worse than realToFrac without rewrite rules.
Cheers-- Greg