
On 17.07.2008, at 17:42, Ian Lynagh wrote:
On Thu, Jul 17, 2008 at 05:18:01PM +0200, Henning Thielemann wrote:
Complex.magnitude must prevent overflows; that is, if you just square 1e200::Double you get an overflow, although the end result may also be around 1e200. I guess that, to this end, Complex.magnitude separates mantissa and exponent, but I'm afraid this is done via Integers.
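To make the overflow concern concrete, here is a minimal sketch. The name naiveMagnitude is a hypothetical helper, not something from the library; it applies the textbook formula without any rescaling and is compared against Data.Complex.magnitude on an input near 1e200.

```haskell
import Data.Complex (Complex((:+)), magnitude)

-- Hypothetical helper (not part of the library): the textbook
-- formula with no rescaling of the components.
naiveMagnitude :: Complex Double -> Double
naiveMagnitude (x :+ y) = sqrt (x * x + y * y)

main :: IO ()
main = do
  let z = 1e200 :+ 1e200 :: Complex Double
  -- 1e200 * 1e200 is outside the Double range, so the sum of
  -- squares is already Infinity before sqrt is applied.
  print (isInfinite (naiveMagnitude z))  -- True
  -- The library version stays finite on the same input.
  print (isInfinite (magnitude z))       -- False
```

The true result here is about 1.4e200, which fits comfortably in a Double; only the intermediate squares overflow.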
Here's the code:
    {-# SPECIALISE magnitude :: Complex Double -> Double #-}
    magnitude :: (RealFloat a) => Complex a -> a
    magnitude (x:+y) = scaleFloat k
                         (sqrt ((scaleFloat mk x)^(2::Int) + (scaleFloat mk y)^(2::Int)))
      where k  = max (exponent x) (exponent y)
            mk = - k
So the slowdown may be due to the scaling, presumably to prevent overflow as you say. However, the e^(2 :: Int) may also be causing a slowdown, as (^) is lazy in its first argument; I'm not sure if there is a rule that will rewrite that to e*e. Stefan, perhaps you can try timing with the above code, and also with:
    {-# SPECIALISE magnitude :: Complex Double -> Double #-}
    magnitude :: (RealFloat a) => Complex a -> a
    magnitude (x:+y) = scaleFloat k
                         (sqrt (sqr (scaleFloat mk x) + sqr (scaleFloat mk y)))
      where k  = max (exponent x) (exponent y)
            mk = - k
            sqr x = x * x
and let us know what the results are?
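For anyone who wants to repeat the comparison, the two scaling variants can be put side by side in one small program. This is only a sketch: the names magPow, magSqr and timeIt are hypothetical, the input data is made up, and the harness uses plain CPU time from System.CPUTime rather than the profiler, so the numbers will not be comparable with any profile output.

```haskell
import Data.Complex (Complex((:+)))
import Data.List (foldl')
import System.CPUTime (getCPUTime)

-- Scaling version that squares with (^), as in the first listing.
magPow :: Complex Double -> Double
magPow (x :+ y) =
    scaleFloat k (sqrt ((scaleFloat mk x)^(2::Int) + (scaleFloat mk y)^(2::Int)))
  where k  = max (exponent x) (exponent y)
        mk = - k

-- Scaling version that squares with an explicit multiplication.
magSqr :: Complex Double -> Double
magSqr (x :+ y) =
    scaleFloat k (sqrt (sqr (scaleFloat mk x) + sqr (scaleFloat mk y)))
  where k  = max (exponent x) (exponent y)
        mk = - k
        sqr v = v * v

-- Hypothetical harness: sum the magnitudes of n made-up inputs
-- and report the CPU time used (getCPUTime is in picoseconds).
timeIt :: String -> (Complex Double -> Double) -> Int -> IO Double
timeIt label f n = do
  start <- getCPUTime
  let total = foldl' (\acc i -> acc + f (fromIntegral i :+ 1)) 0 [1 .. n]
  -- Force the sum before reading the clock again.
  end <- total `seq` getCPUTime
  let secs = fromIntegral (end - start) / 1e12 :: Double
  putStrLn (label ++ ": " ++ show secs ++ " s")
  return total

main :: IO ()
main = do
  a <- timeIt "magPow" magPow 1000000
  b <- timeIt "magSqr" magSqr 1000000
  -- Sanity check that both variants computed the same sum.
  print (a == b)
```

Compile it with the same optimisation flags as the real program, since the difference between the two variants may depend on which rewrite rules and specialisations fire.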
thanks ian,

here are the absolute runtimes (non-instrumented code) and the corresponding entries in the profile:

    c_magnitude0 (Data.Complex.magnitude)            0m7.249s
    c_magnitude1 (non-scaling version)               0m1.176s
    c_magnitude2 (scaling version, strict square)    0m3.278s

                    %time   %alloc   (inherited)
    c_magnitude0     91.6     90.2
    c_magnitude1     41.7     49.6
    c_magnitude2     81.5     71.1

interestingly, just pasting the original ghc library implementation seems to slow things down considerably (0m12.264s) when compiling with

    -O2 -funbox-strict-fields -fvia-C -optc-O2 -fdicts-cheap -fno-method-sharing -fglasgow-exts

when leaving out -fdicts-cheap and -fno-method-sharing, the execution time for the pasted library code drops to 0m6.873s.

it seems that some options that are useful (or even necessary?) for stream fusion rule reduction may produce non-optimal code in other cases?

<sk>