[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML

22 Jun 2007

On Fri, Jun 22, 2007 at 01:16:54PM +0100, Simon Marlow wrote:
> ...
> Philip Armstrong wrote:
>> ...
>> IIRC, it is possible to issue an instruction to the x86 FP unit which
>> makes all operations work on 64-bit Doubles, even though there are
>> 80-bits available internally. Which then means there's no requirement
>> to spill intermediate results to memory in order to get the rounding
>> correct.
>
> For some background on why GHC doesn't do this, see the comment "MORE
> FLOATING POINT MUSINGS..." in
> http://darcs.haskell.org/ghc/compiler/nativeGen/MachInstrs.hs

Twisty. I guess 'slow, but correct, with switches to go faster at the
price of correctness' is about the best option.

> ...
> You probably want SSE2.  If I ever get around to finishing it, the GHC
> native code generator will be able to generate SSE2 code on x86 someday,
> like it currently does for x86-64.  For now, to get good FP performance on
> x86, you probably want
>
> -fvia-C -fexcess-precision -optc-mfpmath=sse2

Reading the gcc manpage, I think you mean -optc-msse2
-optc-mfpmath=sse. -mfpmath=sse2 doesn't appear to be an option.
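
Spelled out, the corrected invocation would look something like the
following. This is only a sketch: the -O2 level and the output name are
assumptions made for illustration, not something stated in this thread,
and it presumes a GHC of this era (6.6.x) that still accepts -fvia-C and
uses gcc as its C compiler.

```shell
# -fvia-C             compile via gcc rather than the native code generator
# -fexcess-precision  do not force intermediates to be rounded to 64 bits
# -optc-msse2         passed through to gcc: enable SSE2 instructions
# -optc-mfpmath=sse   passed through to gcc: do FP arithmetic in SSE registers
# -O2 and "-o ray" are assumed here, not taken from the thread.
ghc -O2 -fvia-C -fexcess-precision -optc-msse2 -optc-mfpmath=sse ray.hs -o ray
```

Each -optc-... flag hands the rest of the option to gcc, so -msse2 and
-mfpmath=sse have to be given as two separate -optc options.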

(I note in passing that the ghc darcs head produces binaries from
ray.hs which are about 15% slower than ghc 6.6.1 ones btw. Same
optimisation options used both times.)

cheers, Phil

-- 
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt