
On Fri, Jun 22, 2007 at 01:16:54PM +0100, Simon Marlow wrote:
> Philip Armstrong wrote:
>> IIRC, it is possible to issue an instruction to the x86 FP unit which makes all operations work on 64-bit Doubles, even though there are 80-bits available internally. Which then means there's no requirement to spill intermediate results to memory in order to get the rounding correct.
>
> For some background on why GHC doesn't do this, see the comment "MORE FLOATING POINT MUSINGS..." in
>
>   http://darcs.haskell.org/ghc/compiler/nativeGen/MachInstrs.hs

Twisty. I guess 'slow, but correct, with switches to go faster at the price of correctness' is about the best option.

> You probably want SSE2. If I ever get around to finishing it, the GHC native code generator will be able to generate SSE2 code on x86 someday, like it currently does for x86-64. For now, to get good FP performance on x86, you probably want
>
>   -fvia-C -fexcess-precision -optc-mfpmath=sse2

Reading the gcc manpage, I think you mean -optc-msse2 -optc-mfpmath=sse. -mfpmath=sse2 doesn't appear to be an option.

(I note in passing that the ghc darcs head produces binaries from ray.hs which are about 15% slower than ghc 6.6.1 ones btw. Same optimisation options used both times.)

cheers, Phil

-- 
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt
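
For anyone following along, the "spill intermediate results to memory" point can be sketched in C roughly as below. This is only an illustration, not what GHC's code generator does: the helper names spill and sum_then_cancel are made up for the example, and it assumes IEEE 754 doubles with the default round-to-nearest mode. Writing each intermediate to a volatile double forces it to be rounded to 64 bits, even if the compiler would otherwise keep it in an 80-bit x87 register.

```c
#include <stdio.h>

/* Hypothetical helper: round a value to a 64-bit double by forcing it
   through memory. "volatile" stops the compiler from keeping the value
   in a wider (e.g. 80-bit x87) register. */
static double spill(double x) {
    volatile double slot = x;
    return slot;
}

/* Hypothetical example: compute (big + small) - big with every
   intermediate result rounded to double precision. */
static double sum_then_cancel(double big, double small) {
    double t = spill(big + small);  /* rounded to 64 bits here */
    return spill(t - big);
}

/* Convenience wrapper that prints the result of one such computation. */
static void show(double big, double small) {
    printf("(%g + %g) - %g = %g\n", big, small, big,
           sum_then_cancel(big, small));
}
```

With doubles, 1e16 + 1.0 rounds back to 1e16, so sum_then_cancel(1e16, 1.0) gives 0. If the sum were instead kept at 80-bit precision the subtraction could yield 1, which is the kind of discrepancy the spilling (or SSE2 code generation) is meant to avoid.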