
On 09/06/2011 08:44 PM, austin seipp wrote:
On Thu, Jun 9, 2011 at 1:53 PM, Andrew Coppin wrote:
I'm still left wondering if using 32-bit instructions to manipulate 64-bit values is actually that much slower.
The problem is you're probably going to need to spill things (somewhere) in order to operate on the upper 32 bits of any given register in the nontrivial case. x86 is already pathetic with its 8 GP registers. amd64 brings it up to 16.
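To make the register-pressure point concrete, here is a rough sketch in C of what a single 64-bit addition turns into when only 32-bit arithmetic is available. The type `u64_pair` and the helpers `split64`, `join64` and `add64_via_32` are made-up names for illustration only, not anything GHC or a C compiler actually exposes. Each 64-bit operand occupies two 32-bit slots, and the carry needs somewhere to live as well, which is roughly why 8 GP registers run out quickly.

```c
#include <stdint.h>

/* Illustrative only: a 64-bit value held as two 32-bit halves,
   which is how a 32-bit target has to represent it. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Split a 64-bit value into its low and high 32-bit halves. */
static u64_pair split64(uint64_t x) {
    u64_pair p;
    p.lo = (uint32_t)(x & 0xFFFFFFFFu);
    p.hi = (uint32_t)(x >> 32);
    return p;
}

/* Reassemble the two halves into one 64-bit value. */
static uint64_t join64(u64_pair p) {
    return ((uint64_t)p.hi << 32) | (uint64_t)p.lo;
}

/* 64-bit addition using only 32-bit adds (hypothetical helper). */
static u64_pair add64_via_32(u64_pair a, u64_pair b) {
    u64_pair r;
    r.lo = a.lo + b.lo;               /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);   /* 1 if the low add wrapped */
    r.hi = a.hi + b.hi + carry;       /* propagate carry into the high half */
    return r;
}
```

On x86 the carry is normally handled with an add / add-with-carry pair rather than an explicit compare, but the operands still take four 32-bit registers (or memory slots) instead of two.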
Well, that's true enough. Given that AMD64 adds more registers, I'm surprised that this apparently makes such a small difference to wall-clock run-times. (But perhaps it makes a bigger difference for GHC. I don't know.)
I'm wondering if you could write the operations you want in small C stub functions, and FFI to them and do it that way. I don't really know enough about this sort of thing to know whether that'll work...
It's highly unlikely that the cost of a foreign call (even an unsafe one) in terms of just CPU cycles will be cheaper than executing a few primitive arithmetic ops on the CPU.
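For reference, the kind of stub being talked about might look something like the sketch below. The thread never says which operations are wanted, so the 64-bit rotate and the name `rotl64_stub` are purely assumptions for illustration. The Haskell import shown in the comment is likewise only a guess at how one would bind it.

```c
#include <stdint.h>

/* Hypothetical C stub: rotate a 64-bit word left by n bits.
   On a 32-bit target the C compiler lowers this to a sequence of
   32-bit instructions, much as GHC would have to.

   A possible Haskell-side binding (assumed, not from the thread):
     foreign import ccall unsafe "rotl64_stub"
       c_rotl64 :: Word64 -> CUInt -> Word64
*/
uint64_t rotl64_stub(uint64_t x, unsigned n) {
    n &= 63;                 /* keep the rotate count in 0..63 */
    if (n == 0) {
        return x;            /* avoid a shift by 64, which is undefined in C */
    }
    return (x << n) | (x >> (64 - n));
}
```

The body is only a handful of instructions, which is the point made above: the call itself (argument marshalling, the jump, the return) can easily cost as much as the work being done.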
Yeah, you're probably right there actually. Too bad GHC doesn't support inline assembly yet... (Or does it? I know it supports inline Core now.)