Re: [Haskell-cafe] GHC optimisations

21 Aug 2007

      On Tue, Aug 21, 2007 at 05:25:49AM -0700, Stefan O'Rear wrote:
...
On Tue, Aug 21, 2007 at 01:14:20PM +0100, Rodrigo Queiro wrote:
...
On my system, the C version runs about 9x faster than the haskell
version (with -O3 and -O2 -fvia-c -optc-O3 respectively). However, GCC
seems to produce about 70 lines of assembly for the main loop,
compared to about 10 from GHC. I suspect the speed difference is the
result of some heavy optimisation by GCC, which would need to be
hand-tuned for GHC. (I would be interested to see what this would be.
Unfortunately I don't know x86 assembly well enough to understand the
GCC output.)
GCC is carrying out two major optimisations that ghc is missing here:
replacing the tail call with a jump directly into the function body
(having stuffed the correct arguments into the appropriate registers)
and unrolling the loop. That's pretty much it. Neither are what I'd
call 'heavy' optimisations.
...
The fundamental problem is that GHC doesn't have enough registers to to
a good job with Haskell.  Normal Haskell code  makes extensive use of
the GHC stack for function calls, the C stack for signal handlers, the
capability base pointer for thread state, and the heap for everything
else.  Which doesn't really leave us in a good state for optimizing.  In
particular, x86 ghc ALWAYS passes parameters on the stack, even for tail
calls.  I didn't actually bother to check, but I'm pretty sure that's
what the OP was noticing - if you look carefully it's not actually
pushing or popping anything, just using stack memory.
Yes, absolutely.
...
Situations are far better on x86_64 (16 registers) and ppc (32
registers).  There is some work being done on the backend to improve
this (in particular, a new and much better register allocator and a
parameter-aware Cmm system).
<fires up ppc box>

Ouch. That's even worse:

$ ./sum 100000000

C version: 0.16s
Haskell  : 1.40s

Looking at the generated assembler, the ppc version has exactly the
same problem that the x86 version does. It carries out the
calculation, the stores the result in some memory locations and calls f
again so that the preamble to f can pull those same results out of the
memory locations in order to put them back into the same registers
again!

(I'm using ghc 6.6.1 on Debian unstable btw for anyone following along.)

cheers, Phil

-- 
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt

Re: [Haskell-cafe] GHC optimisations

Philip Armstrong