Re: [Haskell-cafe] GHC optimisations

21 Aug 2007

      On Tue, Aug 21, 2007 at 01:14:20PM +0100, Rodrigo Queiro wrote:
...
On my system, the C version runs about 9x faster than the haskell
version (with -O3 and -O2 -fvia-c -optc-O3 respectively). However, GCC
seems to produce about 70 lines of assembly for the main loop,
compared to about 10 from GHC. I suspect the speed difference is the
result of some heavy optimisation by GCC, which would need to be
hand-tuned for GHC. (I would be interested to see what this would be.
Unfortunately I don't know x86 assembly well enough to understand the
GCC output.)
The fundamental problem is that GHC doesn't have enough registers to to
a good job with Haskell.  Normal Haskell code  makes extensive use of
the GHC stack for function calls, the C stack for signal handlers, the
capability base pointer for thread state, and the heap for everything
else.  Which doesn't really leave us in a good state for optimizing.  In
particular, x86 ghc ALWAYS passes parameters on the stack, even for tail
calls.  I didn't actually bother to check, but I'm pretty sure that's
what the OP was noticing - if you look carefully it's not actually
pushing or popping anything, just using stack memory.

Situations are far better on x86_64 (16 registers) and ppc (32
registers).  There is some work being done on the backend to improve
this (in particular, a new and much better register allocator and a
parameter-aware Cmm system).

Stefan

Re: [Haskell-cafe] GHC optimisations

Stefan O'Rear