Re: [Haskell-cafe] GHC optimisations

21 Aug 2007


Don's reply didn't reach me for some reason, but pulling it out of the
previous response:
...
On 21/08/07, Donald Bruce Stewart  wrote:
...
phil:
...
The generated assembler suggests (if I've read it correctly) that gcc
is spotting that it can replace the tail call with a jump in the C
version, but for some reason it can't spot it for the Haskell version
when compiling with -fvia-C (and neither does ghc itself using
-fasm). So the Haskell version ends up pushing and popping values on
and off the stack for every call to f, which is a bit sad.
That doesn't sound quite right. The C version should get a tail call
with gcc -O2; the Haskell version should be a tail call anyway.
Just to be clear: the Haskell version is a tail call, but it's pushing
the values to and from memory (well, cache really of course) for every
call to f, which is killing the performance.
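
For readers without the original source to hand, here is a minimal sketch of the kind of function under discussion. It is an assumption, not the code from the thread: the name f comes from the text above, but the accumulator argument, the types and the f_loop companion are hypothetical. Unsigned 32-bit arithmetic is used so that wraparound is well defined in C.

```c
#include <stdint.h>

/* Hypothetical stand-in for f: adds n + (n-1) + ... + 1 to acc.
   The recursive call is the last thing the function does, so it is
   in tail position. */
uint32_t f(uint32_t acc, uint32_t n)
{
    if (n == 0)
        return acc;
    return f(acc + n, n - 1);   /* tail call: no work left afterwards */
}

/* The loop that a tail-call-eliminating compiler can reduce f to:
   both arguments stay in registers, nothing is pushed per step. */
uint32_t f_loop(uint32_t acc, uint32_t n)
{
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}
```

If the compiler does not turn the call in f into a jump, every step costs a stack frame, so a large enough n will exhaust the stack. When it does, f and f_loop should compile to much the same code.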
...
...
Let's see:
C
    $ gcc -O t.c -o t
    $ time ./t 1000000000
    zsh: segmentation fault (core dumped)  ./t 1000000000
    ./t 1000000000  0.02s user 0.22s system 5% cpu 4.640 total
Turning on -O2:
    $ time ./t 1000000000
    -243309312
    ./t 1000000000  1.89s user 0.00s system 97% cpu 1.940 total
-O3 does better thanks to the loop unrolling; see timings below.
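
As a sanity check on the printed value: assuming the test program sums the integers 1..n in a 32-bit int (the source is not shown in this message, so that is a guess from the output), -243309312 is what the exact sum 500000000500000000 becomes once it wraps to 32 bits. A small sketch, with wrapped_sum a hypothetical helper name:

```c
#include <stdint.h>

/* Sum 1..n via the closed form n*(n+1)/2, keep the low 32 bits,
   and read them as a signed two's complement value.
   Valid for n up to 4000000000, where n*(n+1) still fits in 64 bits. */
int64_t wrapped_sum(uint64_t n)
{
    uint64_t exact = n * (n + 1) / 2;
    uint32_t low = (uint32_t)exact;          /* reduce modulo 2^32 */
    if (low >= UINT32_C(0x80000000))         /* top bit set: negative */
        return (int64_t)low - INT64_C(4294967296);
    return (int64_t)low;
}
```

wrapped_sum(1000000000) gives -243309312, matching the output of both the C and the Haskell binaries.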
...
...
And GHC:
    $ ghc -O2 A.hs -o A
    $ time ./A 1000000000
    -243309312
    ./A 1000000000  3.21s user 0.01s system 97% cpu 3.289 total
So, what, about 1.7x slower than gcc -O2?
Seems ok without any tuning.
You're getting much better timings than I am!

$ time -p ./sum-hs 1000000000
-243309312
real 3.75
user 3.70
$ time -p ./sum-c-O2 1000000000
-243309312
real 1.40
user 1.35
$ time -p ./sum-c-O3 1000000000
-243309312
real 1.21
user 1.18

(My box has an AMD Athlon64 3000+ CPU fwiw, but the powerpc version is
even worse when compared to its respective C binary!)

Phil

-- 
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt

Philip Armstrong