
Don's reply didn't reach me for some reason, but pulling it out of the previous response:
On 21/08/07, Donald Bruce Stewart wrote:
phil:
The generated assembler suggests (if I've read it correctly) that gcc is spotting that it can replace the tail call with a jump in the C version, but for some reason it can't spot it for the Haskell version when compiling with -fvia-C (and neither does ghc itself using -fasm). So the Haskell version ends up pushing and popping values on and off the stack for every call to f, which is a bit sad.
That doesn't sound quite right. The C version should get a tail call with gcc -O2; the Haskell version should be a tail call anyway.
Just to be clear: the Haskell version is a tail call, but it's pushing the values to and from memory (well, cache really, of course) for every call to f, which is killing the performance.
Let's see:
C
$ gcc -O t.c -o t
$ time ./t 1000000000
zsh: segmentation fault (core dumped)  ./t 1000000000
./t 1000000000  0.02s user 0.22s system 5% cpu 4.640 total
Turning on -O2
$ time ./t 1000000000
-243309312
./t 1000000000  1.89s user 0.00s system 97% cpu 1.940 total
-O3 does better thanks to the loop unrolling; see timings below.
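The segfault at -O and the clean run at -O2 are consistent with gcc only turning the tail call into a jump at the higher level (sibling-call optimisation, -foptimize-sibling-calls, is part of -O2). Neither source file is shown in this thread, so the following is only a guess at the shape of the C version: the names sum_rec and sum_loop, the accumulator-passing style, and the use of unsigned 32-bit arithmetic are all assumptions. The only name taken from the thread is f.

```c
#include <stdint.h>

/* Hypothetical reconstruction, not the t.c from the thread.
   Accumulator-passing sum of 1..n. Unsigned arithmetic is used so that
   32-bit wraparound is well defined rather than undefined behaviour. */
static uint32_t f(uint32_t acc, uint32_t n)
{
    if (n == 0)
        return acc;
    /* The recursive call is in tail position: nothing is left to do after
       it returns, so a compiler may reuse the current frame and jump. */
    return f(acc + n, n - 1);
}

/* Recursive entry point. Without tail-call elimination this needs one
   stack frame per step, so a large n can exhaust the stack. */
int32_t sum_rec(uint32_t n)
{
    /* Conversion of out-of-range values is implementation-defined;
       gcc wraps modulo 2^32. */
    return (int32_t)f(0, n);
}

/* What the call effectively becomes once it is replaced by a jump:
   a plain loop with the two arguments held in local variables. */
int32_t sum_loop(uint32_t n)
{
    uint32_t acc = 0;
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return (int32_t)acc;
}
```

Under these assumptions, summing 1..1000000000 with 32-bit wraparound gives -243309312, which matches the value printed by the binaries in this thread. That suggests, but does not prove, that the real programs compute something of this form.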
And GHC:
$ ghc -O2 A.hs -o A
$ time ./A 1000000000
-243309312
./A 1000000000  3.21s user 0.01s system 97% cpu 3.289 total
So, what, 1.6x slower than gcc -O2? Seems ok without any tuning.
You're getting much better timings than I am!

$ time -p ./sum-hs 1000000000
-243309312
real 3.75
user 3.70
$ time -p ./sum-c-O2 1000000000
-243309312
real 1.40
user 1.35
$ time -p ./sum-c-O3 1000000000
-243309312
real 1.21
user 1.18

(My box has an AMD Athlon64 3000+ CPU fwiw, but the powerpc version is even worse when compared to its respective C binary!)

Phil
--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt