
On my system, the C version runs about 9x faster than the haskell
version (with -O3 and -O2 -fvia-c -optc-O3 respectively). However, GCC
seems to produce about 70 lines of assembly for the main loop,
compared to about 10 from GHC. I suspect the speed difference is the
result of some heavy optimisation by GCC, which would need to be
hand-tuned for GHC. (I would be interested to see what this would be.
Unfortunately I don't know x86 assembly well enough to understand the
GCC output.)
On 21/08/07, Donald Bruce Stewart
phil:
On Mon, Aug 20, 2007 at 09:57:38PM +0100, Simon Peyton-Jones wrote:
GHC does some constant folding, but little by way of strength reduction, or using shifts instead of multiplication. It's pretty easy to add more: it's all done in a single module. Look at primOpRules in the module PrelRules.
Patches welcome! But please also supply test-suite tests that check the correctness of the rules.
Sucking another example out of comp.lang.functional:
This:
import System
f :: Int -> Int -> Int f s n = if n > 0 then f (s+n) (n-1) else s
main = do [n] <- getArgs putStrLn $ show $ f 0 (read n)
is 3-4x slower than this:
#include
#include #include int f(int s, int n) { return n > 0 ? f(s+n, n-1) : s; }
int main(int argc, char *argv[]) { assert(argc == 2); printf("%d\n", f(0, strtol(argv[1],0,0))); }
The generated assembler suggests (if I've read it correctly) that gcc is spotting that it can replace the tail call with a jump in the C version, but for some reason it can't spot it for the Haskell version when compiling with -fvia-C (and neither does ghc itself using -fasm). So the haskell version ends up pushing and popping values on and off the stack for every call to f, which is a bit sad.
That doesn't sound quite right. The C version should get a tail call , with gcc -O2, the Haskell version should be a tail call anyway.
Let's see:
C $ gcc -O t.c -o t $ time ./t 1000000000 zsh: segmentation fault (core dumped) ./t 1000000000 ./t 1000000000 0.02s user 0.22s system 5% cpu 4.640 total
Turning on -O2
$ time ./t 1000000000 -243309312 ./t 1000000000 1.89s user 0.00s system 97% cpu 1.940 total
And GHC:
$ ghc -O2 A.hs -o A $ time ./A 1000000000 -243309312 ./A 1000000000 3.21s user 0.01s system 97% cpu 3.289 total
So, what, 1.6x slower than gcc -O2 Seems ok without any tuning.
-- Don _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe