
John Meacham wrote:
> On Wed, Jan 18, 2006 at 08:54:43PM +0300, Bulat Ziganshin wrote:
>> sorry, with the "gcc -O3 -ffast-math -fstrict-aliasing -funroll-loops" the C version is 50 times faster than best Haskell one... it's the loop from C version:
>
> I believe something similar to what I noted here is the culprit: http://www.haskell.org//pipermail/glasgow-haskell-users/2005-October/009174....
>
> it is fixable, but not without modifying ghc.

Ah, I see what you mean by indirect jumps. Those indirect jumps go away if you compile with -optc-O2 or -fasm, they're droppings left by inadequacies in gcc's standard -O optimisation.

Actually, -fasm does better by one instruction than gcc on this example:

    .globl Test_zdwfac_info
    Test_zdwfac_info:
            movq    (%rbp),%rax
            cmpq    $1,%rax
            jne     .LcmO
            movq    8(%rbp),%r13
            addq    $16,%rbp
            jmp     *(%rbp)
    .LcmO:
            leaq    -1(%rax),%rcx
            imulq   8(%rbp),%rax
            movq    %rax,8(%rbp)
            movq    %rcx,(%rbp)
            jmp     Test_zdwfac_info

vs. gcc -O2:

    Test_zdwfac_info:
    .text
            .align 8
            movq    (%rbp), %rdx
            cmpq    $1, %rdx
            je      .L6
    .L3:
            movq    8(%rbp), %rax
            imulq   %rdx, %rax
            decq    %rdx
            movq    %rdx, (%rbp)
            movq    %rax, 8(%rbp)
            jmp     Test_zdwfac_info
            .p2align 4,,7
    .L6:
            movq    8(%rbp), %r13
            addq    $16, %rbp
            jmp     *(%rbp)

We should probably reverse the sense of that branch, like gcc does. The memory accesses are still there, of course. Hopefully someday I'll get around to trying to use more registers on x86_64 again.

Cheers,
Simon
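
For readers without the source to hand: the Haskell behind these listings is not quoted in this message, so what follows is only a guess at its shape, read back from the assembly. The label Test_zdwfac_info is the z-encoded name of Test.$wfac, i.e. the unboxed worker GHC derives for a function fac in a module Test. Both listings load one argument from (%rbp), compare it with 1, return the value at 8(%rbp) when they are equal, and otherwise multiply the two, decrement the first, and jump back to the top. An accumulating factorial along these lines would fit; the argument order, the Int types and the names n and acc are assumptions, and the module is called Main here only so that the sketch runs on its own.

```haskell
module Main (main) where

-- Hypothetical reconstruction of the function behind Test_zdwfac_info.
-- First argument: the counter (the slot at (%rbp) in the listings).
-- Second argument: the accumulator (the slot at 8(%rbp)).
-- Like the assembly, this only tests for 1, so it does not terminate
-- for a counter of 0 or less.
fac :: Int -> Int -> Int
fac 1 acc = acc                      -- counter == 1: return the accumulator
fac n acc = fac (n - 1) (n * acc)    -- otherwise multiply, decrement, loop

main :: IO ()
main = print (fac 10 1)              -- prints 3628800
```

Each trip round the loop in the listings reads both arguments from the stack and writes both back before the tail call, which is what "the memory accesses are still there" refers to.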