Re: performance issues in simple arithmetic code

28 Apr 2011


On 27 April 2011 20:01, Denys Rtveliashvili wrote:
...
The lack of expected magic is in the assembler code:
-------------------
    addq $16,%r12
    cmpq 144(%r13),%r12
    ja .Lcz1
    movl $1117,%ecx
    movl $1113,%r10d
    movl $1111,%r11d
    movq 7(%rbx),%rax
    cqto
    idivq %r11
    cqto
    idivq %r10
    cqto
    idivq %rcx
    movq $ghczmprim_GHCziTypes_Izh_con_info,-8(%r12)
    movq %rax,0(%r12)
    leaq -7(%r12),%rbx
    addq $8,%rbp
    jmp *0(%rbp)
-------------------
Question: can't it use cheap multiplication and shift instead of expensive
division here? I know that such optimisation is implemented at least to some
extent for C--. I suppose it also won't do anything smart for expressions
like a*4 or a/4 for the same reason.

There isn't really any optimisation done on Cmm, and the native code
generator doesn't do much optimisation itself, so you get a fairly
direct translation. This kind of code is where the LLVM backend does
well in comparison. I haven't benchmarked -fasm against -fllvm for
this code, but if you eyeball the assembly code produced by -fllvm
you'll see it uses shifts and other magic.
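
To make the "multiplication and shift" idea concrete, here is a rough
sketch in Haskell of the arithmetic behind it. It is only an
illustration and not what either backend actually emits: the names
magic and divByConst are made up for this example, it assumes
unsigned 32-bit inputs, and it uses Integer to sidestep overflow,
whereas a real code generator would use a widening multiply and also
handle the signed case.

```haskell
import Data.Bits (shiftL, shiftR)

-- | Multiplier m and shift k for dividing by the constant d (d >= 1).
-- With l the smallest value such that 2^l >= d, and k = 32 + l,
-- m = ceil(2^k / d) gives (n * m) `shiftR` k == n `div` d
-- for every n in the range 0 <= n < 2^32.
-- (Hypothetical helper, named for this sketch only.)
magic :: Integer -> (Integer, Int)
magic d = (m, k)
  where
    l = length (takeWhile (< d) (iterate (* 2) 1))  -- smallest l with 2^l >= d
    k = 32 + l
    m = (1 `shiftL` k + d - 1) `div` d              -- ceil(2^k / d)

-- | Divide n by the constant d using one multiplication and one shift.
-- Assumes 0 <= n < 2^32; outside that range the result may be wrong.
divByConst :: Integer -> Integer -> Integer
divByConst d n = (n * m) `shiftR` k
  where
    (m, k) = magic d
```

In GHCi, divByConst 1111 1000000 gives 900, the same as
1000000 `div` 1111. For a power of two such as 4 no multiplier is
needed: n `shiftR` 2 equals n `div` 4, though for negative n that
differs from the truncating result that idivq (and quot) produce.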

Cheers,
David

David Terei