Re: [Haskell-cafe] Re: speed: ghc vs gcc

21 Feb 2009

      bertram.felgenhauer:
...
This is odd, but it doesn't hurt the inner loop, which only involves
$wsum01_XPd, and is identical to $wfold_s15t above.
...
Checking the asm:
    $ ghc -O2 -fasm
sQ3_info:
    .LcRt:
      cmpq 8(%rbp),%rsi
      jg .LcRw
      leaq 1(%rsi),%rax
      addq %rsi,%rbx
      movq %rax,%rsi
      jmp sQ3_info
So for some reason ghc ends up doing the (n + 1) addition before the
(acc + n) addition in this case - this accounts for the extra
instruction, because both n+1 and n need to be kept around for the
duration of the addq (which does the acc + n addition).
Yep, well spotted.
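
For reference, a minimal sketch of the kind of source loop the assembly
above corresponds to. The program itself is not shown in this message, so
the names sumTo and go are hypothetical, and the bound of 10^9 is only
inferred from the result printed further down (500000000500000000 is the
sum of 1..10^9). The comments map each branch to the instructions
discussed above.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Hypothetical reconstruction of the loop shape implied by the assembly;
-- sumTo and go are illustrative names, not taken from the original program.
sumTo :: Int -> Int
sumTo limit = go 0 1
  where
    go :: Int -> Int -> Int
    go !acc !n
      | n > limit = acc                    -- cmpq / jg: leave the loop
      | otherwise = go (acc + n) (n + 1)   -- addq does acc + n, leaq does n + 1

main :: IO ()
main = print (sumTo 1000000000)            -- assumes a 64-bit Int
```

Whether n + 1 is computed before or after acc + n is up to the code
generator; the source does not fix the order, which is why the two
backends can differ by one instruction here.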
...
...
Checking via C:
$ ghc -O2 -optc-O3 -fvia-C
Better code, but still a bit slower:
sQ3_info:
          cmpq        8(%rbp), %rsi
          jg  .L8
          addq        %rsi, %rbx
          leaq        1(%rsi), %rsi
          jmp sQ3_info
This code is identical (up to renaming registers and one offset that
I can't fully explain, but is probably related to a slight difference
in handling pointer tags between the two versions of the code) to the
"nice assembly" above.
Indeed, which is gratifying.
...
...
Running:
$ time   ./B
        500000000500000000
        ./B  1.01s user 0.01s system 97% cpu 1.035 total
Hmm, about 5% slower, are you sure this isn't just noise?
If not noise, it may be some alignment effect. Hard to say.
I couldn't get it under 1s from a dozen runs, so assuming some small
effect with alignment.

Why we get the extra test in the outer loop though, not sure. That's new
too I think -- at least I've not seen that pattern before.

-- Don

Don Stewart