Re: [GHC] #10062: Codegen on sequential FFI calls is not very good

30 Aug 2015

      #10062: Codegen on sequential FFI calls is not very good
-------------------------------------+-------------------------------------
        Reporter:  chadaustin        |                   Owner:
            Type:  bug               |                  Status:  new
        Priority:  normal            |               Milestone:
       Component:  Compiler          |                 Version:  7.8.3
  (CodeGen)                          |
      Resolution:                    |                Keywords:
Operating System:  Unknown/Multiple  |            Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |               Test Case:
      Blocked By:                    |                Blocking:
 Related Tickets:                    |  Differential Revisions:
-------------------------------------+-------------------------------------
Description changed by bgamari:

Old description:
...
I'm writing a library for efficiently building up a byte buffer.  The
fastest approach I've found is via FFI, with restricted effects like ST.
It's over twice as fast as ByteString Builder.
Consider this example API usage:
https://github.com/chadaustin/buffer-builder/blob/6bd0a39c56f63ab751faf29f9784ac87d52638be/bench/Bench.hs#L46
It compiles into an instruction sequence containing direct, sequenced FFI
calls.  For example, the last three calls work out to:
        addq $8,%rsp
        movq %rbx,%rdi
        movq 72(%rsp),%rax
        movq %rax,%rsi
        subq $8,%rsp
        movl $0,%eax
        call bw_append_bsz
        addq $8,%rsp
        movq %rbx,%rdi
        movl $35,%esi
        subq $8,%rsp
        movl $0,%eax
        call bw_append_byte
        addq $8,%rsp
        movq %rbx,%rdi
        movq 64(%rsp),%rax
        movq %rax,%rsi
        subq $8,%rsp
        movl $0,%eax
        call bw_append_bsz
I don't know why rsp is being changed so much.  I also can't explain the
assignment to eax before the call.  (It should also be xorl eax,eax, I
would think.)
To my reading, the above instruction sequence could be reduced to:
        movq %rbx,%rdi
        movq 64(%rsp),%rsi
        call bw_append_bsz
        movq %rbx,%rdi
        movl $35,%esi
        call bw_append_byte
        movq %rbx,%rdi
        movq 56(%rsp),%rsi
        call bw_append_bsz
To reproduce, check out git@github.com:chadaustin/buffer-builder.git at
revision 6bd0a39c56f63ab751faf29f9784ac87d52638be
cabal configure --enable-benchmarks
cabal bench
And then look at the ./dist/build/bench/bench-tmp/bench/Bench.dump-asm
file.
This is specifically on OS X 64-bit with GHC 7.8.3, but I saw similar
code generation on GHC 7.6 on Linux 64-bit.
New description:

 I'm writing a library for efficiently building up a byte buffer.  The
 fastest approach I've found is via FFI, with restricted effects like ST.
 It's over twice as fast as ByteString Builder.

 Consider this example API usage:
 https://github.com/chadaustin/buffer-builder/blob/6bd0a39c56f63ab751faf29f9784ac87d52638be/bench/Bench.hs#L46

 It compiles into an instruction sequence containing direct, sequenced FFI
 calls.  For example, the last three calls work out to:

 {{{
 addq $8,%rsp
 movq %rbx,%rdi
 movq 72(%rsp),%rax
 movq %rax,%rsi
 subq $8,%rsp
 movl $0,%eax
 call bw_append_bsz

 addq $8,%rsp
 movq %rbx,%rdi
 movl $35,%esi
 subq $8,%rsp
 movl $0,%eax
 call bw_append_byte

 addq $8,%rsp
 movq %rbx,%rdi
 movq 64(%rsp),%rax
 movq %rax,%rsi
 subq $8,%rsp
 movl $0,%eax
 call bw_append_bsz
 }}}

 I don't know why `rsp` is adjusted before and after every call.  I also
 can't explain the assignment to `eax` before each call.  (If it is needed
 at all, I would expect `xorl %eax,%eax` rather than `movl $0,%eax`.)

 To my reading, the above instruction sequence could be reduced to:

 {{{
 movq %rbx,%rdi
 movq 64(%rsp),%rsi
 call bw_append_bsz

 movq %rbx,%rdi
 movl $35,%esi
 call bw_append_byte

 movq %rbx,%rdi
 movq 56(%rsp),%rsi
 call bw_append_bsz
 }}}
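
As a point of comparison, the same three calls can be written in C and compiled to assembly. The following is only a sketch: the `BufferWriter` layout, the function signatures, and the `append_three` helper are assumptions made for illustration and are not taken from buffer-builder. The stub bodies exist only so the file compiles and links on its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the library's buffer type; the real layout
   is not shown in this ticket. */
typedef struct {
    char   data[256];
    size_t len;
} BufferWriter;

/* Keep the callees out of line so the calls stay visible in the assembly
   (GCC/Clang attribute; other compilers fall back to nothing). */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Assumed signature: append a single byte, dropping it if the buffer is full. */
NOINLINE void bw_append_byte(BufferWriter *bw, unsigned char b) {
    if (bw->len < sizeof bw->data) {
        bw->data[bw->len++] = (char)b;
    }
}

/* Assumed signature: append a NUL-terminated string, truncating if needed. */
NOINLINE void bw_append_bsz(BufferWriter *bw, const char *s) {
    size_t n    = strlen(s);
    size_t room = sizeof bw->data - bw->len;
    if (n > room) {
        n = room;
    }
    memcpy(bw->data + bw->len, s, n);
    bw->len += n;
}

/* Three back-to-back calls, mirroring the sequence in the dump above. */
void append_three(BufferWriter *bw, const char *a, const char *b) {
    bw_append_bsz(bw, a);
    bw_append_byte(bw, 35);  /* 35 is '#', matching movl $35,%esi */
    bw_append_bsz(bw, b);
}
```

Compiling this with something like `cc -O2 -S` and reading `append_three` gives a baseline to set against the GHC output; the exact instructions will vary by compiler and platform.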

 To reproduce, check out `git@github.com:chadaustin/buffer-builder.git` at
 revision 6bd0a39c56f63ab751faf29f9784ac87d52638be and run:

 {{{
 cabal configure --enable-benchmarks
 cabal bench
 }}}

 Then look at the generated assembly in
 `./dist/build/bench/bench-tmp/bench/Bench.dump-asm`.

 I observed this on 64-bit OS X with GHC 7.8.3, and saw similar code
 generation with GHC 7.6 on 64-bit Linux.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10062#comment:7
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler