Simon Marlow writes:
This sounds right to me. Did you submit a patch?
Note that dynamic linking with LLVM is likely to produce significantly
worse code than with the NCG right now, because the LLVM back end uses
dynamic references even for symbols in the same package, whereas the NCG
back-end uses direct static references for these.
Today with the help of Edward Yang I examined the code produced by the
LLVM backend in light of this statement. I was surprised to find that
LLVM's code appears to be no worse than the NCG with respect to
intra-package references.
My test case can be found here[2] and can be built with the included
`build.sh` script. The test consists of two modules built into a shared
library. One module, `LibTest`, exports a few simple members, while the
other, `LibTest2`, defines members that consume them. Care is taken to
ensure the members are not inlined.
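For readers who want to reproduce the setup, here is a minimal sketch of how inlining might be prevented. It assumes `NOINLINE` pragmas are used, which is my guess rather than something stated above; the actual sources are in the repository[2]. The real test splits the definitions across `LibTest` and `LibTest2`; they are merged into one `Main` module here only so the file runs on its own.

```haskell
-- Sketch only: in the real test these definitions live in two modules
-- (LibTest and LibTest2). The NOINLINE pragmas are an assumption about
-- how inlining is prevented, not taken from the test sources.
module Main (main) where

-- Definitions that would live in LibTest.
helloWorld :: String
helloWorld = "Hello World!"
{-# NOINLINE helloWorld #-}

infoRef :: Int -> Int
infoRef n = n + 1
{-# NOINLINE infoRef #-}

-- Consumers that would live in LibTest2.
testHelloWorld :: IO String
testHelloWorld = return helloWorld
{-# NOINLINE testHelloWorld #-}

testInfoRef :: IO Int
testInfoRef = return (infoRef 2)
{-# NOINLINE testInfoRef #-}

main :: IO ()
main = do
  testHelloWorld >>= putStrLn   -- prints Hello World!
  testInfoRef >>= print         -- prints 3
```

Without something like these pragmas, GHC at `-O1` would be free to inline such small definitions at their use sites, and there would be no intra-package reference left to inspect.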
The tests were done on x86_64 running LLVM 3.4 and GHC HEAD with the
patches[1] I referred to in my last message. Please let me know if I've
missed something.
# Evaluation
## First example ##
The first member is a simple `String` (defined in `LibTest`),

    helloWorld :: String
    helloWorld = "Hello World!"

The use-site is quite straightforward,

    testHelloWorld :: IO String
    testHelloWorld = return helloWorld

With `-O1` the code looks reasonable in both cases. Most importantly,
both backends use RIP-relative addressing to find the symbol.
### LLVM ###

    0000000000000ef8 :
    ef8: 48 8b 45 00 mov 0x0(%rbp),%rax
    efc: 48 8d 1d cd 11 20 00 lea 0x2011cd(%rip),%rbx # 2020d0
    f03: ff e0 jmpq *%rax
    0000000000000f28 :
    f28: eb ce jmp ef8
    f2a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)

### NCG ###

    0000000000000d58 :
    d58: 48 8d 1d 71 13 20 00 lea 0x201371(%rip),%rbx # 2020d0
    d5f: ff 65 00 jmpq *0x0(%rbp)
    0000000000000d88 :
    d88: eb ce jmp d58

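
The addresses in the trailing `#` comments of the two `lea` instructions above can be checked by hand: a RIP-relative operand is resolved against the address of the following instruction, that is, the instruction's own address plus its length. The small helper below (`ripTarget` is a name made up for this illustration) redoes that arithmetic for both listings.

```haskell
module Main (main) where

import Numeric (showHex)

-- | Target of a RIP-relative operand. The displacement is added to the
-- address of the *next* instruction (instruction address + its length).
-- Hypothetical helper, written only for this illustration.
ripTarget :: Integer -> Integer -> Integer -> Integer
ripTarget addr len disp = addr + len + disp

main :: IO ()
main = do
  -- LLVM: efc: 48 8d 1d cd 11 20 00  lea 0x2011cd(%rip),%rbx  (7 bytes)
  putStrLn (showHex (ripTarget 0xefc 7 0x2011cd) "")   -- prints 2020d0
  -- NCG:  d58: 48 8d 1d 71 13 20 00  lea 0x201371(%rip),%rbx  (7 bytes)
  putStrLn (showHex (ripTarget 0xd58 7 0x201371) "")   -- prints 2020d0
```

In both cases the result matches the address objdump prints, so each backend reaches the closure with a single PC-relative `lea` and no GOT load.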
With `-O0` the code is substantially longer but the relocation behavior
is still correct, as one would expect.
Looking at the definition of `helloWorld`[3] itself, it becomes clear
that the LLVM backend is more likely than the NCG to go through the PLT
rather than the GOT. In general, `stg_*` primitives are called through
the PLT. As far as I can tell, both mechanisms incur two memory
accesses. However, a call through the PLT consists of two JMPs, whereas
a call through the GOT consists of only one. Is this a cause for
concern? Could these two jumps interfere with branch prediction?
In general the LLVM backend produces a few more instructions than the
NCG, although this doesn't appear to be related to the handling of
relocations. One example is the inexplicable (to me) `mov` at the
beginning of LLVM's `rKw_info`.
## Second example ##
The second example demonstrates an actual call,

    -- Definition (in LibTest)
    infoRef :: Int -> Int
    infoRef n = n + 1

    -- Call site
    testInfoRef :: IO Int
    testInfoRef = return (infoRef 2)

With `-O1` this produces the following code,
### LLVM ###

    0000000000000fb0 :
    fb0: 48 8b 45 00 mov 0x0(%rbp),%rax
    fb4: 48 8d 1d a5 10 20 00 lea 0x2010a5(%rip),%rbx # 202060
    fbb: ff e0 jmpq *%rax
    0000000000000fe0 :
    fe0: eb ce jmp fb0

### NCG ###

    0000000000000e10 :
    e10: 48 8d 1d 51 12 20 00 lea 0x201251(%rip),%rbx # 202068
    e17: ff 65 00 jmpq *0x0(%rbp)
    0000000000000e40 :
    e40: eb ce jmp e10

Again, LLVM is a bit more verbose but appears to handle intra-package
calls efficiently.
[1] https://github.com/bgamari/ghc/commits/llvm-dynamic
[2] https://github.com/bgamari/ghc-linking-tests/tree/master/ghc-test
[3] `helloWorld` definitions:
LLVM:

    00000000000010a8 :
    10a8: 50 push %rax
    10a9: 4c 8d 75 f0 lea -0x10(%rbp),%r14
    10ad: 4d 39 fe cmp %r15,%r14
    10b0: 73 07 jae 10b9
    10b2: 49 8b 45 f0 mov -0x10(%r13),%rax
    10b6: 5a pop %rdx
    10b7: ff e0 jmpq *%rax
    10b9: 4c 89 ef mov %r13,%rdi
    10bc: 48 89 de mov %rbx,%rsi
    10bf: e8 0c fd ff ff callq dd0
    10c4: 48 85 c0 test %rax,%rax
    10c7: 74 22 je 10eb
    10c9: 48 8b 0d 18 0f 20 00 mov 0x200f18(%rip),%rcx # 201fe8 <_DYNAMIC+0x228>
    10d0: 48 89 4d f0 mov %rcx,-0x10(%rbp)
    10d4: 48 89 45 f8 mov %rax,-0x8(%rbp)
    10d8: 48 8d 05 21 00 00 00 lea 0x21(%rip),%rax # 1100
    10df: 4c 89 f5 mov %r14,%rbp
    10e2: 49 89 c6 mov %rax,%r14
    10e5: 58 pop %rax
    10e6: e9 b5 fc ff ff jmpq da0
    10eb: 48 8b 03 mov (%rbx),%rax
    10ee: 5a pop %rdx
    10ef: ff e0 jmpq *%rax

NCG:

    0000000000000ef8 :
    ef8: 48 8d 45 f0 lea -0x10(%rbp),%rax
    efc: 4c 39 f8 cmp %r15,%rax
    eff: 72 3f jb f40
    f01: 4c 89 ef mov %r13,%rdi
    f04: 48 89 de mov %rbx,%rsi
    f07: 48 83 ec 08 sub $0x8,%rsp
    f0b: b8 00 00 00 00 mov $0x0,%eax
    f10: e8 1b fd ff ff callq c30
    f15: 48 83 c4 08 add $0x8,%rsp
    f19: 48 85 c0 test %rax,%rax
    f1c: 74 20 je f3e
    f1e: 48 8b 1d cb 10 20 00 mov 0x2010cb(%rip),%rbx # 201ff0 <_DYNAMIC+0x238>
    f25: 48 89 5d f0 mov %rbx,-0x10(%rbp)
    f29: 48 89 45 f8 mov %rax,-0x8(%rbp)
    f2d: 4c 8d 35 1c 00 00 00 lea 0x1c(%rip),%r14 # f50
    f34: 48 83 c5 f0 add $0xfffffffffffffff0,%rbp
    f38: ff 25 7a 10 20 00 jmpq *0x20107a(%rip) # 201fb8 <_DYNAMIC+0x200>
    f3e: ff 23 jmpq *(%rbx)
    f40: 41 ff 65 f0 jmpq *-0x10(%r13)