

This sounds right to me. Did you submit a patch?
Note that dynamic linking with LLVM is likely to produce significantly worse code than with the NCG right now, because the LLVM back end uses dynamic references even for symbols in the same package, whereas the NCG back-end uses direct static references for these.
Cheers,
Simon
On 14/12/2013 22:13, Ben Gamari wrote:
# The Problem
Dynamic linking is currently broken with the LLVM code generator. This can be easily seen by attempting to compile GHC with,
    GhcDynamic = YES
    DYNAMIC_BY_DEFAULT = YES
    DYNAMIC_GHC_PROGRAMS = YES
    BuildFlavour = quick-llvm
This build will fail with an error along the lines of,
    dll-split: internal error: invalid closure, info=0x402ec0
        (GHC version 7.7.20131212 for x86_64_unknown_linux)
        Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
After some poking around with the help of Peter Wortmann, it seems clear that this is due to a subtle difference in how LLVM emits function symbols. While the NCG emits these symbols with `.type @object`, LLVM emits `.type @function`.
It appears that the `.type` annotation guides the linker in choosing the relocation mechanism to use for the symbol. While `@object` symbols use the Global Offset Table, `@function` symbols are relocated through the Procedure Linkage Table, a table of trampoline calls which are fixed up at runtime. This means that static references to functions end up pointing not to the object itself (and its info table) but instead to some linker-generated assembly. When the garbage collector attempts to examine the info table of one of these references, it finds nonsense and fails.
# A solution
Peter demonstrated that manually modifying the assembler produced by llc, passing this through GHC's mangler, and assembling the result yields a functional binary.
As far as I can tell, LLVM's intermediate language doesn't expose any way to force a function to `.type @object`. Unfortunately this means that, at least for now, the only fix is to augment the mangler with logic to perform this transform. I've done this in my `llvm-dynamic` branch[1] (in addition to finding a bug in the `rewriteInstructions` function used by AVX rewriting).
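As a rough illustration of the transform described above, here is a minimal sketch that rewrites `.type <sym>,@function` directives to `.type <sym>,@object` in assembly text. It is not the code from the `llvm-dynamic` branch: the names `rewriteTypeLine` and `mangle` are made up for this example, it works on plain `String`s for brevity, and it assumes the directive is the only thing on its line with no trailing whitespace.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- | Rewrite a single line of assembly. A `.type <sym>,@function` directive
-- becomes `.type <sym>,@object`; every other line is returned unchanged.
-- (Hypothetical helper, not part of GHC.)
rewriteTypeLine :: String -> String
rewriteTypeLine line
  | ".type" `isPrefixOf` dropWhile isBlank line
  , oldSuffix `isSuffixOf` line
  = take (length line - length oldSuffix) line ++ "@object"
  | otherwise = line
  where
    oldSuffix = "@function"
    isBlank c = c == ' ' || c == '\t'

-- | Apply the rewrite to every line of an assembly file.
mangle :: String -> String
mangle = unlines . map rewriteTypeLine . lines
```

For example, `rewriteTypeLine "\t.type\tfoo_info,@function"` evaluates to `"\t.type\tfoo_info,@object"` (with `foo_info` standing in for any symbol name), while lines that are not `.type` directives pass through untouched. A real fix would probably want to restrict the rewrite to symbols that actually carry info tables rather than every function symbol.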
This branch compiles on my x86_64 machine to produce what appears to be a functional compiler. Unfortunately installation issues (which I'll describe shortly in a new thread) prevent me from verifying this. I'm currently waiting for a build on my ARM box, but assuming this fix works, this means that GHC could (finally) have first-class, stable ARM support.
Comments?
Cheers,
- Ben
[1] https://github.com/bgamari/ghc/tree/llvm-dynamic
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

Simon Marlow writes:
> This sounds right to me. Did you submit a patch?
Not yet, I'm currently fighting through some build system issues which are preventing me from actually installing and testing the compiler on my ARM box.
> Note that dynamic linking with LLVM is likely to produce significantly worse code than with the NCG right now, because the LLVM back end uses dynamic references even for symbols in the same package, whereas the NCG back-end uses direct static references for these.
Right. However it (hopefully) works on ARM, which is more than I can say about the NCG. Moreover, I'm hopeful that it will be possible to fix LLVM's output. Would this not simply be a matter of flagging package-local symbols with LLVM's `private` linkage type[1]? In the case where you have references both internal and external to the package, could you not define two overlapping symbols, one flagged with `private` and the other `external`? Perhaps I'm missing a subtlety?
Cheers,
- Ben
[1] http://llvm.org/docs/LangRef.html#linkage-types


great work! :)
On Fri, Dec 27, 2013 at 3:21 PM, Ben Gamari wrote:
Simon Marlow writes:
> This sounds right to me. Did you submit a patch?
> Note that dynamic linking with LLVM is likely to produce significantly worse code than with the NCG right now, because the LLVM back end uses dynamic references even for symbols in the same package, whereas the NCG back-end uses direct static references for these.
Today with the help of Edward Yang I examined the code produced by the LLVM backend in light of this statement. I was surprised to find that LLVM's code appears to be no worse than the NCG with respect to intra-package references.
My test case can be found here[2] and can be built with the included `build.sh` script. The test consists of two modules built into a shared library. One module, `LibTest`, exports a few simple members while the other module (`LibTest2`) defines members that consume them. Care is taken to ensure the members are not inlined.
The tests were done on x86_64 running LLVM 3.4 and GHC HEAD with the patches[1] I referred to in my last message. Please let me know if I've missed something.
# Evaluation
## First example ##
The first member is a simple `String` (defined in `LibTest`),
    helloWorld :: String
    helloWorld = "Hello World!"
The use-site is quite straightforward,
    testHelloWorld :: IO String
    testHelloWorld = return helloWorld
With `-O1` the code looks reasonable in both cases. Most importantly, both backends use IP relative addressing to find the symbol.
### LLVM ###
    0000000000000ef8:
     ef8:   48 8b 45 00             mov    0x0(%rbp),%rax
     efc:   48 8d 1d cd 11 20 00    lea    0x2011cd(%rip),%rbx        # 2020d0
     f03:   ff e0                   jmpq   *%rax
    0000000000000f28:
     f28:   eb ce                   jmp    ef8
     f2a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
### NCG ###
    0000000000000d58:
     d58:   48 8d 1d 71 13 20 00    lea    0x201371(%rip),%rbx        # 2020d0
     d5f:   ff 65 00                jmpq   *0x0(%rbp)
    0000000000000d88:
     d88:   eb ce                   jmp    d58
With `-O0` the code is substantially longer but the relocation behavior is still correct, as one would expect.
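The `# 2020d0` annotations in the listings can be checked by hand: an IP-relative operand is resolved against the address of the next instruction, so the target is the instruction's address plus its length plus the displacement. A small sketch of that arithmetic follows; the helper names `ripTarget` and `hex` are made up for this illustration, and the length of 7 bytes is read off the encodings shown in the listings.

```haskell
import Numeric (showHex)

-- | Effective address of a RIP-relative operand: the displacement is
-- added to the address of the instruction that follows the current one.
-- (Hypothetical helper for illustration only.)
ripTarget :: Integer  -- ^ address of the instruction
          -> Integer  -- ^ length of the instruction in bytes
          -> Integer  -- ^ displacement encoded in the instruction
          -> Integer
ripTarget instrAddr instrLen disp = instrAddr + instrLen + disp

-- | Render an address the way objdump prints it, with a 0x prefix.
hex :: Integer -> String
hex n = "0x" ++ showHex n ""
```

For the LLVM `lea` at `efc`, `hex (ripTarget 0xefc 7 0x2011cd)` gives `"0x2020d0"`, and for the NCG `lea` at `d58`, `hex (ripTarget 0xd58 7 0x201371)` gives the same `"0x2020d0"`, matching the annotations in both listings.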
Looking at the definition of `helloWorld`[3] itself it becomes clear that the LLVM backend is more likely to use PLT relocations over GOT. In general, `stg_*` primitives are called through the PLT. As far as I can tell, both of these call mechanisms will incur two memory accesses. However, in the case of the PLT the call will consist of two JMPs whereas the GOT will consist of only one. Is this a cause for concern? Could these two jumps interfere with prediction?
In general the LLVM backend produces a few more instructions than the NCG although this doesn't appear to be related to handling of relocations. For instance, the inexplicable (to me) `mov` at the beginning of LLVM's `rKw_info`.
## Second example ##
The second example demonstrates an actual call,
    -- Definition (in LibTest)
    infoRef :: Int -> Int
    infoRef n = n + 1
    -- Call site
    testInfoRef :: IO Int
    testInfoRef = return (infoRef 2)
With `-O1` this produces the following code,
### LLVM ###
    0000000000000fb0:
     fb0:   48 8b 45 00             mov    0x0(%rbp),%rax
     fb4:   48 8d 1d a5 10 20 00    lea    0x2010a5(%rip),%rbx        # 202060
     fbb:   ff e0                   jmpq   *%rax
    0000000000000fe0:
     fe0:   eb ce                   jmp    fb0
### NCG ###
    0000000000000e10:
     e10:   48 8d 1d 51 12 20 00    lea    0x201251(%rip),%rbx        # 202068
     e17:   ff 65 00                jmpq   *0x0(%rbp)
    0000000000000e40:
     e40:   eb ce                   jmp    e10
Again, it seems that LLVM is a bit more verbose but seems to handle intra-package calls efficiently.
[1] https://github.com/bgamari/ghc/commits/llvm-dynamic
[2] https://github.com/bgamari/ghc-linking-tests/tree/master/ghc-test
[3] `helloWorld` definitions:
LLVM:
    00000000000010a8:
    10a8:   50                      push   %rax
    10a9:   4c 8d 75 f0             lea    -0x10(%rbp),%r14
    10ad:   4d 39 fe                cmp    %r15,%r14
    10b0:   73 07                   jae    10b9
    10b2:   49 8b 45 f0             mov    -0x10(%r13),%rax
    10b6:   5a                      pop    %rdx
    10b7:   ff e0                   jmpq   *%rax
    10b9:   4c 89 ef                mov    %r13,%rdi
    10bc:   48 89 de                mov    %rbx,%rsi
    10bf:   e8 0c fd ff ff          callq  dd0
    10c4:   48 85 c0                test   %rax,%rax
    10c7:   74 22                   je     10eb
    10c9:   48 8b 0d 18 0f 20 00    mov    0x200f18(%rip),%rcx        # 201fe8 <_DYNAMIC+0x228>
    10d0:   48 89 4d f0             mov    %rcx,-0x10(%rbp)
    10d4:   48 89 45 f8             mov    %rax,-0x8(%rbp)
    10d8:   48 8d 05 21 00 00 00    lea    0x21(%rip),%rax        # 1100
    10df:   4c 89 f5                mov    %r14,%rbp
    10e2:   49 89 c6                mov    %rax,%r14
    10e5:   58                      pop    %rax
    10e6:   e9 b5 fc ff ff          jmpq   da0
    10eb:   48 8b 03                mov    (%rbx),%rax
    10ee:   5a                      pop    %rdx
    10ef:   ff e0                   jmpq   *%rax
NCG:
    0000000000000ef8:
     ef8:   48 8d 45 f0             lea    -0x10(%rbp),%rax
     efc:   4c 39 f8                cmp    %r15,%rax
     eff:   72 3f                   jb     f40
     f01:   4c 89 ef                mov    %r13,%rdi
     f04:   48 89 de                mov    %rbx,%rsi
     f07:   48 83 ec 08             sub    $0x8,%rsp
     f0b:   b8 00 00 00 00          mov    $0x0,%eax
     f10:   e8 1b fd ff ff          callq  c30
     f15:   48 83 c4 08             add    $0x8,%rsp
     f19:   48 85 c0                test   %rax,%rax
     f1c:   74 20                   je     f3e
     f1e:   48 8b 1d cb 10 20 00    mov    0x2010cb(%rip),%rbx        # 201ff0 <_DYNAMIC+0x238>
     f25:   48 89 5d f0             mov    %rbx,-0x10(%rbp)
     f29:   48 89 45 f8             mov    %rax,-0x8(%rbp)
     f2d:   4c 8d 35 1c 00 00 00    lea    0x1c(%rip),%r14        # f50
     f34:   48 83 c5 f0             add    $0xfffffffffffffff0,%rbp
     f38:   ff 25 7a 10 20 00       jmpq   *0x20107a(%rip)        # 201fb8 <_DYNAMIC+0x200>
     f3e:   ff 23                   jmpq   *(%rbx)
     f40:   41 ff 65 f0             jmpq   *-0x10(%r13)

On 27/12/13 20:21, Ben Gamari wrote:
> Simon Marlow writes:
> > This sounds right to me. Did you submit a patch?
> > Note that dynamic linking with LLVM is likely to produce significantly worse code than with the NCG right now, because the LLVM back end uses dynamic references even for symbols in the same package, whereas the NCG back-end uses direct static references for these.
> Today with the help of Edward Yang I examined the code produced by the LLVM backend in light of this statement. I was surprised to find that LLVM's code appears to be no worse than the NCG with respect to intra-package references.
> My test case can be found here[2] and can be built with the included `build.sh` script. The test consists of two modules built into a shared library. One module, `LibTest`, exports a few simple members while the other module (`LibTest2`) defines members that consume them. Care is taken to ensure the members are not inlined.
> The tests were done on x86_64 running LLVM 3.4 and GHC HEAD with the patches[1] I referred to in my last message. Please let me know if I've missed something.
This is good news, however what worries me is that I still don't understand *why* you got these results. Where in the LLVM backend is the magic that does something special for intra-package references?
I know where it is in the NCG backend - CLabel.labelDynamic - but I can't see this function used at all in the LLVM backend.
So what is the mechanism that lets LLVM optimise these calls? Is it happening magically in the linker, perhaps? But that would only be possible when using -Bsymbolic or -Bsymbolic-functions, which is a choice made at link time.
As far as I can tell, all we do is pass a flag to llc to tell it to compile for dynamic/PIC, in DriverPipeline.runPhase.
Cheers,
Simon

Simon Marlow writes:
> On 27/12/13 20:21, Ben Gamari wrote:
> > Simon Marlow writes:
> > > This sounds right to me. Did you submit a patch?
> > > Note that dynamic linking with LLVM is likely to produce significantly worse code than with the NCG right now, because the LLVM back end uses dynamic references even for symbols in the same package, whereas the NCG back-end uses direct static references for these.
> > Today with the help of Edward Yang I examined the code produced by the LLVM backend in light of this statement. I was surprised to find that LLVM's code appears to be no worse than the NCG with respect to intra-package references.
> > My test case can be found here[2] and can be built with the included `build.sh` script. The test consists of two modules built into a shared library. One module, `LibTest`, exports a few simple members while the other module (`LibTest2`) defines members that consume them. Care is taken to ensure the members are not inlined.
> > The tests were done on x86_64 running LLVM 3.4 and GHC HEAD with the patches[1] I referred to in my last message. Please let me know if I've missed something.
> This is good news, however what worries me is that I still don't understand *why* you got these results. Where in the LLVM backend is the magic that does something special for intra-package references?
As far as I can tell, the backend itself does nothing in particular to handle this.
> I know where it is in the NCG backend - CLabel.labelDynamic - but I can't see this function used at all in the LLVM backend.
Right. For the record, I took a first stab at implementing[1] the logic that I thought would be needed to get LLVM to do efficient dynamic linking before taking this measurement. I probably should have reused more of the machinery used by the NCG, however. I don't believe I managed to get this code stable before dropping it when I realized that LLVM already somehow did the right thing.
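For readers following along, the kind of decision that `CLabel.labelDynamic` makes for the NCG can be sketched roughly as below. This is a deliberately simplified illustration, not GHC's code: the types `PackageId`, `Label` and `RefKind` and the function `refKind` are hypothetical stand-ins, and the real function takes more arguments and handles platform-specific cases.

```haskell
-- Hypothetical, simplified stand-ins for GHC's package and label types.
newtype PackageId = PackageId String
  deriving (Eq, Show)

data Label = Label
  { labelPackage :: PackageId  -- ^ package that defines the symbol
  , labelName    :: String     -- ^ symbol name
  } deriving (Show)

-- | How generated code refers to a symbol.
data RefKind
  = Direct    -- ^ direct static (IP-relative) reference
  | Indirect  -- ^ dynamic reference, resolved through the GOT or PLT
  deriving (Eq, Show)

-- | Choose a reference kind for a label used from code in `thisPkg`.
-- Only symbols from *other* packages need a dynamic reference, and only
-- when building dynamically.
refKind :: Bool -> PackageId -> Label -> RefKind
refKind dynamic thisPkg lbl
  | dynamic && labelPackage lbl /= thisPkg = Indirect
  | otherwise                              = Direct
```

Under these assumptions, `refKind True (PackageId "base") (Label (PackageId "base") "foo_info")` is `Direct`, whereas the same label referenced from another package is `Indirect`.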
> So what is the mechanism that lets LLVM optimise these calls? Is it happening magically in the linker, perhaps? But that would only be possible when using -Bsymbolic or -Bsymbolic-functions, which is a choice made at link time.
This seems like the most likely explanation but given we don't pass this flag I really don't see why the linker would do this. More research is necessary it seems.
> As far as I can tell, all we do is pass a flag to llc to tell it to compile for dynamic/PIC, in DriverPipeline.runPhase.
Right. Very mysterious.
Cheers,
- Ben
[1] https://github.com/bgamari/ghc/tree/llvm-intra-package
participants (3)
- Ben Gamari
- Carter Schonwald
- Simon Marlow