
Hi all,
I’ve recently been trying to better understand how and where time is spent at compile-time when running Template Haskell splices, and one of the areas I’ve been struggling to figure out is the operation of the linker. From reading the source code, here’s a summary of what I think I’ve figured out so far:
- TH splices are executed using the GHCi interpreter, though it may be internal or external (if -fexternal-interpreter is used).
- Regardless of which mode is used, TH splices need their dependencies loaded into the interpreter context before they can be run. This is handled by the call to loadDecls in hscCompileCoreExpr', which in turn calls loadDependencies in GHC.Linker.Loader.
- loadDependencies loads packages and modules in different ways. Package dependencies are just loaded via the appropriate built shared libraries, but modules from the current package have to be loaded a different way, via loadObjects (also in GHC.Linker.Loader).
Here, however, is where I get a bit lost. GHC has two strategies for loading individual objects, which it chooses between depending on whether the current value of interpreterDynamic is True. But I don’t actually understand what interpreterDynamic means! The Haddock comment just says that it determines whether or not the “interpreter uses the Dynamic way”, but I don’t see why that matters. My understanding was that GHCi *always* requires dynamic linking, since it is, after all, loading code dynamically. Under what circumstances would interpreterDynamic ever be False?
Furthermore, I don’t actually understand precisely how and why this influences the choice of loading strategy. In the case that interpreterDynamic is True, GHC appears to convert the desired dyn_o object into a shared library by calling the system linker, then loads that, which can be very slow but otherwise works. However, when interpreterDynamic is False, it loads the object directly. Both paths eventually call into “the RTS linker”, implemented in rts/Linker.c, to actually load the resulting object.
I have found precious little information on what the RTS linker does, in which contexts it’s used, or how precisely it works. Note [runtime-linker-phases] at the top of Linker.c has some information, but it’s mostly a high-level description of what the code actually does rather than an explanation of its role in the bigger picture. Does anyone know of any resources they could possibly point me to that help to explain how all the pieces fit together here? I’ve spent quite a bit of time reading the code, but I’m afraid I still haven’t managed to see the forest for the trees.
Thanks,
Alexis

Hi Alexis,
Most information on this can be found on the wiki, where a lot of these design decisions are made, e.g. https://gitlab.haskell.org/ghc/ghc/-/wikis/dynamic-ghc-programs
The points you've figured out so far are correct. To answer some of your questions:
But I don’t actually understand what interpreterDynamic means! The Haddock comment just says that it determines whether or not the “interpreter uses the Dynamic way”, but I don’t see why that matters. My understanding was that GHCi *always* requires dynamic linking, since it is, after all, loading code dynamically.
DynamicWay essentially determines whether or not the runtime linker uses the platform linker under the hood. In the dynamic way, your object files are linked into a shared library and that shared library is then loaded. This means that the RTS linker itself doesn't have to do a bunch of work such as relocation processing. For most Unix platforms this is the default. The downside of this approach is that on every change, i.e. whenever you load a new object file into scope, you have to relink the shared library, unload the old one, and link the new one in. This brings with it its own set of problems, such as what happens to references you already hold to symbols in the old shared library.
Under what circumstances would interpreterDynamic ever be False?
For instance, on Windows. Linking on Windows using the system linker is
generally slower, so creating multiple shared libraries on the fly is time
consuming. There are also some practical issues: for instance, base is so
big that it doesn't fit into a single DLL, and there is the question of how
Windows handles data and code accesses to symbols in shared libraries. This
means that on Windows we load object files and internally do all the
relocation processing, run initializers, etc.: everything you would need to
do to be able to run the code inside the object file.
There are several other platforms as well, such as Android, where there's
no system linker to call.
In the case that interpreterDynamic is True, GHC appears to convert the desired dyn_o object into a shared library by calling the system linker, then loads that, which can be very slow but otherwise works. However, when interpreterDynamic is False, it loads the object directly. Both paths eventually call into “the RTS linker”, implemented in rts/Linker.c, to actually load the resulting object.
Yes, the end goal is to be able to resolve a function name to an address.
So whichever strategy is chosen, we must in the end register the functions
with the RTS. That said, loading a shared lib is much less error-prone than
loading the object files directly. It also uses less memory and can benefit
from linker-level optimizations that we don't implement in the RTS linker.
Loading a shared library has additional benefits too, such as the
system loader dealing with running initializers, registering exception
tables, etc.
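To make "resolve a function name to an address" concrete, here is a minimal sketch in C of a name-to-address table. It is only an illustration: SymbolTable, register_symbol and lookup_symbol are hypothetical names made up for this example and are not the data structures or functions used in rts/Linker.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy symbol table; NOT the structure used by the real RTS linker. */
#define MAX_SYMBOLS 64

/* Generic function pointer type; cast back to the real type before calling. */
typedef void (*GenericFn)(void);

typedef struct {
    const char *name;  /* symbol name, e.g. "square" */
    GenericFn   addr;  /* address the name resolves to */
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    size_t count;
} SymbolTable;

/* Register a name/address pair. A later registration shadows an earlier one. */
void register_symbol(SymbolTable *t, const char *name, GenericFn addr) {
    assert(t->count < MAX_SYMBOLS);
    t->entries[t->count].name = name;
    t->entries[t->count].addr = addr;
    t->count++;
}

/* Resolve a name to an address, newest registration first; NULL if unknown. */
GenericFn lookup_symbol(const SymbolTable *t, const char *name) {
    for (size_t i = t->count; i > 0; i--) {
        if (strcmp(t->entries[i - 1].name, name) == 0)
            return t->entries[i - 1].addr;
    }
    return NULL;
}

/* Stand-in for a function that was just loaded from an object file. */
int square(int x) { return x * x; }
```

A caller would register `square` under the name "square", look it up again by name, cast the result back to `int (*)(int)` and call it. Whichever loading strategy is used, the interpreter needs some lookup of this shape before it can jump into compiled code.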
Hope this clarified it somewhat, but if you have any more questions feel
free to ask.
Regards,
Tamar
On Wed, Jun 1, 2022 at 2:38 AM Alexis King
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Hi Alexis,
Let me try to provide the high-level view. I'm sorry if I'm going a bit
overboard on details you already know. But let's start by clearing up a
misconception first: no, GHCi does not always require dynamic linking.
At the very abstract level we have a compiler that knows how to turn
various inputs into object code. This includes C, Cmm, Haskell, and assembly
(.c, .S, .cmm, .hs -> .o). Thus a complete Haskell package ends up as a
bunch of object code files. We know that for dynamic linkers, we may need
slightly different arguments (e.g. PIC).
We next roll these up into archives (.a) and occasionally a pre-linked
object file (i.e. link the whole set of object files into one object file,
and resolve internal references), as well as a dynamic (shared object,
dylib) file.
For GHCi's purposes (and TH), we ultimately want to call a Haskell function
to produce some AST to splice in. This Haskell function might be defined
not in a different package but in the same one, so we'll have to deal with
some in-flight packages anyway.
We may have some Byte Code Object (BCO) glue code to invoke the Haskell
function, which GHCi will interpret during evaluation. However, that
function can depend on a large dependency tree, and we don't have BCOs for
everything. I still think it would be nice to have an abstract machine and
an Intermediate Representation/ByteCode, but that's a much larger project.
Also, until recently BCOs couldn't even encode unboxed types/sums.
So given the BCO glue code, we really want to call into object code (also
for performance). You can instruct GHCi to prefer object code as well via
-fobject-code.
This now leads us to the need of getting the object code somehow into
memory and running it. The dynamic system linker approach would be to turn
the object code with the function we want to call into a shared library,
and just hand that over to the linker (e.g. dlopen).
However, GHC has over a long time grown its own in-memory static linker. As
such it has the capability to load object files (.o) and resolve them on the
fly. There is no need for system shared libraries or a system linker, nor to
deal with potential bugs in that linker. It also means we can link on
platforms that don't have a system linker or have a severely restricted one
(e.g. iOS).
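As a rough illustration of what "resolving" an object file in memory involves, here is a sketch in C of applying a single 32-bit PC-relative relocation, using the standard formula value = S + A - P (symbol address plus addend minus the address of the patched location, as for R_X86_64_PC32). The names apply_pc32 and read_i32 are hypothetical and made up for this example; this is not code from rts/Linker.c, which handles many relocation types and object formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Patch one 32-bit PC-relative reference inside a loaded image.
 *   image, image_len : bytes of the loaded section
 *   image_base       : address at which image[0] is assumed to live
 *   offset           : where in the image the 4-byte field sits
 *   sym_addr (S)     : resolved address of the referenced symbol
 *   addend   (A)     : constant taken from the relocation entry
 * Hypothetical helper for illustration only.
 */
void apply_pc32(uint8_t *image, size_t image_len, uint64_t image_base,
                size_t offset, uint64_t sym_addr, int64_t addend) {
    assert(offset + 4 <= image_len);
    uint64_t place = image_base + offset;                     /* P */
    int64_t value = (int64_t)(sym_addr + (uint64_t)addend - place);
    /* The displacement must fit in 32 bits, otherwise the symbol is too far away. */
    assert(value >= INT32_MIN && value <= INT32_MAX);
    int32_t disp = (int32_t)value;
    memcpy(image + offset, &disp, sizeof disp);               /* host byte order */
}

/* Read back a 32-bit field from the image (host byte order). */
int32_t read_i32(const uint8_t *image, size_t offset) {
    int32_t v;
    memcpy(&v, image + offset, sizeof v);
    return v;
}
```

For example, with an image assumed to sit at 0x1000, a field at offset 4, a symbol at 0x2000 and an addend of -4, the stored displacement is 0x2000 - 4 - 0x1004 = 0xFF8. A system dynamic linker does this kind of patching for a shared library; an in-memory linker has to do it itself for every relocation in every object it loads.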
So from a high level you can look at GHC's RTS linker as a special feature
of GHC that allows us to not need a system-provided dynamic linker, if
there is none available or using it is undesirable.
Whether stuff is loaded through the internal or the external interpreter
makes almost no difference. You _can_ load different ABIs through the external
iserv (as that iserv can be built against a different ABI).
Hope this helps a bit? Feel free to ask more questions.
Cheers,
Moritz
participants (3)
- Alexis King
- Moritz Angermann
- Phyx