
Ok, let's start with a possible API to the library I think you're asking for:

    loadLibrary  :: FilePath -> IO ()
    lookupEntity :: String -> IO a

(there's no type checking at dynamic link time of course, so you get to claim the returned object is whatever type you like).

Right, now what would it take to implement this. As Duncan points out, this is almost possible already using the GHCi dynamic linker, which is available to any program compiled with GHC via the FFI. The interface is fairly straightforward, e.g.:

    foreign import "initLinker" unsafe
        initLinker :: IO ()
    foreign import "lookupSymbol" unsafe
        c_lookupSymbol :: CString -> IO (Ptr a)
    foreign import "loadObj" unsafe
        c_loadObj :: CString -> IO Int

but the main problem is that the dynamic linker can't link new modules to symbols in the currently running binary. So, in order to link a new Haskell module, you first have to load up a fresh copy of the 'base' and 'haskell98' packages, just like GHCi does. It *almost* works to do this, except that you get strange effects, one of which is that you have two copies of stdout, each with their own buffer.

Going the final step and allowing linkage to the current binary is possible; it just means the linker has to know how to read the symbol table out of the binary, and you have to avoid running 'strip'. I believe reading the symbol table is quite straightforward, the main problem being that on Unix you don't actually know where the binary lives, so you have to wire it in or search the PATH.

Another problem is that you don't normally link a *complete* copy of the base package into your binary, you only link the bits you need. Linking the whole lot would mean every binary would be about 10M; but doing this on the basis of a flag which you turn on when you want to do dynamic linking maybe isn't so bad.

There are a couple of other options:

- make your program into a collection of dynamically-linked libraries itself, i.e. have a little stub main() which links with the RTS, and loads up 'base' followed by your program when it starts. The startup cost would be high (we don't do lazy linking in Haskell), but you'd only get one copy of the base package, and this is possible right now.

- make GHC generate objects that play nicely with the standard dynamic linker on your system. This is entirely non-trivial, I believe. See previous discussions on this list. However, it might get easier in the future; I'm currently working on removing the need to distinguish code from data in GHC's RTS, which will eliminate some of the problems.

Summary: extending GHC's dynamic linker to be able to slurp in the symbol table from the currently running binary would be useful, and is a good bite-sized GHC hacker task. I can't guarantee that we'll get around to it in a timely fashion, but contributions are, as always, entirely welcome...

Cheers,
Simon
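[To make the above concrete, here is a minimal sketch of the proposed API layered over those RTS linker entry points. It assumes the symbols named in the message (initLinker, loadObj, lookupSymbol) plus resolveObjs, written in current FFI syntax; and it returns the raw symbol address rather than an "IO a", because turning an address into a Haskell value needs GHC-internal primitives (GHCi uses addrToHValue#, omitted here).]

    module DynLoad (loadLibrary, lookupEntity) where

    import Foreign.C.String (CString, withCString)
    import Foreign.Ptr      (Ptr, nullPtr)

    foreign import ccall unsafe "initLinker"
        initLinker     :: IO ()
    foreign import ccall unsafe "loadObj"
        c_loadObj      :: CString -> IO Int
    foreign import ccall unsafe "resolveObjs"
        c_resolveObjs  :: IO Int
    foreign import ccall unsafe "lookupSymbol"
        c_lookupSymbol :: CString -> IO (Ptr a)

    -- Load one object file and resolve its references; the RTS linker
    -- returns 0 on failure for both calls.
    loadLibrary :: FilePath -> IO ()
    loadLibrary path = do
        initLinker
        ok <- withCString path c_loadObj
        if ok == 0 then ioError (userError ("loadObj failed: " ++ path)) else return ()
        r <- c_resolveObjs
        if r == 0 then ioError (userError "resolveObjs failed") else return ()

    -- Look up a symbol by its linker name; the caller gets to claim
    -- what type of object lives behind the returned address.
    lookupEntity :: String -> IO (Ptr a)
    lookupEntity name = do
        p <- withCString name c_lookupSymbol
        if p == nullPtr
            then ioError (userError ("unknown symbol: " ++ name))
            else return p

[As the rest of the message explains, this only resolves against objects the GHCi linker itself loaded, not against symbols in the running binary - which is exactly the gap discussed below.]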

Hi there.
Ok, let's start with a possible API to the library I think you're asking for:
loadLibrary :: FilePath -> IO ()
"loadLibrary" is probably not a good choice as there is a Win32 function "LoadLibrary" which deals specifically with DLL's as opposed to object files which is the kind of file (or modifications thereof) that I understand this thread to be about. .....
Summary: extending GHC's dynamic linker to be able to slurp in the symbol table from the currently running binary would be useful, and is a good bite-sized GHC hacker task. I can't guarantee that we'll get around to it in a timely fashion, but contributions are, as always, entirely welcome...
By way of reference: I have been working recently on the Mingw32 version of GNU Common Lisp, which has a cross-platform system of dynamically compiling and loading object modules (i.e. the Common Lisp "compile-file" and "compile" functions). There are two alternative pathways for this at present:

1. The old way (which I believe originally came from GNU Emacs). During the early stages of building the GCL system, a program called "rsym" is run on the executable to produce a list of symbols and addresses. These data are then loaded into an appropriate data structure in the running system and the image is saved again. When "compile" is called on a function it is compiled to C, that C is in turn compiled to an object file by the local C compiler, and then certain symbolic information needed by GCL is literally appended to the object file without further ado. Specially written linking code slurps up the modified object file and links it into the running GCL system. This works on a number of platforms. (A sketch of the symbol-table step follows after this message.)

2. The BFD way. Similar stuff is done, but via the BFD library. This works on many Debian platforms (about 16, I believe), including 64-bit and non-Intel architectures. The code is generic. It does not work on Windows, as the BFD library seems to be missing functionality in that area. I believe that SGI might also be a problem. We hope to make this the standard way of dynamically linking object files, but I am caught by my own lack of interest in and ignorance of how BFD works.

The source code (horrific stuff from many years of uncontrolled but undoubtedly clever hackery) is available from CVS at:

http://savannah.gnu.org/projects/gcl/

I can refer you to the appropriate bits if you are interested. Naturally it is all GNU copylefted and therefore incompatible with GHC, but it could be used as a road map by interested parties.

Cheers

Mike Thomas.
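[As a concrete illustration of the rsym step above - an invented sketch in Haskell, not GCL's actual code, with made-up function names: run nm over the executable and keep a name-to-address table for the runtime linker to consult.]

    import qualified Data.Map as Map
    import Numeric (readHex)
    import System.Process (readProcess)

    -- Build a symbol -> address table from the output of nm, whose
    -- defined-symbol lines look like "0000000000401120 T main".
    readSymbols :: FilePath -> IO (Map.Map String Integer)
    readSymbols exe = do
        out <- readProcess "nm" ["--defined-only", exe] ""
        return $ Map.fromList
            [ (name, addr)
            | (a : _kind : name : _) <- map words (lines out)
            , [(addr, "")] <- [readHex a]
            ]

[The same caveats apply as in Simon's message: the binary must not be stripped, and on Unix the program first has to discover its own path before it can run nm on itself.]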

2. The BFD way
I just want to remark that anyone contemplating hacking on object files should steer well clear of BFD, which is the most inconsistently implemented, poorly (though abundantly) documented obstacle to quality programming it has ever been my misfortune to use. Particularly galling is the fact that BFD provides little indication (in documentation, compile-time errors, etc.) that certain functionality is completely unimplemented on a particular platform.

In contrast, libelf is widely available (multiple implementations exist), well documented, a joy to use, and covers everything (well, everything that GHC runs on) but Windows.

I once had a tool to manipulate .o files using BFD. When I needed to upgrade it slightly (to access debugging information as well as normal symbols), I spent the best part of a week failing to get this small piece of additional functionality to work with BFD, and less than a day rewriting the whole thing from scratch with libelf.

-- Alastair Reid alastair@reid-consulting-uk.ltd.uk Reid Consulting (UK) Limited http://www.reid-consulting-uk.ltd.uk/alastair/

ps It's not clear to me whether BFD uses the wrong abstractions or simply tries to do too much and so ends up doing a lot badly. There's even the (slim) possibility that BFD just needs better documentation and a few small, simple example programs instead of the multi-thousand-line programs it ships with. It might be interesting to try to write a BFD2 which fixes BFD's most glaring problems, based on the enormous amount of experience embodied by BFD's implementation.

Hi Alastair.
I just want to remark that anyone contemplating hacking on object files should steer well clear of BFD, which is the most inconsistently implemented, poorly (though abundantly) documented obstacle to quality programming it has ever been my misfortune to use. Particularly galling is the fact that BFD provides little indication (in documentation, compile-time errors, etc.) that certain functionality is completely unimplemented on a particular platform.
Thanks for that - I have had a similar experience, and since I am not experienced in this side of programming I was finding the BFD library particularly opaque. In the context of GCL it is indeed the Linux ELF platforms which work nicely with BFD.

Cheers

Mike Thomas.

On 28 Aug 2002 01:34:37 +0100, Alastair Reid wrote:
2. The BFD way
I was just about to ask about this - what does GHC use for manipulating .o files? I've come across BFD through gdb but have never had to use it seriously. I'll take Alastair's advice and try to steer clear if possible!
I just want to remark that anyone contemplating hacking on object files should steer well clear of BFD, which is the most inconsistently implemented, poorly (though abundantly) documented obstacle to quality programming it has ever been my misfortune to use. Particularly galling is the fact that BFD provides little indication (in documentation, compile-time errors, etc.) that certain functionality is completely unimplemented on a particular platform.
In contrast, libelf is widely available (multiple implementations exist), well documented, a joy to use and covers everything (well, everything that GHC runs on) but Windows.
Duncan

On Tue, 27 Aug 2002 17:19:15 +0100, "Simon Marlow" wrote:
Right, now what would it take to implement this. As Duncan points out, this is almost possible already using the GHCi dynamic linker, which is available to any program compiled with GHC via the FFI. The interface is fairly straightforward, eg:
[snip] This is what Andre Pang has done, modulo any changes between GHC 5.03 & 5.04:

http://www.algorithm.com.au/wiki-files/hacking/haskell/chiba-0.2.tar.gz

Andre says: "The actual runtime loader itself is in a runtime_loader/ directory in the tarball. The best example of how to use it is in the tests/ChibaTest* files."
but the main problem is that the dynamic linker can't link new modules to symbols in the currently running binary. So, in order to link a new Haskell module, you first have to load up a fresh copy of the 'base' and 'haskell98' packages, just like GHCi does. It *almost* works to do this, except that you get strange effects, one of which is that you have two copies of stdout each with their own buffer.
This is exactly what Andre does in Chiba: he has to load extra copies of certain interface modules, but it is OK since the Haskell modules are stateless.
Going the final step and allowing linkage to the current binary is possible, it just means the linker has to know how to read the symbol table out of the binary, and you have to avoid running 'strip'. I believe reading the symbol table is quite straightforward, the main problem being that on Unix you don't actually know where the binary lives, so you have to wire it in or search the PATH.
Would it be easier/better to explicitly specify a symbol map file for the linker to use to locate the appropriate points in the current binary? Then perhaps we need a flag to ask GHC to spit out a symbol map along with the .o. Alternatively, there may be tools to extract the map from a .o - I don't know, I'm not a binutils guru!
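[For illustration - the map format and function name here are invented: the build could save the output of nm before stripping, e.g. "nm prog > prog.map", and the program could read it back at startup, much as in the rsym sketch earlier in the thread, but from a file instead of a subprocess.]

    import qualified Data.Map as Map
    import Numeric (readHex)

    -- Parse an nm-style map file ("address kind name" per line) that
    -- was written out at build time, before 'strip' was run.
    loadSymbolMap :: FilePath -> IO (Map.Map String Integer)
    loadSymbolMap file = do
        s <- readFile file
        return $ Map.fromList
            [ (name, addr)
            | (a : _kind : name : _) <- map words (lines s)
            , [(addr, "")] <- [readHex a]
            ]

[This sidesteps the "where does the binary live?" problem, at the cost of having to ship the map file alongside the executable.]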
Another problem is that you don't normally link a *complete* copy of the base package into your binary, you only link the bits you need. Linking the whole lot would mean every binary would be about 10M; but doing this on the basis of a flag which you turn on when you want to do dynamic linking maybe isn't so bad.
The only bit that I would want to include completely is the API module, which would likely be quite small as it would only re-export other parts of the program through a smaller, simpler interface. Ah, I see what you're saying now: we'd have to include the whole of the standard library, or indeed any library that we wanted the plugins to be able to use. The system's dynamic linker doesn't have this problem because it always has all of the libraries available and just loads them on demand; with static linking we have to predict what will be wanted beforehand. Aaarg! Perhaps linking all of the standard library wouldn't be so bad (using a special flag, of course), since only the bits that are used get loaded into memory, leaving just the large disk overhead.
There are a couple of other options:
- make your program into a collection of dynamically-linked libraries itself. i.e. have a little stub main() which links with the RTS, and loads up 'base' followed by your program when it starts. The startup cost would be high (we don't do lazy linking in Haskell), but you'd only get one copy of the base package and this is possible right now.
I don't understand this; would you mind explaining a bit more?
Summary: extending GHC's dynamic linker to be able to slurp in the symbol table from the currently running binary would be useful, and is a good bite-sized GHC hacker task. I can't guarantee that we'll get around to it in a timely fashion, but contributions are, as always, entirely welcome...
Having made the suggestion, it's only right that I contribute my (limited) skills. I have done some gdb hacking before (not out of choice, you understand!) so I ought to know a bit about .o files, ELF sections and such.

Duncan

On Tue, Aug 27, 2002 at 05:19:15 +0100, Simon Marlow wrote:
'haskell98' packages, just like GHCi does. It *almost* works to do this, except that you get strange effects, one of which is that you have two copies of stdout each with their own buffer.
If it's not too much trouble, do you mind explaining why this is so? It's just to satisfy my curiosity; don't worry if it's too long-winded or contains really heavy wizardry :).
Going the final step and allowing linkage to the current binary is possible, it just means the linker has to know how to read the symbol table out of the binary, and you have to avoid running 'strip'. I believe reading the symbol table is quite straightforward, the main problem being that on Unix you don't actually know where the binary lives, so you have to wire it in or search the PATH.
You've already got the symbol table in memory though, right? Is it absolutely necessary to re-read the binary?

BTW, I tried using objcopy (part of binutils) to 'merge' together several plugin modules by copying over all the symbols in a bunch of files to a single .o file. Loading that up using the GHCi linker didn't work :(. If there's no reason why it shouldn't work, I'll try again ... it's entirely possible that I stuffed up somewhere.
Another problem is that you don't normally link a *complete* copy of the base package into your binary, you only link the bits you need. Linking the whole lot would mean every binary would be about 10M; but doing this on the basis of a flag which you turn on when you want to do dynamic linking maybe isn't so bad.
How about a feature (maybe a tool separate from GHC) which can find the dependencies required for a particular symbol, and removes all the excess baggage? E.g. you have a program called, uhh, "Program", and a plugin called, uhh, "Plugin", with Program containing the symbols 1, 2, 3, and Plugin containing symbols A, B, C. Symbol "1" in Program uses the "head" function from the standard library, so you need to compile that into Program, and symbol "B" in Plugin uses the "tail" function, so you need to compile that in:

    Program: 1, head, 2, 3
    Plugin:  A, B, tail, C

That should work, no? Maybe it's even possible to do this right now using a combination of evil GHC hacks and binutils? However, then you have the problem that the RTS doesn't _know_ that it has to load the "tail" symbol when it loads the plugin. Program will just load symbols A, B, C, and then die a sad death when it realises it can't resolve the symbols (since the tail symbol required for B is missing). I guess you could work around this by using some "stub" function (like "dependentSymbols") which the linker first loads.

In Plugin.hs:

    dependentSymbols = ["tail"]

In Program.hs:

    loadModule "plugin"
    -- Load the symbols which A, B, C require
    loadFunction "dependentSymbols"
    resolveFunctions
    mapM_ (loadFunction) dependentSymbols
    -- Load A, B, C themselves
    mapM_ (loadFunction) ["A", "B", "C"]

Hopefully I'm not describing non-issues here ...
- make your program into a collection of dynamically-linked libraries itself. i.e. have a little stub main() which links with the RTS, and loads up 'base' followed by your program when it starts. The startup cost would be high (we don't do lazy linking in Haskell), but you'd only get one copy of the base package and this is possible right now.
I was thinking of doing this when I started my own project. However, I don't think it's really acceptable, because:

1. You still need the base Haskell libraries on the system, which means that you either ship them with your application, or the user needs GHC installed on their system. (I'm a big fan of the "it should just work" principle when a user downloads and installs an application.) If the user has GHC installed on their system, it probably also needs to be the same version of GHC, otherwise you will probably run into Bad Problems.

2. As you say, the startup cost (time) is high. This is fine for some applications, but my next project will be invoked as a CGI, where the ~2 second overhead involved at startup really kills performance (to the point where it won't scale to handle lots of users).

Of course, the big advantage is that you can do this right now.
- make GHC generate objects that play nicely with the standard dynamic linker on your system. This is entirely non-trivial, I believe. See previous discussions on this list. However, it might get easier in the future; I'm currently working on removing the need to distinguish code from data in GHC's RTS, which will eliminate some of the problems.
Just a comment: it's, well, interesting how GHC has this fantastic method of importing modules at runtime, which is similar (at least in what it achieves) to the dynamic linker. I dunno, it feels like reinvent-wheel syndrome. Not saying that's a bad or good thing, just an observation.
--
#ozone/algorithm

On Wed, Aug 28, 2002 at 03:56:36 +1000, Andre Pang wrote:
In Plugin.hs:
dependentSymbols = ["tail"]
In Program.hs:
    loadModule "plugin"
    -- Load the symbols which A, B, C require
    loadFunction "dependentSymbols"
    resolveFunctions
    mapM_ (loadFunction) dependentSymbols
    -- Load A, B, C themselves
    mapM_ (loadFunction) ["A", "B", "C"]
Hopefully I'm not describing non-issues here ...
Sorry, ignore that above code sample; it's complete hogwash -- resolveFunctions tries to resolve all the functions in _all_ currently loaded modules, not just the functions that you load with loadFunction. That makes that code rather incorrect.
I blame the sushi (mmm, sushi).
--
#ozone/algorithm

Just a comment: it's, well, interesting how GHC has this fantastic method of importing modules at runtime, which is similar (at least in what it achieves) to the dynamic linker. I dunno, it feels like reinvent-wheel syndrome. Not saying that's a bad or good thing, just an observation.
I've played with dynamic linking quite a bit myself. The problem with dlopen and friends is that it was designed with very, very narrow goals, and the API doesn't provide anything not required to meet those goals.

It would be quite straightforward for the library to support multiple namespaces, so that you could load two versions of the same library and have some of your code use one namespace and some use the other. No joy. (I don't think GHC needs this, but it's something we wanted for Knit. Knit is a component-programming extension of C - if you think 'ML functors', you're in the right ballpark.)

It would probably be trivial for the library to provide a way to say 'Please insert symbol "X" into your lookup table', but it doesn't. (IIRC, this is the essential function that the standard library lacks.)

-- Alastair Reid alastair@reid-consulting-uk.ltd.uk Reid Consulting (UK) Limited http://www.reid-consulting-uk.ltd.uk/alastair/
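[For concreteness, the missing call might look like the following. This is purely hypothetical - no dlopen implementation or GHC RTS of the time promises a function with this name or signature - but it is the shape of the hole being pointed at.]

    import Foreign.C.String (CString, withCString)
    import Foreign.Ptr (Ptr)

    -- Hypothetical linker extension: record that symbol `name' lives
    -- at address `addr' in the running binary, so that objects loaded
    -- later can resolve references to it.
    foreign import ccall unsafe "insertSymbol"
        c_insertSymbol :: CString -> Ptr a -> IO ()

    registerSymbol :: String -> Ptr a -> IO ()
    registerSymbol name addr = withCString name (\s -> c_insertSymbol s addr)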