
Hi,

I'm hacking on shared library support for GHC and it's coming along quite nicely. http://hpaste.org/192

My initial hacks are available from:

http://clemens.endorphin.org/patches/ghc-20070605-initial-shared-libs.patch (works only with x86-64 atm; on i386 the NCG dies in the register allocator when compiling the cmm files of the RTS)

http://clemens.endorphin.org/patches/cabal-20070528-initial-shared-library.p...

libtool usually takes care of creating shared libraries under *nix systems. libtool solves a few minor problems associated with:

1) creating shared libraries
2) linking programs that depend on shared libraries
3) running programs that depend on shared libraries

libtool is tailored to C compilers, and the general opinion on #ghc towards libtool seems to be: "hands off". From the list above, I will try to sketch solutions without libtool.

1) Creating shared libraries

At the moment, my second patch teaches Cabal how to build shared libraries. Basically, this is:
* add -fPIC to the compiler invocation (and -optc-fPIC for c-sources),
* invoke "ld -shared -Bsymbolic -o foo.so obj1.o obj2.o ...".

ATM, ld is not invoked with the inter-library dependencies for the shared library being built. This is not problematic, as the final executable will include all dependencies due to the ghc package dependency tracking. But DT_NEEDED on ELF influences the sequence in which shared library initializers are run. I have not yet investigated whether this leads to any problems.

To solve this little shortcoming, the ld invocation could be delegated to GHC: "ghc -o libHSfoo.so Foo1.o Foo2.o". We already have a similar facility for DLLs (see MkDLL in DriverPipeline.hs). This could be abstracted into MkShared, and platform-specific knowledge could be encapsulated in GHC. The benefit would be that we could easily access the package information and could create shared libraries that contain proper DT_NEEDED sections.

2) Linking programs

Linking should work out of the box: "ghc -dynamic -o HelloWorld HelloWorld.o" creates a dynamically linked executable.

3) Running programs

This is a typical problem:

./HelloWorld
./HelloWorld: error while loading shared libraries: libHShaskell98-1.0_dyn.so: cannot open shared object file: No such file or directory

There are several ways to add search paths for dynamic linking: either we do it temporarily, or we encode the search paths into the executables. On ELF platforms, the latter works by adding -rpath to the linker flags. This adds two new entries to the .dynamic section (DT_RPATH, DT_RUNPATH), both responsible for signalling additional search paths to the dynamic linker, ld.so. According to Simon Marlow, Windows has a similar mechanism via manifest files.

Let's see how libtool handles this situation. libtool differentiates between installed and uninstalled libraries. When linking against installed libraries not in the standard search path, libtool uses -rpath to add these search paths to the created executable. When linking against uninstalled libraries, libtool still uses -rpath, but pointing to the directory the uninstalled library is going to be installed in. libtool derives this information from the .la files plus Makefiles.

In any case, libtool creates a wrapper in the build directory that takes care of executing the program linked against uninstalled shared libraries. There are two strategies for accomplishing this:
* add the paths of the uninstalled shared libraries to LD_LIBRARY_PATH
* relink the executable with additional -rpaths
libtool chooses the second strategy.
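For a concrete view of what the linker records, a minimal sketch assuming a Linux/ELF system with GNU binutils (the module and library names are hypothetical, not taken from the patches):

    # build position-independent objects and link them into a shared
    # library, roughly as the Cabal patch described above does
    ghc -fPIC -c Foo1.hs Foo2.hs
    ld -shared -Bsymbolic -o libHSfoo_dyn.so Foo1.o Foo2.o

    # inspect the DT_NEEDED/DT_RPATH/DT_RUNPATH entries that were
    # (or were not) cooked into the resulting objects
    readelf -d libHSfoo_dyn.so | grep -E 'NEEDED|RPATH|RUNPATH'
    readelf -d HelloWorld | grep -E 'NEEDED|RPATH|RUNPATH'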
How do we translate these solutions to GHC? The first question is whether we expect

ghc -dynamic -package uninstalled-package -o Hello Hello.o
./Hello

to work, or whether we require manual intervention in these cases. If we expect this to work without intervention, we have the same options as libtool:

* create a wrapper that takes care of locating the uninstalled shared libraries and sets LD_LIBRARY_PATH.
* create a binary with rpaths for the uninstalled libraries, and create an additional executable for deployment without these rpaths.

In any case, we have to modify the installer scripts to either know where to locate the real binary, ask ghc where to find the real binary, or delegate the installation to ghc. The last option is basically the libtool way, "libtool --mode=install ...".

When we decide to create a deployable executable at the "-o" spot, we need to either:

* modify the invocation to manually pick up the libraries by modifying LD_LIBRARY_PATH; this is pretty impractical, or
* delegate invocation to ghc, maybe "ghc --execute HelloWorld". libtool has a similar mechanism for executing 3rd-party programs in the "dynamic environment" of the compiled executable. For instance, "gdb HelloWorld" would fail for libtool, as HelloWorld is a wrapper, but "libtool --mode=execute gdb HelloWorld" works, as libtool rewrites HelloWorld to .libs/lt-HelloWorld.

And now something completely different: create a custom ELF program interpreter for Haskell programs. Given an INTERP entry in the ELF program header, the kernel loads this interpreter and delegates control to it. Usually this is /lib/ld-linux.so.2, the dynamic linker, but we can replace it.

Haskell has its own idea of libraries/packages. We have package.conf, which gives us the location of the installed libraries. This is ok for static linking, as at link time ghc is running and knows how to invoke gcc with the correct paths. It does not matter if package.conf is updated afterwards, as statically linked programs contain a copy of the library anyway. For dynamic linking this phase is delayed, and when we encode an rpath such as "/usr/lib/network-2.0/ghc-6.6/", we cannot update to network-2.1 without breaking these executables.

A custom program-loading stub could access the global and local package.conf, extract the library paths for the dependencies, and execve

/lib64/ld-linux.so.2 --library-path=<paths of the dependencies> HelloWorld <args>

This certainly gives us more flexibility than encoding all these rpaths statically into HelloWorld. To solve in-place execution directly from the build directory, we might create .HelloWorld.package.conf in case a non-standard package.conf is used (non-standard = different from global and local) and have the stub loader check for this file.

I agree that the last scheme sounds a bit wild, but I argue that that's what the ELF designers had in mind when they specified the INTERP header. Of course, this is only a solution for ELF platforms.

Opinions :) ?

-- Fruhwirth Clemens - http://clemens.endorphin.org
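The effect of such a stub can be approximated by hand today: the glibc dynamic linker is itself an executable program and accepts a --library-path argument. A minimal sketch, assuming glibc on x86-64 (the exact ld.so path varies by platform, and the library directories are hypothetical):

    # run the dynamically linked binary with an explicit search path,
    # overriding DT_RPATH and LD_LIBRARY_PATH for this one invocation
    /lib64/ld-linux-x86-64.so.2 \
        --library-path /usr/lib/network-2.0/ghc-6.6:dist/build \
        ./HelloWorld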

Clemens Fruhwirth wrote:
At the moment, my second patch teaches Cabal how to build shared libraries. Basically, this is: * add -fPIC to the compiler invocation (and -optc-fPIC for c-sources), * invoke "ld -shared -Bsymbolic -o foo.so obj1.o obj2.o ...".
ATM, ld is not invoked with the inter-library dependencies for the shared library being built. This is not problematic as the final executable will include all dependencies due to the ghc package dependency tracking. But DT_NEEDED on ELF influences the sequence in which shared library initializers are run. I have not yet investigated if this leads to any problems.
To solve this little shortcoming, the ld-invocation could be delegated to GHC. "ghc -o libHSfoo.so Foo1.o Foo2.o".
You'd presumably want this to be "ghc -shared", yes? It's definitely preferable to cook this knowledge into ghc, rather than to have the smarts in Cabal. The advantage of keeping the knowledge in GHC is that it's more decoupled, and it mirrors the behaviour people expect from other compilers (this is what gcc does, for example).
3) Running programs
Let's see how libtool handles this situation.
I would recommend against following libtool's lead in this area. Libtool's fondness for cooking RPATH into binaries makes it very difficult to deal with, because it's quite common for those binaries to get installed and distributed, RPATH and all. RPATH should only be used by a user who knows they have a large-calibre weapon pointed at their foot.
How do we translate these solutions to GHC? The first question is whether we expect
ghc -dynamic -package uninstalled-package -o Hello Hello.o
./Hello
to work or whether we require manual intervention in these cases.
Manual intervention is definitely the right thing. I could perhaps see Cabal's build command having a --in-place option that spits out a little shell script that augments LD_LIBRARY_PATH appropriately, but anything more will lead to trouble.
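For illustration, a minimal sketch of the kind of script such a hypothetical --in-place option might emit (the dist/build layout and program name are assumptions):

    #!/bin/sh
    # prepend the in-place build directory so ld.so can find the
    # uninstalled libHS*_dyn.so files, then run the real binary
    LD_LIBRARY_PATH=$(pwd)/dist/build${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    export LD_LIBRARY_PATH
    exec ./dist/build/Hello/Hello "$@"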
I agree that the last scheme sounds a bit wild, but I argue that that's what ELF designers had in mind when they specified the INTERP header.
Let's not go there, please :-) Such a move would be a big maintenance problem in its own right, and would make a lot of extra work for people packaging GHC for different distributions as they would need to cook up hacks for e.g. local SELinux policies regarding special ELF attributes and memory protections.

On 6/12/07, Bryan O'Sullivan wrote:
Let's see how libtool handles this situation.
I would recommend against following libtool's lead in this area. Libtool's fondness for cooking RPATH into binaries makes it very difficult to deal with, because it's quite common for those binaries to get installed and distributed, RPATH and all. RPATH should only be used by a user who knows they have a large-calibre weapon pointed at their foot.
The NetBSD Project has a different opinion and strongly encourages the use of RPATHs. Letting the user alter the lookup path for installed binaries is seen as a security problem. Since every NetBSD system has its libraries in one directory and third-party libraries are in /usr/pkg/lib, there's no problem in hardcoding the paths. Just another data point...

-- Rich
JID: rich@neswold.homeunix.net
AIM: rnezzy

Rich Neswold wrote:
The NetBSD Project has a different opinion and strongly encourages the use of RPATHs.
NetBSD is of course entitled to its opinions. Indeed, this reinforces my point that use of RPATH should not be cooked into portable tools, because it's a policy that varies across ELF-using platforms.

At Tue, 12 Jun 2007 08:28:08 -0700, Bryan O'Sullivan wrote:
Clemens Fruhwirth wrote:
To solve this little shortcoming, the ld-invocation could be delegated to GHC. "ghc -o libHSfoo.so Foo1.o Foo2.o".
You'd presumably want this to be "ghc -shared", yes?
It's definitely preferable to cook this knowledge into ghc, rather than to have the smarts in Cabal. The advantage of keeping the knowledge in GHC is that it's more decoupled, and it mirrors the behaviour people expect from other compilers (this is what gcc does, for example).
Full ack.
3) Running programs
Let's see how libtool handles this situation.
I would recommend against following libtool's lead in this area. Libtool's fondness for cooking RPATH into binaries makes it very difficult to deal with, because it's quite common for those binaries to get installed and distributed, RPATH and all. RPATH should only be used by a user who knows they have a large-calibre weapon pointed at their foot.
Did I understand correctly that you don't want to see binaries with rpaths pointing to install directories such as /usr/lib/ghc-6.6? So, this forces us to use a wrapper in all cases.
I agree that the last scheme sounds a bit wild, but I argue that that's what ELF designers had in mind when they specified the INTERP header.
Let's not go there, please :-) Such a move would be a big maintenance problem in its own right, and would make a lot of extra work for people packaging GHC for different distributions as they would need to cook up hacks for e.g. local SELinux policies regarding special ELF attributes and memory protections.
Running i386 binaries on an x86-64 platform basically does the same thing: switch the ELF program interpreter (ld-linux-x86_64.so.2 to ld-linux.so.2), so if these projects can't handle the full glory of the ELF specification, then it's probably not our problem. Yes, I agree that this might not be the most trouble-free solution, but it's certainly the most flexible one.

But let's consider another approach before going in that direction: push the responsibility for maintaining dynamic library information to "ghc-pkg register". The custom ELF interpreter proposed above would basically cook information from package.conf into stuff like "ld.so --library-path <..>". We can play the game differently and prepare the information from package.conf when running ghc-pkg register, instead of delaying this to program startup.

One way of digesting this information would be to collect all dynamic libraries in a single directory; either something Haskell-specific like /usr/lib/ghc-6.6/dynlibs, or even /usr/lib. If we start to create links from /usr/lib/libHSbase.so -> /usr/lib/ghc-6.6/libHSbase.so, we might even drop the wrapper.

Another variant is to have "ghc-pkg register <info-for-network-package>" generate little stubs like

export LD_LIBRARY_PATH=/usr/lib/network-2.0/ghc-6.6/:$LD_LIBRARY_PATH

in /usr/lib/ghc-6.6/package-scripts/. Every deployed wrapper could then have the form

#!/bin/sh
source /usr/lib/ghc-6.6/package-scripts/*
$0.real-binary "$*"

-- Fruhwirth Clemens - http://clemens.endorphin.org
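A side note on that last stub: plain /bin/sh has no "source" builtin (the POSIX spelling is "."), "." takes a single file rather than a glob, and "$*" collapses argument boundaries. A more robust sketch of the same hypothetical wrapper:

    #!/bin/sh
    # source each package script individually, then hand all
    # arguments through unchanged with "$@"
    for script in /usr/lib/ghc-6.6/package-scripts/*; do
        . "$script"
    done
    exec "$0.real-binary" "$@"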

On Wed, Jun 13, 2007 at 03:00:04PM +0200, Clemens Fruhwirth wrote:
3) Running programs
Let's see how libtool handles this situation.
I would recommend against following libtool's lead in this area. Libtool's fondness for cooking RPATH into binaries makes it very difficult to deal with, because it's quite common for those binaries to get installed and distributed, RPATH and all. RPATH should only be used by a user who knows they have a large-calibre weapon pointed at their foot.
Did I understand correctly that you don't want to see binaries with rpaths pointing to install directories such as /usr/lib/ghc-6.6? So, this forces us to use a wrapper in all cases.
Please think seriously about mangling the names of Haskell libraries to include version information and dropping them in $PREFIX/lib with every other language's libraries. Haskell is not special, and users expect libraries to be in /usr/lib. No wrapper needed, no RPATH needed, as far as I can see no fanciness at all.

Stefan

On 6/13/07, Stefan O'Rear wrote:
Did I understand correctly that you don't want to see binaries with rpaths pointing to install directories such as /usr/lib/ghc-6.6? So, this forces us to use a wrapper in all cases.
Please think seriously about mangling the names of Haskell libraries to include version information and dropping them in $PREFIX/lib with every other language's libraries. Haskell is not special, and users expect libraries to be in /usr/lib. No wrapper needed, no RPATH needed, as far as I can see no fanciness at all.
Actually, Haskell libraries ought to be placed in /usr/local/lib (or /usr/pkg/lib for systems that use the Package System: http://www.pkgsrc.org). Haskell libraries shouldn't mingle with the base OS libraries. But they shouldn't be separate from the other third-party libraries, either.

-- Rich
JID: rich@neswold.homeunix.net
AIM: rnezzy

Rich Neswold wrote:
Actually, Haskell libraries ought to be placed in /usr/local/lib (or /usr/pkg/lib for systems that use the Package System: http://www.pkgsrc.org).
They should go in $(libdir), as Simon suggests. If a particular OS wants to override libdir to put them somewhere else, that's then easy to arrange. Stefan is also correct that the version of GHC used should be cooked into the name of the shared library. So a library named foo would be installed as libfoo-ghc661.so.1, or something similar (using the soname would work too). This will allow system package managers to automatically resolve dependencies safely, and to keep copies of a shared library built by different versions of GHC around without them clashing.
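For illustration, a sketch of how such a versioned name and soname might be cooked in at link time, assuming the GNU toolchain (the library and object names are hypothetical):

    # embed a soname so the dynamic linker and package managers can
    # track the library independently of its file name on disk
    gcc -shared -Wl,-soname,libfoo-ghc661.so.1 \
        -o libfoo-ghc661.so.1.0 Foo1.o Foo2.o
    ln -s libfoo-ghc661.so.1.0 libfoo-ghc661.so.1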

Bryan O'Sullivan wrote:
Rich Neswold wrote:
Actually, Haskell libraries ought to be placed in /usr/local/lib (or /usr/pkg/lib for systems that use the Package System: http://www.pkgsrc.org).
They should go in $(libdir), as Simon suggests. If a particular OS wants to override libdir to put them somewhere else, that's then easy to arrange.
Stefan is also correct that the version of GHC used should be cooked into the name of the shared library. So a library named foo would be installed as libfoo-ghc661.so.1, or something similar (using the soname would work too).
Good point - we should definitely do that. Cheers, Simon

On Wed, Jun 13, 2007 at 11:39:27AM -0700, Bryan O'Sullivan wrote:
Stefan is also correct that the version of GHC used should be cooked into the name of the shared library. So a library named foo would be installed as libfoo-ghc661.so.1, or something similar (using the soname
A minor point, but libghc661-foo.so.1 is probably better so they all clump together in directory listings etc. Thanks Ian

On Wed, Jun 13, 2007 at 10:05:11AM -0500, Rich Neswold wrote:
On 6/13/07, Stefan O'Rear wrote:
Did I understand correctly that you don't want to see binaries with rpaths pointing to install directories such as /usr/lib/ghc-6.6? So, this forces us to use a wrapper in all cases.
Please think seriously about mangling the names of Haskell libraries to include version information and dropping them in $PREFIX/lib with every other language's libraries. Haskell is not special, and users expect libraries to be in /usr/lib. No wrapper needed, no RPATH needed, as far as I can see no fanciness at all.
Actually, Haskell libraries ought to be placed in /usr/local/lib (or /usr/pkg/lib for systems that use the Package System: http://www.pkgsrc.org). Haskell libraries shouldn't mingle with the base OS libraries. But they shouldn't be separate from the other third-party libraries, either.
No, Haskell libraries ought to be placed in /usr/lib when they are being installed by the OS. I don't want my GHC 6.8 Debian install to pollute /usr/local. (/usr/local is fine for local builds, of course; that's why I used $PREFIX.)

Stefan

Clemens Fruhwirth wrote:
Libtool's fondness for cooking RPATH into binaries makes it very difficult to deal with, because it's quite common for those binaries to get installed and distributed, RPATH and all. RPATH should only be used by a user who knows they have a large-calibre weapon pointed at their foot.
Did I understand correctly that you don't want to see binaries with rpaths pointing to install directories such as /usr/lib/ghc-6.6?
That's right.
So, this forces us to use a wrapper in all cases.
Not necessarily. Many systems provide a global mechanism to manage the paths that ld.so searches for shared objects. This is the standard on Linux, for example (/etc/ld.so.conf). For systems that don't provide this mechanism, one possibility would be to provide an option that cooks the rpath in.
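On a glibc-based Linux system, for example, the registration might look something like this (the directory is a hypothetical install location; run as root):

    # make the GHC library directory known to the dynamic linker
    # system-wide, then rebuild the ld.so cache
    echo "/usr/lib/ghc-6.6" >> /etc/ld.so.conf
    ldconfig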

On Wed, Jun 13, 2007 at 11:33:36AM -0700, Bryan O'Sullivan wrote:
Clemens Fruhwirth wrote:
Libtool's fondness for cooking RPATH into binaries makes it very difficult to deal with, because it's quite common for those binaries to get installed and distributed, RPATH and all. RPATH should only be used by a user who knows they have a large-calibre weapon pointed at their foot.
Did I understand correctly that you don't want to see binaries with rpaths pointing to install directories such as /usr/lib/ghc-6.6?
That's right.
So, this forces us to use a wrapper in all cases.
Not necessarily. Many systems provide a global mechanism to manage the paths that ld.so searches for shared objects. This is the standard on Linux, for example (/etc/ld.so.conf). For systems that don't provide this mechanism, one possibility would be to provide an option that cooks the rpath in.
Or better yet, put them directly in one of the LD_LIBRARY_PATH dirs. $PREFIX/lib/ghc-$VERSION is a relic of the static library system and IMO shouldn't be duplicated. Stefan

On Wed, 2007-06-13 at 15:24 -0700, Stefan O'Rear wrote:
Or better yet, put them directly in one of the LD_LIBRARY_PATH dirs. $PREFIX/lib/ghc-$VERSION is a relic of the static library system and IMO shouldn't be duplicated.
Hmm.. well, what Felix does is:

(a) The bulk of the system lives in $PREFIX/lib/felix$VERSION.
(b) Libraries are called *.flx or *.h or *.a or *.so and have NO version encoding.
(c) Executables are compiled and run using a bash script.

The script is the only program installed in /usr/bin. This script selects the install to use, defaulting to the most recent. You can override the install directory on the script command line or with an environment variable. The bash script uses LD_LIBRARY_PATH, etc., and command line switches to the executables, to ensure a coherent 'assembly' of components is used together.

If you opt for static linkage, the resulting executable is of course independent of the version. All the standalone binaries require non-defaulted switches to select where components other than load-time shared libs live. This system supports multiple installations and also multiple cross compilation.

IMHO: the $PREFIX/lib/ghc-$VERSION style isn't a relic. It is the current ridiculous versioning of shared libraries which is a relic of an even older faulty design, which unfortunately Debian and other package installers copied. In particular, package components should always live in the same directory tree, related by package/versions, and NEVER get split up into lib/bin/include etc. In fact, that split is a relic of archaic, dog-slow Unix directory searching, and was a performance hack. It was never the correct design to split packages into directories by function, and the Unix directory tree is only capable of supporting C programs .. and it isn't really suitable for C either.

The rules for package managers like Debian are that a component library split into /usr/include, /usr/lib MUST have a man page, and it MUST be a separately usable C/C++-callable component. If the only user of these libraries is the Haskell or other system, and the interfaces aren't documented, then the libraries must NOT be placed in /usr/lib. In other words, unless there is a distinct separable Debian package for the library, it must NOT go in /usr/lib.

The way most modern languages work, each version has a number of utterly incompatible components. When you use Felix, Ocaml, Mlton, or Haskell version xx.yy, you can't use any other version of any of the libraries .. and probably can't use any user libraries either. It's likely the ABI (application binary interface) changes with each version, the set of library entry points changes in some detail, or whatever .. if not, why is a new version being released?? This doesn't happen (usually) with C/C++, but it does sometimes: the C++ ABI changed for Linux recently, and it broke Debian for months and months while everything got upgraded.

So in my view, the only thing you might consider sharing between (advanced language of your choice) versions is the source code of software written in the standardised language -- but never, NEVER the support libraries, run time, or compilers.

FYI: this is a particularly nasty problem for Ocaml, since the ABI changes with every patch. The debian-ocaml team has to rebuild and upload binaries to the package server every time the compiler is changed .. and end users have to recompile every program they wrote (for bytecode .. for native code there's no dynamic loading anyhow), and every library (for both bytecode and native code). This isn't necessary with Felix, because it *defines* execution in terms of source code, not in terms of binaries, which are regarded as mere cached values and managed automatically (i.e. rebuilt whenever there's a version mismatch).
So for something like Haskell, I'd recommend the system be split in two parts:

(a) the 'system', which provides a distinct installation for every version, all in a single directory tree
(b) libraries written in standardised Haskell 98 or whatever, which are separate packages of source code and are separately maintained and installed

Any caching of partial compilations of the standard source libraries should be automatic. Option (b) is rather hard to organise without redesigning your tool chain to work entirely in terms of source code .. but I recommend that anyhow -- Haskell semantics are defined in terms of sources, and the 'average' user shouldn't have to know about anything else: it's a basic principle of abstraction.

-- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net
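A minimal sketch of the kind of version-selecting driver script described above, assuming installs live under /usr/lib/felix$VERSION and that the newest version sorts last (all paths and the FLX_INSTALL variable are hypothetical):

    #!/bin/sh
    # pick the requested install tree, or default to the lexically
    # newest /usr/lib/felix* directory
    INSTALL=${FLX_INSTALL:-$(ls -d /usr/lib/felix* | sort | tail -n 1)}

    # make the install's shared libraries visible, then run the tool
    LD_LIBRARY_PATH=$INSTALL/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    export LD_LIBRARY_PATH
    exec "$INSTALL/bin/flx" "$@"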

Hi John, On Thu, Jun 14, 2007 at 09:17:16AM +1000, skaller wrote:
The rules for package managers like Debian are that a component library split into /usr/include, /usr/lib MUST have a man page,
I'm not sure where you're getting that from? Debian policy: http://www.debian.org/doc/debian-policy/ch-docs.html#s12.1 says Each [...] function *should* have an associated manual page (my emphasis), and doesn't say anything about where the function is in the file system. (it would be great if someone were to tweak haddock to generate suitable manpages, incidentally).
and it MUST be a separately usable C/C++-callable component.
Have you got a reference for that? I can't see it in a quick skim of http://www.debian.org/doc/debian-policy/ch-sharedlibs.html and the FHS. Thanks Ian

On Fri, 2007-06-15 at 17:28 +0100, Ian Lynagh wrote:
Hi John,
On Thu, Jun 14, 2007 at 09:17:16AM +1000, skaller wrote:
The rules for package managers like Debian are that a component library split into /usr/include, /usr/lib MUST have a man page,
I'm not sure where you're getting that from? Debian policy: http://www.debian.org/doc/debian-policy/ch-docs.html#s12.1 says Each [...] function *should* have an associated manual page (my emphasis), and doesn't say anything about where the function is in the file system.
(it would be great if someone were to tweak haddock to generate suitable manpages, incidentally).
and it MUST be a separately usable C/C++-callable component.
Have you got a reference for that? I can't see it in a quick skim of http://www.debian.org/doc/debian-policy/ch-sharedlibs.html and the FHS.
No, it's the concept, not the letter of the Debian Law. When you install a shared library, it is installed as a publicly accessible component to be shared. That implies the interface is both documented and callable. A library with undocumented symbols which are private to one application such as GHC, or even to a set of applications generated by GHC, has no business in a directory intended for everyone to be able to use it .. especially if the interface and ABI change due to internal restructuring.

One way to measure this is: if you removed GHC and its applications, and there are (necessarily) no users of the remaining library package .. then the library package shouldn't be in the global public place (/usr/lib+include etc). In that case, the lib is an intrinsic part of GHC and should be in a GHC master package (sub)directory.

The problem with Debian is that the FHS standard doesn't really account for packages like programming languages. If you look, you see for example that Python lives in /usr/lib/python and Ocaml lives in /usr/lib/ocaml. That's an abuse forced by a grossly inadequate directory model. Ocaml and Python aren't libraries. Solaris does this more correctly IMHO: packages live in a vendor-specific place, part of the /opt directory tree. Here 'opt' means 'optional', that is, not part of the core operating system and the utilities required to maintain it.

Anyhow, systems like GHC, Ocaml, Python, Felix, etc., just do NOT fit into the C model: bin/, lib/, include/, with or without any versioning. They have bytecode, configuration, and many other kinds of 'object' and 'development source' and 'library' files, some of which may happen to be 'shared libraries', but that's irrelevant. All those things need to be managed in a (compiler-)system-specific way. The 'specific' way for C and Unix OS tools is the Debian FHS ... it shouldn't be used for anything else.

Note that the Ocaml team is currently grappling with a second nasty problem: Ocaml 3.10 uses a source-incompatible pre-processor (camlp4). So now, not only will a non-Debian-installed Ocaml library or bytecode executable be broken by an upgrade until the user recompiles from source (the usual situation for Ocaml), but now all sources using camlp4 macros are broken until the end user edits them. So the team is almost forced to support installation of non-conflicting separate versions now, just to allow migration.

Once you start versioning stuff .. splitting files into subdirectories like bin/, lib/, include/ is a nightmare. Note that the 'include' part of that utterly fails for C code anyhow .. so even the basic model is flawed for the very kind of code it was designed to support. The bottom line is that the Debian FHS is archaic, and work should be done to eliminate all user-level packages from it: only core OS packages should be allowed in /usr.

IMHO the best workaround for this problem is to use a thunk/driver script in /usr/bin and put all the real stuff in a single version-specific install directory whose location is entirely irrelevant. This is more or less what gcc does, and the gcc model works very well I think. Multiple gcc versions, including cross compilers, can be installed and 'just work' with all the right bits glued together.

-- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net

On Sat, 16 Jun 2007 08:21:50 +1000, skaller wrote:
One way to measure this is: if you removed GHC and applications, and there are (necessarily) no users of the remaining library package .. the library package shouldn't be in the global public place (/usr/lib+include etc).
As I understand it, the entire point of this effort (shared libraries in GHC) is to allow dynamically linked Haskell executables. In this case, applications outside the GHC toolchain will in fact depend on these shared objects. As a concrete case, a binary darcs package could be a user of libghc66-base.so and libghc66-mtl.so -- with no dependencies on the GHC compiler package itself. Does this pass your litmus test? Cheers, Spencer Janssen

On Fri, 2007-06-15 at 19:40 -0500, Spencer Janssen wrote:
On Sat, 16 Jun 2007 08:21:50 +1000, skaller wrote:
One way to measure this is: if you removed GHC and applications, and there are (necessarily) no users of the remaining library package .. the library package shouldn't be in the global public place (/usr/lib+include etc).
As I understand it, the entire point of this effort (shared libraries in GHC) is to allow dynamically linked Haskell executables. In this case, applications outside the GHC toolchain will in fact depend on these shared objects. As a concrete case, a binary darcs package could be a user of libghc66-base.so and libghc66-mtl.so -- with no dependencies on the GHC compiler package itself.
Does this pass your litmus test?
Yes, it passes the separability test. My darcs wouldn't run otherwise! And versioning the library filename as above is a good idea too. Felix adds _dynamic for shared libs and _static for static link archives to ensure the Linux linker doesn't get confused. However, the libs still aren't fully public if the interfaces are only private details of the GHC tool chain. Hmmm. -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net

On Sat, 2007-06-16 at 11:33 +1000, skaller wrote:
However, the libs still aren't fully public if the interfaces are only private details of the GHC tool chain. Hmmm.
Note of course that it's only been in the last few years that C++ has stabilised to the point where different implementations can agree on a common ABI. Prior to that, all the C++ libs had public interfaces with an ABI that was an artifact of which C++ tool chain you were using.

Duncan

Thanks for the analysis, Clemens. Here's what I think we should do, firstly for Unix:

- GHC distributions will install the shared libraries in standard locations. (OS packagers have the option of using non-std locations together with whatever mechanism is appropriate to get the non-std locations registered in ld.so.conf.)

- An installed GHC will just link binaries as normal; the shared libs are in the standard locations, so the binary will work.

- We add an option to GHC, say -hardwire-lib-paths, that tells it to use -rpath (or equivalent) when linking.

- An uninstalled GHC uses -hardwire-lib-paths by default. If you want to generate a deployable executable using an uninstalled GHC, you must turn off this behaviour using -no-hardwire-lib-paths. We have some flexibility here: we could reverse this default, but I think I'd often find myself generating binaries that crash because I forgot -hardwire-lib-paths.

- A GHC installed in a non-standard location (e.g. your home directory) will also use -hardwire-lib-paths by default. I think we want this, otherwise we'll get several bug reports per day about missing shared libraries.

- GHC may warn if you link a binary to any shared libraries that are not in standard locations and you didn't use -hardwire-lib-paths. I'm not sure if this is possible in general, but it would be nice.

- Binaries that come with a GHC distribution will all be non-hardwired executables with wrapper scripts that set LD_LIBRARY_PATH. This is so that we can choose where to install the bindist at install time. Many of these executables already have wrapper scripts anyway, so this isn't a big deal.

For Windows:

- In a GHC distribution, ghc.exe is in the same directory as the library DLLs, so by default it will link to them (the binary's directory is searched for DLLs first on Windows). We still have a relocatable GHC installation tree.

- GHC uses -hardwire-lib-paths by default, implemented by embedding manifests into the binaries it creates.

- We provide a way to generate a deployable binary by collecting all the DLLs it refers to in a bundle.

Cheers, Simon
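A minimal sketch of the kind of bindist wrapper script mentioned above, assuming the install prefix is substituted into the script at install time (the paths and the ghc.real name are hypothetical):

    #!/bin/sh
    # point ld.so at the bindist's library directory, chosen at
    # install time, then run the real compiler binary
    GHCLIB=/opt/ghc/lib
    LD_LIBRARY_PATH=$GHCLIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    export LD_LIBRARY_PATH
    exec "$GHCLIB/ghc.real" "$@"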
participants (9)

- Bryan O'Sullivan
- Clemens Fruhwirth
- Duncan Coutts
- Ian Lynagh
- Rich Neswold
- Simon Marlow
- skaller
- Spencer Janssen
- Stefan O'Rear