[GHC] #11587: Place shared objects in LIBDIR

#11587: Place shared objects in LIBDIR
-------------------------------------+-------------------------------------
        Reporter:  bgamari           |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.0.1
       Component:  Package system    |              Version:  7.10.3
        Keywords:                    |     Operating System:  Unknown/Multiple
    Architecture:  Unknown/Multiple  |      Type of failure:  Runtime performance bug
       Test Case:                    |           Blocked By:
        Blocking:                    |      Related Tickets:
Differential Rev(s):                 |            Wiki Page:
-------------------------------------+-------------------------------------

If one compiles a program with `-dynamic`, the resulting executable includes in its `rpath` the library directory of every Haskell package that the program links against. This causes a significant number of excess system calls at program start-up. For instance, in the case of a dynamically linked `ghc` executable on Debian 8, compiling a trivial "hello world" application produces over 800 `open` calls, the majority of which originate from the dynamic linker. For example:
{{{
$ strace -f -e open ghc-7.10.3 -c -fforce-recomp Test.hs 2>&1 | grep open
...
open("/usr/lib/ghc/bin/../haske_GGvi737nHHfG6zm2y7Rimi/tls/x86_64/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/ghc/bin/../haske_GGvi737nHHfG6zm2y7Rimi/tls/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/ghc/bin/../haske_GGvi737nHHfG6zm2y7Rimi/x86_64/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/ghc/bin/../haske_GGvi737nHHfG6zm2y7Rimi/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/ghc/bin/../termi_6iVf4EBnOgfIaaOCLRs8jl/tls/x86_64/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/ghc/bin/../termi_6iVf4EBnOgfIaaOCLRs8jl/tls/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/ghc/bin/../termi_6iVf4EBnOgfIaaOCLRs8jl/x86_64/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/ghc/bin/../termi_6iVf4EBnOgfIaaOCLRs8jl/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/ghc/bin/../ghc_0AG9TOjDEtx4Ji3wSwHOBe/tls/x86_64/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/ghc/bin/../ghc_0AG9TOjDEtx4Ji3wSwHOBe/tls/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/ghc/bin/../ghc_0AG9TOjDEtx4Ji3wSwHOBe/x86_64/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
...
}}}

The dynamic linker must look in nearly 25 Haskell library directories to locate every system library! This is madness.

Instead of placing each shared library in its own directory (`$LIBDIR/$PKG_KEY/lib$PKG_KEY.so`, as we do currently), why not just place them in `$LIBDIR`, e.g. `$LIBDIR/libPKG_KEY.so`? This would mean that we need to include only one directory, `$LIBDIR`, in `rpath`.
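The probe pattern in the strace output above can be reproduced with a short Python sketch. It assumes the loader tries the subdirectories `tls/x86_64`, `tls` and `x86_64` inside each rpath entry before the entry itself, which is simply the order visible in that trace on Debian 8 x86_64; the helper name `candidate_paths` is hypothetical.

```python
import posixpath
from typing import List

# Subdirectories tried inside each rpath entry, in the order visible in the
# strace output above ("" means the entry itself). Assumption: this mirrors
# the hardware-capability search of the loader on that Debian 8 x86_64
# machine; other loaders, CPUs and glibc versions may use different names.
HWCAP_SUBDIRS = ["tls/x86_64", "tls", "x86_64", ""]


def candidate_paths(rpath: List[str], soname: str) -> List[str]:
    """Every path the loader may try for `soname`, in search order."""
    paths = []
    for directory in rpath:
        for sub in HWCAP_SUBDIRS:
            # posixpath.join adds nothing for the empty component and does
            # not normalise "..", so the results match the strace lines.
            paths.append(posixpath.join(directory, sub, soname))
    return paths


ghc_rpath = [
    "/usr/lib/ghc/bin/../haske_GGvi737nHHfG6zm2y7Rimi",
    "/usr/lib/ghc/bin/../termi_6iVf4EBnOgfIaaOCLRs8jl",
]
for path in candidate_paths(ghc_rpath, "libtinfo.so.5"):
    print(path)
```

With 25 rpath entries this yields up to 100 candidates for a library that lives in none of them, against 4 for a single `$LIBDIR` entry. The loader may remember subdirectories it has already found to be missing, so these are upper bounds for one lookup rather than exact counts.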
-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11587 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

Description changed by bgamari:
{{{
@@ -42,1 +42,1 @@
- them in `$LIBDIR`, e.g. `$LIBDIR/libPKG_KEY.so`. This would mean that we
+ them in `$LIBDIR`, e.g. `$LIBDIR/lib$PKG_KEY.so`. This would mean that we
}}}

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11587#comment:1

Changes (by trommler):

 * cc: trommler (added)

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11587#comment:2

Comment (by rwbarton):
> The dynamic linker must look in nearly 25 Haskell library directories to locate every system library! This is madness.
Not only that, but just to spell it all out explicitly:

 * The dynamic linker has to look in those nearly 25 Haskell library directories for each Haskell library, too, making the behavior quadratic in the number of Haskell packages.
 * This applies not just to `ghc` itself, but to any executable that uses dynamic Haskell libraries. Some people have programs that use hundreds of different Haskell packages.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11587#comment:3
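The quadratic growth described in the first point can be made concrete with a toy count. This is a hypothetical model, not a measurement: Haskell library i lives in the i-th rpath directory, each library is searched for once, system libraries live outside the rpath, and the hardware-capability subdirectories from the strace output are ignored. The function name `failed_lookups` is illustrative only.

```python
def failed_lookups(n_packages: int, n_system_libs: int = 0, flat: bool = False) -> int:
    """Misses for one load of every library under a simplified model.

    Hypothetical model (not measured): Haskell library i (0-based) lives in
    rpath directory i, each library is searched for once, system libraries
    live outside the rpath, and hardware-capability subdirectories are ignored.
    """
    if flat:
        # A single shared directory: Haskell libraries hit on the first try,
        # each system library misses once before the default system dirs.
        return n_system_libs
    haskell_misses = sum(range(n_packages))      # 0 + 1 + ... + (n - 1) = n(n-1)/2
    system_misses = n_system_libs * n_packages   # misses every rpath entry
    return haskell_misses + system_misses


for n in (25, 100, 200):
    print(n, failed_lookups(n, n_system_libs=3), failed_lookups(n, 3, flat=True))
```

Under these assumptions the per-package layout costs n(n-1)/2 misses for the Haskell libraries alone (300 for 25 packages, 19900 for 200), while a single shared directory costs nothing beyond one miss per system library.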

Changes (by chak):

 * cc: trommler (removed)
 * cc: Trommler, chak (added)

Comment:

I have wondered about this setup in the past. More precisely, I think having the dynamic libraries for different packages in separate directories, together with the interface files etc. of each package, makes a lot of sense: it keeps `$LIBDIR` tidy. However, the dynamic linking setup does appear to be rather inefficient and, at least on OS X, it makes relocating GHC distributions very hard.

Hence, in the Haskell for Mac build, I post-process all executables and dynamic libraries to optimise the linking process. As my ELF is a bit rusty, let me explain it in Mach-O terms. I set the `RPATH` in all executables such that it points to `$LIBDIR`, and I set the library name of each dynamic library to include the package directory. For example, for `base`, we might have `base_GDytRqRVSUX7zckgKqJjgw/libHSbase-4.8.1.0-GDytRqRVSUX7zckgKqJjgw-ghc7.10.2.dylib`. I also set `RPATH` to be relative to `@loader_path`, which gives me a relocatable set of dynamic libraries and GHC executables.

So, for example, here is what `base` looks like:
{{{
LC 03: LC_ID_DYLIB @rpath/base_GDytRqRVSUX7zckgKqJjgw/libHSbase-4.8.1.0-GDytRqRVSUX7zckgKqJjgw-ghc7.10.2.dylib
...
LC 12: LC_LOAD_DYLIB @rpath/integ_2aU3IZNMF9a7mQ0OzsZ0dS/libHSinteger-gmp-1.0.0.0-2aU3IZNMF9a7mQ0OzsZ0dS-ghc7.10.2.dylib
LC 13: LC_LOAD_DYLIB @rpath/ghcpr_8TmvWUcS1U1IKHT0levwg3/libHSghc-prim-0.4.0.0-8TmvWUcS1U1IKHT0levwg3-ghc7.10.2.dylib
...
LC 19: LC_RPATH @loader_path/..
}}}

This avoids the quadratic explosion of the search space, but still keeps the dynamic libraries in the package directories (and `$LIBDIR` tidier). This is definitely the better setup on OS X. Can't we do something equivalent on Linux?

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11587#comment:4
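The `@rpath` indirection described in comment:4 can be sketched as plain string substitution. The sketch assumes a single `LC_RPATH` of `@loader_path/..` inside a library installed at a hypothetical `/opt/ghc/lib/base_GDytRqRVSUX7zckgKqJjgw`, and uses `posixpath.normpath` only to make the result readable; the real dyld handles more cases (other tokens, fallback paths), so `resolve_install_name` is an illustration rather than a model of dyld.

```python
import posixpath
from typing import List


def expand_rpath(entry: str, loader_dir: str) -> str:
    """Substitute a leading @loader_path with the directory of the image that
    carries the LC_RPATH command. normpath is only for readability here."""
    token = "@loader_path"
    if entry.startswith(token):
        entry = loader_dir + entry[len(token):]
    return posixpath.normpath(entry)


def resolve_install_name(install_name: str, rpaths: List[str], loader_dir: str) -> List[str]:
    """Candidate paths for an install name, one per LC_RPATH entry."""
    token = "@rpath/"
    if not install_name.startswith(token):
        return [install_name]  # other install names are used as given in this sketch
    tail = install_name[len(token):]
    return [posixpath.join(expand_rpath(r, loader_dir), tail) for r in rpaths]


# Hypothetical install location of the base library; $LIBDIR is /opt/ghc/lib.
base_dir = "/opt/ghc/lib/base_GDytRqRVSUX7zckgKqJjgw"
integer_gmp = ("@rpath/integ_2aU3IZNMF9a7mQ0OzsZ0dS/"
               "libHSinteger-gmp-1.0.0.0-2aU3IZNMF9a7mQ0OzsZ0dS-ghc7.10.2.dylib")
print(resolve_install_name(integer_gmp, ["@loader_path/.."], base_dir))
```

Because there is only one rpath entry, each `@rpath/...` name yields exactly one candidate path, which is why this layout avoids the search blow-up.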

Comment (by trommler):

I think the real issue is that Haskell's libraries are all in different directories, so the dynamic linker must read at least one directory and one file for each Haskell dynamic library. The quadratic time required to find a library seems to be small compared to the disk access needed to read all the directories in the `RPATH` from the hard disk; most of the time, the second read of a directory would hit the file system cache.

I tried this experiment on my x86_64 Linux machine (still using spinning hard drives) with a dynamically linked GHC:
{{{
$ time ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.10.3

real    0m4.805s
user    0m0.044s
sys     0m0.080s
$ time ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.10.3

real    0m0.048s
user    0m0.024s
sys     0m0.024s
}}}

Given these numbers, I am in favour of @bgamari's original suggestion. If we do not want to clutter GHC's `$LIBDIR`, we could still put all Haskell dynamic libraries into one subdirectory at O(1) time cost.

The solution in comment:4 could be implemented on Linux (and ELF in general) too, but I think performance would be as bad (at least when disk access is slow): to open `foodir/libbar.so`, the runtime linker still needs to read `foodir` and then read `libbar.so`.

IIRC, GHC 8.0 no longer encodes the ABI hash in the dynamic library's file name; it appears only in the package's directory name. We will need to revisit that decision.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11587#comment:5
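trommler's point that comment:4's layout would be no cheaper on a cold cache can be illustrated by counting directories rather than probes. In this hypothetical model (the function name and paths are made up), opening `$LIBDIR/pkgdir/lib.so` requires reading both `$LIBDIR` and `pkgdir`, directories above `$LIBDIR` are ignored, and failed probes are not counted.

```python
import posixpath
from typing import Iterable, Set


def directories_read(libdir: str, relative_paths: Iterable[str]) -> Set[str]:
    """Directories whose entries must be looked up to open each library,
    counting `libdir` and every subdirectory below it on the way.

    Hypothetical cold-cache model: directories above `libdir` are shared by
    every layout and are left out; failed probes are not counted.
    """
    dirs = {libdir}
    for rel in relative_paths:
        current = libdir
        for part in rel.split("/")[:-1]:  # every component except the file name
            current = posixpath.join(current, part)
            dirs.add(current)
    return dirs


n = 25
per_package = [f"pkg{i}/libHSpkg{i}.so" for i in range(n)]  # today, or comment:4
flat = [f"libHSpkg{i}.so" for i in range(n)]                 # one shared directory
print(len(directories_read("/usr/lib/ghc", per_package)))    # 26
print(len(directories_read("/usr/lib/ghc", flat)))           # 1
```

Both the current per-package rpath and the `@rpath/pkgdir/...` names of comment:4 touch one directory per package plus `$LIBDIR`, whereas a flat layout touches only `$LIBDIR`.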

Changes (by bgamari):

 * cc: dcoutts (added)
 * milestone: 8.0.1 => 8.2.1

Comment:

Adding Duncan, as this will primarily be a Cabal change, but sadly this won't be happening for 8.0.1.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11587#comment:6

Comment (by duncan):

Note that some more sharing is possible, given typical layouts. For example, for nix or for cabal new-build we install into a store:
{{{
$store/$pkgid-$hash/libHSpkgname-ver.so
}}}
(Currently it's actually worse than this, since the libname includes hashes too, which should be unnecessary given the separate dirs.)

I'm not sure if this is possible with ELF, but if we can include part of the directory in the libname / location, and a separate RUN_PATH, then we could use a scheme like:
{{{
RUN_PATH /home/me/.cabal/store
SO_NEEDED pkgname-ver-hash/libHSpkgname-ver.so
SO_NEEDED ...
}}}
This does appear to be possible with Mach-O, i.e. like:
{{{
LC_RPATH /home/me/.cabal/store
LC_LOAD_DYLIB @rpath/pkgname-ver-hash/libHSpkgname-ver.dylib
}}}

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11587#comment:7
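Duncan's proposed scheme can be sketched as the entries a build tool might generate. The package names, versions and hashes below are hypothetical, and the sketch only produces the strings; whether an ELF loader resolves such a NEEDED entry through RUN_PATH is exactly the open question in the comment.

```python
from typing import List, NamedTuple, Tuple


class StorePackage(NamedTuple):
    name: str      # package name, e.g. "text"
    version: str   # package version
    pkg_hash: str  # hypothetical stand-in for the store hash


def package_dir(p: StorePackage) -> str:
    """Store subdirectory, following the pkgname-ver-hash naming above."""
    return f"{p.name}-{p.version}-{p.pkg_hash}"


def per_package_run_path(store: str, pkgs: List[StorePackage]) -> List[str]:
    """Run path needed when each NEEDED entry is a bare file name."""
    return [f"{store}/{package_dir(p)}" for p in pkgs]


def proposed_entries(store: str, pkgs: List[StorePackage]) -> Tuple[List[str], List[str]]:
    """Duncan's scheme: one RUN_PATH entry; NEEDED carries the subdirectory."""
    needed = [f"{package_dir(p)}/libHS{p.name}-{p.version}.so" for p in pkgs]
    return [store], needed


store = "/home/me/.cabal/store"
pkgs = [
    StorePackage("text", "1.2.2.1", "abc123"),        # hypothetical entries
    StorePackage("bytestring", "0.10.8.1", "def456"),
]
run_path, needed = proposed_entries(store, pkgs)
print("RUN_PATH", *run_path)
for entry in needed:
    print("SO_NEEDED", entry)
print(len(per_package_run_path(store, pkgs)), "run-path entries otherwise")
```

The run path stays at one entry however many packages are added, whereas the bare-file-name layout needs one entry per package.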

Comment (by rwbarton):

According to my experiments (I only tested on Debian x86_64 with GNU ld, not gold), you can set up the NEEDED entry in that way if either

 * `libHSpkgname-ver.so` was built with `-soname pkgname-ver-hash/libHSpkgname-ver.so`, or
 * `libHSpkgname-ver.so` was built without any `-soname` set, and you link to it with
 {{{
 -L/home/me/.cabal/store -l:pkgname-ver-hash/libHSpkgname-ver.so
 }}}

If you set a SONAME when building `libHSpkgname-ver.so`, then it seems to be impossible to create a NEEDED entry with any other value when building a library that has it as a dependency. So if the SONAME is `libHSpkgname-ver.so`, you have to add the directory that the library lives in to the run path, as far as I can tell. I don't know how portable any of this behavior is.

GHC sets the SONAME to `libHSpkgname-ver.so`, originally due to 6efacfe8bcbe66dfc3b52397ccbd34a58890520d. I don't know if it would be okay to unset it or to include the directory name in the SONAME. In any case, it seems simplest to solve the problem by just putting all the shared libraries in the same directory...

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11587#comment:8
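The rule rwbarton observed for the NEEDED entry can be written as a one-line function. It models only the reported GNU ld behaviour on Debian x86_64 (the function name `needed_entry` is hypothetical) and says nothing about how the runtime loader later resolves the recorded name, which the experiment did not cover.

```python
from typing import Optional


def needed_entry(soname: Optional[str], link_name: str) -> str:
    """DT_NEEDED value recorded for a dependency, per the behaviour reported
    above: the dependency's SONAME wins whenever it has one; otherwise the
    name passed to the linker (for example via -l:NAME) is recorded as-is."""
    return soname if soname is not None else link_name


subdir_name = "pkgname-ver-hash/libHSpkgname-ver.so"

# First bullet: library built with -soname pkgname-ver-hash/libHSpkgname-ver.so.
print(needed_entry(subdir_name, "libHSpkgname-ver.so"))
# Second bullet: no -soname, linked with -l:pkgname-ver-hash/libHSpkgname-ver.so.
print(needed_entry(None, subdir_name))
# GHC's current SONAME: the subdirectory cannot survive into NEEDED.
print(needed_entry("libHSpkgname-ver.so", subdir_name))
```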

Changes (by rwbarton):

 * component: Package system => Build System

Comment:

This sort of happened for 8.0.1 after all: there's a new package description field `dynamic-library-dirs` for the location of the shared library, and by default Cabal now uses the `$LIBDIR/lib$PKG_KEY.so` layout suggested in this ticket. As I understand it, this doesn't yet apply to the libraries distributed with GHC, but at least the number of libraries this affects is now constant (~25, rather than however many hundred dependencies a non-GHC program might have).

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11587#comment:9
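The effect of `dynamic-library-dirs` on the rpath can be sketched by collecting the distinct directories of all dependencies. `InstalledPackage` is a hypothetical, cut-down stand-in for a package registration and the paths are made up; only the field name comes from the comment above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InstalledPackage:
    """Hypothetical, cut-down package registration: only the
    dynamic-library-dirs field mentioned above is modelled."""
    name: str
    dynamic_library_dirs: List[str] = field(default_factory=list)


def rpath_for(deps: List[InstalledPackage]) -> List[str]:
    """Distinct dynamic-library-dirs of all dependencies, in first-seen order."""
    return list(dict.fromkeys(d for p in deps for d in p.dynamic_library_dirs))


# Hypothetical paths: 25 GHC-bundled packages keep one directory each, while
# 200 Cabal-installed packages share a single directory.
bundled = [InstalledPackage(f"boot{i}", [f"/usr/lib/ghc/boot{i}"]) for i in range(25)]
cabal = [InstalledPackage(f"dep{i}", ["/home/me/.cabal/lib"]) for i in range(200)]
print(len(rpath_for(bundled + cabal)))  # 26
```

With 25 bundled packages each in its own directory and 200 Cabal-installed packages sharing one, the rpath has 26 entries instead of the 225 a fully per-package layout would need.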

#11587: Place shared objects in LIBDIR
-------------------------------------+-------------------------------------
        Reporter:  bgamari           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.4.1
       Component:  Build System      |              Version:  7.10.3
      Resolution:                    |             Keywords:
    Architecture:  Unknown/Multiple  |     Operating System:  Unknown/Multiple
       Test Case:                    |      Type of failure:  Runtime performance bug
        Blocking:                    |           Blocked By:
Differential Rev(s):                 |      Related Tickets:  #12031
                                     |            Wiki Page:
-------------------------------------+-------------------------------------

Changes (by bgamari):

 * related: => #12031

Comment:

We have this same problem with static libraries and compile-time linking. See #14031.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11587#comment:11

Comment (by joeyhess):

With ghc 8.0.2, I'm seeing 2500 ENOENTs. That is a large improvement from before, but still expensive: it adds around 200 ms to the startup time. My program is linked dynamically with 203 Haskell libraries, and there are still a dozen paths in RPATH for the libraries bundled with ghc. So, while there are only a few bundled libraries, their RPATHs still multiply badly with the often large numbers of libraries used by Haskell programs.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11587#comment:12
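A back-of-the-envelope check of the numbers in comment:12: 203 libraries and about a dozen bundled-library rpath entries. It assumes, hypothetically, that every library is probed once in every such directory and that the whole 200 ms comes from the failed lookups; the first assumption overcounts libraries found in a bundled directory and ignores the hardware-capability probes, so only the order of magnitude is meaningful.

```python
def estimated_misses(n_libraries: int, n_bundled_dirs: int) -> int:
    """Rough count of failed opens, assuming (hypothetically) that every
    library is probed once in every bundled-library rpath directory."""
    return n_libraries * n_bundled_dirs


def cost_per_miss_us(added_ms: float, misses: int) -> float:
    """Average start-up cost of one failed lookup, in microseconds."""
    return added_ms * 1000.0 / misses


print(estimated_misses(203, 12))    # 2436, close to the ~2500 reported
print(cost_per_miss_us(200, 2500))  # 80.0
```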