Haskell package layout: a proposal

Over on my blog, I've put up a proposal for changing the default layout of installed package pieces: http://mtnviewmark.wordpress.com/2010/12/02/haskell-package-layout/ Thoughts? - Mark

Mark Lentczner wrote:
Over on my blog, I've put up a proposal for changing the default layout of installed package pieces:
http://mtnviewmark.wordpress.com/2010/12/02/haskell-package-layout/
Thoughts?
About 1.: A package A with a specific version v must provide a specific API independent of the compiler. If a package B imports A-v, it cannot additionally check what compiler was used to compile A (and B, too). Thus documentation should really only depend on the package version. If there is a feature that is available only for a specific compiler, then it must be moved into a separate package C. Package C can then be compiled completely on a given compiler, or not at all, but not in parts.

I have an additional problem with the current layout: you may use the same compiler version on different operating systems and processors, say Solaris/SPARC and Solaris/Intel. It's currently not possible to use them in parallel, unless you use different Cabal directories. In Modula-3 they define Targets, where a target specifies operating system, processor, linker object format, and compiler backend (GCC-based or others). In short: the target contains every feature that can make an installed library different for the same compiler version.

I still have a problem with the current local Cabal directory structure, where everything is built below dist/build. I use Cabal in development and have long dependency chains and different compilers installed. If a basic package changes I have to recompile all dependent packages, but GHC's 'make' feature of recompiling only dependent modules does not help: I have to recompile all modules, because dist/build cannot hold the compiled files for two compiler versions.
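A minimal way around that last problem, assuming the Cabal version in use supports the --builddir flag (the compiler versions and directory names here are only illustrative), is to keep one build tree per compiler instead of sharing dist/build:

    # one build tree per compiler, so switching compilers does not
    # clobber the other compiler's object and interface files
    runhaskell Setup.hs configure --with-compiler=ghc-6.12.3 --builddir=dist-ghc-6.12.3
    runhaskell Setup.hs build --builddir=dist-ghc-6.12.3

    runhaskell Setup.hs configure --with-compiler=ghc-7.0.1 --builddir=dist-ghc-7.0.1
    runhaskell Setup.hs build --builddir=dist-ghc-7.0.1

This doesn't change the installed layout, but it keeps GHC's recompilation checking useful when alternating between compilers.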

On 04/12/10 06:37, Mark Lentczner wrote:
Over on my blog, I've put up a proposal for changing the default layout of installed package pieces:
http://mtnviewmark.wordpress.com/2010/12/02/haskell-package-layout/
Thoughts?
As I understand it, the main issues you bring up are:

1. It's not possible to install multiple versions of packages containing binaries and/or documentation.
2. It's not possible to install the same version of a package, compiled with different compilers (or compiler versions), when that package contains binaries and/or documentation.

and as a corollary:

3. It IS possible to install multiple versions of a package as long as it only contains a library.
4. It IS possible to install the same version of a package, compiled with different GHC versions, when that package only contains a library.

Is this correct?

I've come up against this limitation myself when packaging Hackage packages for ArchLinux in such a way that multiple versions could be installed in parallel. That actually unearthed another limitation, one that you don't mention: the documentation index file, /usr/share/doc/ghc/html/libraries/index.html on my system. This file can be rebuilt with /usr/share/doc/ghc/html/libraries/gen_contents_index, but it only allows for one entry for each library.

Further splitting up the hierarchy to better deal with multiple versions and configurations of a package being installed in parallel is probably a good thing. However, I don't think the split you make is very likely to be adopted. You suggest:

    $prefix            -- /usr/local/haskell if --global, and ~/.cabal if --user
        $compiler
            $pkgid
                bin        -- binaries
                lib        -- libraries & .hi files
                include    -- include files
                libexec    -- private binaries
                share      -- data files
                doc        -- documentation
                    html   -- html doc
                    man    -- man pages

Here are my arguments against this particular hierarchy:

1. I'd argue that Unix administrators expect $prefix to default to /usr/local. Defaulting to adding a new directory under /usr/local for Haskell packages is unlikely to be appreciated.
2. As a distro packager I would have to make some considerable changes to the layout during the configure step, or alternatively create a considerable number of symlinks, to make things work (think of default $PATH, manpaths, etc).
3. I don't think your comment on per-interpreter directories for Python is true. AFAIK Python only uses that directory for modules, not for binaries and not for documentation. That would mean that Cabal's current behaviour matches what Python does. (Please correct me if my understanding of this is wrong.)

Personally I would keep the top-level bits the same and instead insert bits in the lower levels:

    $prefix
        bin
        lib
            $compiler
                $pkgid
        include
        libexec
            $compiler
                $pkgid
        share
            $compiler
                $pkgid
        man
        doc
            $compiler
                $pkgid
                    html

Some comments on this:

1. Placing binaries in $prefix/bin increases the likelihood of things just working. Binaries from multiple versions/configurations of a single package can be handled by a suffix to the filename itself. It could be useful to be able to instruct Cabal to create a symlink from the basename to the full filename (basename-suffix) at install time.
2. Man pages can be dealt with in the same way as binaries, by using suffixes. I have to admit I'm not sure whether this is very standard, but it does seem to work.
3. Deleting stuff in this layout would be slightly more work than in yours, but not considerably more.

Comments?

/M

--
Magnus Therning                    OpenPGP: 0xAB4DFBA4
email: magnus@therning.org         jabber: magnus@therning.org
twitter: magthe                    http://therning.org/magnus
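For what it's worth, a layout roughly along the lines Magnus suggests can be sketched with the install-dirs section of ~/.cabal/config. The section and field names and the $prefix/$compiler/$pkgid path template variables are existing Cabal features, but the exact directories below are only an illustration of the idea, not a tested recommendation:

    install-dirs global
      -- per-compiler, per-package subtrees under lib/, share/ and doc/,
      -- while binaries stay directly in $prefix/bin
      prefix:     /usr/local
      bindir:     $prefix/bin
      libdir:     $prefix/lib
      libsubdir:  $compiler/$pkgid
      datadir:    $prefix/share
      datasubdir: $compiler/$pkgid
      docdir:     $prefix/doc/$compiler/$pkgid
      htmldir:    $docdir/html

Whether everything then "just works" (man pages, the haddock index, binaries from several configurations of one package) is exactly what the rest of this thread is about.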

On 4 Dec 2010, at 06:37, Mark Lentczner wrote:
Over on my blog, I've put up a proposal for changing the default layout of installed package pieces:
http://mtnviewmark.wordpress.com/2010/12/02/haskell-package-layout/
Thoughts?
+1 for most of the proposal. My only doubt is about executables currently in $prefix/bin, which in your scheme would end up in $prefix/$compiler/$pkgid/bin. I don't want programs I have installed via cabal suddenly to disappear when I uninstall an old compiler version. I might even want to uninstall the corresponding library associated with the executable, but have the executable continue to be available. Regards, Malcolm

I think I can summarize the feedback and concerns expressed on the reddit discussion, the blog's comments, and here on the mailing list:

0) "Nice proposal", "I heartily recommend it.", 28 up votes, 2 down (93%)

Yea.

1) Executables should be on the user's path.
2) Need to be sure to rebuild the haddock index, and per tree.

Both of these are about "making things just work". I agree. Having the binaries on the path *is* something we should preserve, and cabal already supports the symlink-bindir feature for just this reason. It allows one to put the binaries in one place, and symlink them into a bin directory on the $PATH. Such a scheme facilitates easy removal in the future (kill the appropriate install tree, then remove broken symlinks in the $PATH bin dir) -- otherwise you have to know which binaries were in the package or packages you just deleted.

Keeping the haddock index up-to-date is also very important, and again, cabal-install already does this, though I believe it only ever updates the --user index. For distributions, distributing a built index for the distributed --global docs is important. I suspect we all have little scripts for doing this. I'll try generalizing mine and making them available to packagers.

3) Package-specific data and documentation shouldn't vary by compiler.
4) Why would anyone need executables of the same package built with different compilers at the same time?

I agree the first is strongly desirable, but not always true. The second arises when the executables are enhancements to the tool chain, and so are generally tool-chain dependent. In either case, there is no harm in having these directories grouped per compiler, assuming points 1 & 2 above are addressed -- except perhaps for enabling bad practice.

5) By putting non-executable data in a separate place, it can be shared from a central server to machines with multiple architectures.
6) Compiler isn't specific enough: need os-processor-abi etc.
7) Cabal's dist structure is problematic in some cases.

These are concerns mostly for people building and installing in special environments. The vast majority of installs are on single-user machines with a single architecture. And .cabal/config can still be customized to support any layout one can achieve now.

8) Linux folks expect $prefix to be /usr/local.
9) An alternate suggestion, essentially having a per-$compiler subtree under each of lib/, libexec/, share/, and doc/.

I surveyed some Linux distributions for how other language systems lay things out. They are more like the #9 suggestion, placing things in language- and version-specific subdirs in common directories for libs, modules, data, and doc. I found it interesting that, even on the same platform and language, there were often two common places in use for the same thing. I'll note that not a single one of them used /usr/local, nor is it in the default search paths for either perl or python.

So, I think something like the #9 approach is probably best for Linux-like systems, and something like my original proposal for Mac OS X (where each language's installs are segregated into /Library/${language}, and not spread around the system). I can't say much about the Windows layout.

- Mark
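To make points 1 and 2 above concrete: cabal-install's symlink-bindir setting covers the $PATH side, and the gen_contents_index script Magnus mentioned can rebuild the global documentation index. Both snippets below are illustrative sketches; the bin directory and the doc path depend on the installation:

    -- in ~/.cabal/config: install each package's binaries into its own
    -- tree, but keep a symlink for each in a directory that is on $PATH
    symlink-bindir: /home/you/bin

    # after installing or removing --global packages, rebuild the
    # top-level haddock contents and index shipped with the GHC docs
    cd /usr/share/doc/ghc/html/libraries && ./gen_contents_index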

Hi
I think I can summarize the feedback and concerns expressed on the reddit discussion, the blog's comments, and here on the mailing list:
0) "Nice proposal", "I heartily recommend it.", 28 up votes, 2 down (93%)
I should say, I voted that up because it looked interesting (and it does). However, voting it up meant it was interesting, not that I had reviewed the proposal or that it answers everyone's needs :-)
1) Executables should be on the user's path.
2) Need to be sure to rebuild the haddock index, and per tree.
Both of these are about "making things just work". I agree. Having the binaries on the path *is* something we should preserve, and cabal already supports the symlink-bindir feature for just this reason. It allows one to put the binaries in one place, and symlink them into a bin directory on the $PATH. Such a scheme facilitates easy removal in the future (kill the appropriate install tree, then remove broken symlinks in the $PATH bin dir) -- otherwise you have to know which binaries were in the package or packages you just deleted.
Remember Windows when doing these steps. Symlinks on Windows are a bad idea (they are technically supported on some file systems, but in practice no tools can work with them).
Keeping the haddock index up-to-date is also very important, and again, cabal-install already does this, though I believe it only ever updates the --user index. For distributions, distributing a built index for the distributed --global docs is important. I suspect we all have little scripts for doing this. I'll try generalizing mine and making them available to packagers.
There is more per-package data than just the haddocks - there are also hoogle databases for packages. It's not critical, but if you keep that in mind when designing things it would be a lovely bonus.
5) By putting non-executable data in a separate place, it can be shared from a central server to machines with multiple architectures.
6) Compiler isn't specific enough: need os-processor-abi etc.
7) Cabal's dist structure is problematic in some cases.
These are concerns mostly for people building and installing in special environments. The vast majority of installs are on single-user machines with a single architecture. And .cabal/config can still be customized to support any layout one can achieve now.
Many Windows machines end up being per-user because of permissions; often only the admin can write to shared areas.

Thanks, Neil
participants (5)
- Henning Thielemann
- Magnus Therning
- Malcolm Wallace
- Mark Lentczner
- Neil Mitchell