
Hi Duncan,
First, thanks for taking the time to answer yourself.
On Wed, Sep 15, 2010 at 18:33, Duncan Coutts wrote:
On 13 September 2010 20:54, Paolo Giarrusso wrote:
On Sun, Sep 12, 2010 at 20:46, Tillmann Rendel wrote:
1. upgrading packages can break dependencies (and Cabal does not do a lot to prevent/avoid this)
2. cabal ought to allow using multiple versions of a single package in more circumstances than it does now
I answer below with some issues - in particular, I discuss why IMHO your proposal for 2. does not work well with cross-module inlining.
Both of these issues are known to the Cabal hackers (i.e. me and a few other people). I'll share my views on the problem and the solution.
Ah-ah! Can I request that _at least_ the first one be added to the FAQ? Something like: "A version of package A was rebuilt [for an upgrade of its dependency], and stuff depending on A started causing linking errors!" I am even ready to send patches.
1. This is certainly a problem. The current situation is not awful but it is a bit annoying sometimes. We do now accurately track when packages get broken by upgrading dependencies so it should not be possible to get segfaults by linking incompatible ABIs.
I had a slightly different counterexample, but maybe it's purely a GHC bug; I use GHC 6.10.4 and the latest Cabal/cabal-install. Are dependencies computed by Cabal or ghc-pkg? If they are computed by Cabal, I think I have a bug report. At some point I unregistered a package with ghc-pkg (probably old-locale-1.0.0.2), without using --force, and I started getting linker errors mentioning it, in a form like:
    <command line>: unknown package: old-locale-1.0.0.2
even though old-locale-1.0.0.2 appeared on no command line (not even internal ones, I checked everything with -v); it was only mentioned by a package that was itself mentioned on the command line of an internal command. There is a small possibility that this was due to the older Cabal which was installed with GHC, but IIRC the new cabal was one of the first packages (or the first) I installed.
My preferred solution is to follow the example of Nix and use a persistent package store. Then installing new packages (which includes what people think of as upgrading) become non-destructive operations: no existing packages would be broken by an upgrade.
It would be necessary to allow installing multiple instances of the same version of a package. That would solve Cabal bug 738 which I reported.
If we do not allow multiple instances of a package then breaking things during an upgrade will always remain a possibility. We could work harder to avoid breaking things, or to try rebuilding things that would become broken but it could never be a 100% solution.
It is a good idea, but how do you handle removal requests? Also, there are existing complete solutions, but they are much harder to get right. However, multiple versions of the same package is a good idea, and in particular it would make upgrading Cabal much less tricky. The problem with package removal is still present, but that is less important than safety (especially given that "cabal uninstall" is still a TODO); and in a safe persistent system, one can use ghc-pkg unregister and manually handle the dependencies. And I'd like to point out that a non-persistent package store can be made to 100% work - with your proposal it would do so by design.
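To make the persistent-store idea concrete, here is a minimal sketch in Haskell. It is only a toy model under my own assumptions, not Cabal's or Nix's actual data structures: the names InstanceId, Store, install and isIntact, and the packages "dep" and "A" with their version numbers, are all hypothetical. An instance is identified by its name, its version and the exact instances it was built against, so rebuilding A against a newer dep adds a second instance of A instead of overwriting the first.

```haskell
import qualified Data.Set as Set

-- Hypothetical model: an installed instance is identified by its name,
-- its version AND the exact instances it was built against.
data InstanceId = InstanceId
  { instName    :: String
  , instVersion :: [Int]
  , instDeps    :: [InstanceId]
  } deriving (Eq, Ord, Show)

type Store = Set.Set InstanceId

-- Installing only ever adds an instance; nothing already present is replaced.
install :: InstanceId -> Store -> Store
install = Set.insert

-- An instance is intact when it and all its dependencies (recursively)
-- are still present in the store.
isIntact :: Store -> InstanceId -> Bool
isIntact s i = i `Set.member` s && all (isIntact s) (instDeps i)

-- Made-up packages: "A" is built once against each version of "dep".
depOld, depNew, aOld, aNew :: InstanceId
depOld = InstanceId "dep" [1, 0] []
depNew = InstanceId "dep" [1, 1] []
aOld   = InstanceId "A" [1, 0] [depOld]
aNew   = InstanceId "A" [1, 0] [depNew]

-- "Upgrading" dep and rebuilding A leaves the old instances untouched.
store :: Store
store = foldr install Set.empty [depOld, aOld, depNew, aNew]

-- Removal is the remaining problem: deleting depOld breaks aOld.
afterRemoval :: Store
afterRemoval = Set.delete depOld store

main :: IO ()
main = do
  print (Set.size store)                           -- number of instances
  print (all (isIntact store) (Set.toList store))  -- is anything broken?
  print (isIntact afterRemoval aOld)               -- effect of a removal
```

In this model an upgrade cannot break anything, because install never replaces an entry; only an explicit removal, as in afterRemoval, can leave a dangling dependency.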
2. This is a problem of information and optimistic or pessimistic assumptions. Technically there is no problem with typechecking or linking in the presence of multiple versions of a package. If we have a type Foo from package foo-1.0 then that is a different type to Foo from package foo-1.1. GHC knows this.
So if for example a package uses regex or QC privately then other parts of the same program (e.g. different libs) can also use different versions of the same packages. There are other examples of course where types from some common package get used in interfaces (e.g. ByteString or Text). In these cases it is essential that the same version of the package be used on both sides of the interface otherwise we will get a type error because text-0.7:Data.Text.Text does not unify with text-0.8:Data.Text.Text.
The problem for the package manager (i.e. cabal) is knowing which of the two above scenarios apply for each dependency and thus whether multiple versions of that dependency should be allowed or not. Currently cabal does not have any information whatsoever to make that distinction so we have to make the conservative assumption. If for example we knew that particular dependencies were "private" dependencies then we would have enough information to do a better job in very many of the common examples.
My preference here is for adding a new field, build-depends-private (or some such similar name) and to encourage packages to distinguish between their public/visible dependencies and their private/invisible deps.
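As a rough illustration of what the extra information would buy, here is a sketch in Haskell of the two decision rules. Everything in it is an assumption made for illustration: the types Visibility and Dep, the two functions, and the version numbers for "regex" are made up, and the rule "two differing versions clash only if both users expose the dependency publicly" is a deliberate simplification, not what cabal does or would do.

```haskell
-- Hypothetical tag distinguishing public/visible from private/invisible deps.
data Visibility = Public | Private
  deriving (Eq, Show)

-- One package's dependency on another, as seen by the solver.
data Dep = Dep
  { depName       :: String
  , depVersion    :: [Int]
  , depVisibility :: Visibility
  } deriving (Eq, Show)

-- Without visibility information the solver must be conservative:
-- any two differing versions of the same package are treated as a clash.
clashConservative :: Dep -> Dep -> Bool
clashConservative a b =
  depName a == depName b && depVersion a /= depVersion b

-- Simplified rule with visibility information: differing versions only
-- clash when both users expose the dependency in their interfaces.
clashWithVisibility :: Dep -> Dep -> Bool
clashWithVisibility a b =
  clashConservative a b
    && depVisibility a == Public
    && depVisibility b == Public

-- Two libraries using different (made-up) regex versions privately ...
regexA, regexB :: Dep
regexA = Dep "regex" [1, 0] Private
regexB = Dep "regex" [2, 0] Private

-- ... and two libraries exposing different text versions publicly.
textA, textB :: Dep
textA = Dep "text" [0, 7] Public
textB = Dep "text" [0, 8] Public

main :: IO ()
main = do
  print (clashConservative regexA regexB)    -- today's assumption
  print (clashWithVisibility regexA regexB)  -- private uses can coexist
  print (clashWithVisibility textA textB)    -- public uses still clash
```

The private regex case is rejected by the conservative rule but accepted once visibility is known, while the public text case is rejected either way.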
On a policy level, it's difficult for a developer to keep track of which dependencies are public and which are private. You need to manually inspect your public API.
On a mechanism level, I think that adding a field actually doesn't work, because GHC cross-module inlining can change the picture unpredictably: cabal would need to check that packages in build-depends-private are not mentioned in the .hi interface files - but GHC can store implementation details there. The results: if cabal does no checking, a packager can easily shoot its users in the foot (rather than itself). If cabal does such checking, getting it right requires trial-and-error for the developer, and it will cause errors when the GHC version and optimization options change. We don't want either scenario.
E.g. I just made up a syntax for a regexp library, and built a function which should check whether (useless) trailing spaces are present in some text:
    checkNoTrailingSpace :: String -> Bool
    checkNoTrailingSpace = not . regexpMatch "\\s+$"
Allowing inlining of such a function would turn a possibly private dependency on some regexp package into a public one.
However, automatic checking as I proposed (without extra help from GHC) does not work either, and I show a counterexample, which is also about "bad library design". Suppose that V1 of package Foo has the functions:
    buildFoo bar baz = (bar, baz)
    takeBar (bar, baz) = bar
and that V2 of Foo swaps the order of bar and baz in the underlying pair. The pair representation should be either encapsulated by a data constructor (but it is not), or part of the API and ABI and thus not changeable. Today doing this causes no harm, but if linking multiple versions of Foo were allowed, this would create a nightmare. If a data constructor were used, versioned typechecking would catch the problem.
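Here is a minimal sketch of the Foo counterexample, assuming for illustration that bar and baz are both Strings. The suffixes V1 and V2 are hypothetical stand-ins for the same names living in two versions of package Foo, and FooV1/FooV2 stand in for a type Foo that GHC would treat as distinct per package version.

```haskell
-- V1 of the (made-up) package Foo: the pair is (bar, baz).
buildFooV1 :: String -> String -> (String, String)
buildFooV1 bar baz = (bar, baz)

takeBarV1 :: (String, String) -> String
takeBarV1 (bar, _) = bar

-- V2 of Foo silently swaps the order inside the pair.
buildFooV2 :: String -> String -> (String, String)
buildFooV2 bar baz = (baz, bar)

takeBarV2 :: (String, String) -> String
takeBarV2 (_, bar) = bar

-- With a data constructor, each version gets its own type, so mixing
-- versions becomes a type error instead of a silent bug.
newtype FooV1 = FooV1 (String, String)
newtype FooV2 = FooV2 (String, String)

buildFooV1' :: String -> String -> FooV1
buildFooV1' bar baz = FooV1 (bar, baz)

takeBarV2' :: FooV2 -> String
takeBarV2' (FooV2 (_, bar)) = bar

-- takeBarV2' (buildFooV1' "bar" "baz")  -- rejected: FooV1 is not FooV2

main :: IO ()
main = do
  -- Same version on both sides: the right component comes back.
  putStrLn (takeBarV1 (buildFooV1 "bar" "baz"))
  -- Mixed versions typecheck, because both sides only see a plain pair,
  -- but the wrong component comes back.
  putStrLn (takeBarV2 (buildFooV1 "bar" "baz"))
```

The first line printed is bar and the second is baz: the mixed-version call compiles and silently returns the wrong field.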
Since these functions could be fully inlined in module Foo2, it becomes impossible to infer from the .hi files of Foo2 which dependencies are public, unless GHC records which modules the bodies of functions exposed in .hi files come from. So, I propose to:
- depending on the solution, possibly educate library developers about the resulting pitfalls, if they are not supposed to write code like the above;
- extend GHC to produce the needed information (if it does not already);
- use that for automatically checking which dependencies are public and which are private (at package installation time).
Best regards
--
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/