
Michael Snoyman wrote a blog post about Solving Cabal Hell, which came my way via the peerless Haskell Weekly News. http://www.yesodweb.com/blog/2012/11/solving-cabal-hell

But I'd be unlikely to return there so I'm posting this to the libraries and cabal-devel lists.

I'm not deep in Cabal lore, but it seems to me that there is a fairly easy way to do a Lot Better than we are now. Let's go back to "Identifying the Problem". Michael doesn't say, but I'm guessing that Fay depends on yesod-platform. Very well, so Cabal sees

    I'm installing Fay, which depends on yesod-platform (which in turn depends on data-default) and which also depends directly on data-default.

If the package database was empty, Cabal would try to figure out a version of data-default that is acceptable to both yesod-platform and to Fay. And that's what we want!

Suppose that Cabal figures out that yesod-platform-2.7 and data-default-0.4 would work. Then, in an empty package database, it could just go ahead and install those. But in your example, the database isn't empty; we have already installed yesod-platform-2.7, depending on data-default-0.5. And (here's the rub) you can only have yesod-platform-2.7 installed once.

One solution might be to 1. *un*install yesod-platform-2.7 (depending on data-default-0.5), and 2. *re*install it (depending on data-default-0.4). But that would break any programs that use both yesod-platform-2.7 and data-default-0.5.

The solution is obvious: we should make it possible to install yesod-platform-2.7 twice, one version depending on data-default-0.4 and one version depending on data-default-0.5.

Now, if Cabal can figure out a plan based on an empty database, it can deliver on that plan even in a non-empty database, without messing up any existing installations.

Of course, if Fay depends exclusively on data-default-0.4, and yesod-platform depends exclusively on data-default-0.5, then they genuinely are incompatible, and Cabal can and should say so. That's not Cabal Hell. That's just saying that the package authors have *specified* that they are incompatible.

Summary. I may be way off beam here, but I think we can easily make things way better than they are. By "easily" I mean with just design and implementation work. That's not free, and no one has any time.... but it's well within reach. I'd even been wondering about trying to crowd-fund this development so that Well Typed can do it. But let's first see if it'd solve the problem.

There must be a wiki page somewhere that articulates the challenge, points to blog posts about it, and discusses solutions.

Simon
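P.S. To make the scenario concrete, here is a toy Haskell sketch of the choice Cabal faces against an empty database. The version ranges are invented for illustration; they are not the real constraints of Fay or yesod-platform.

    module Main where

    import Data.List (intersect)

    type Version = [Int]

    -- Invented version ranges, purely for illustration:
    fayAccepts, yesodAccepts :: [Version]
    fayAccepts   = [[0,3], [0,4]]   -- as if Fay said data-default >= 0.3 && < 0.5
    yesodAccepts = [[0,4], [0,5]]   -- as if yesod-platform said >= 0.4 && < 0.6

    -- Against an empty database the solver only has to find a common version.
    main :: IO ()
    main = print (fayAccepts `intersect` yesodAccepts)   -- prints [[0,4]], i.e. data-default-0.4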

On Thu, 22 Nov 2012, Simon Peyton-Jones wrote:
The solution is obvious: we should make it possible to install yesod-platform-2.7 twice, one version depending on data-default-0.4 and one version depending on data-default-0.5.
Does ghc-pkg allow two different versions of yesod-platform-2.7 to be installed?

On 22 November 2012 09:17, Henning Thielemann wrote:
On Thu, 22 Nov 2012, Simon Peyton-Jones wrote:
The solution is obvious: we should make it possible to install yesod-platform-2.7 twice, one version depending on data-default-0.4 and one version depending on data-default-0.5.
Does ghc-pkg allow two different versions of yesod-platform-2.7 to be installed?
No, not right now. But the point is, if it did... Duncan

| Does ghc-pkg allow two different versions of yesod-platform-2.7 to be
| installed?

Not at the moment. But it should!

S

| -----Original Message-----
| From: Henning Thielemann [mailto:lemming@henning-thielemann.de]
| Sent: 22 November 2012 09:18
| To: Simon Peyton-Jones
| Cc: Haskell Libraries (libraries@haskell.org); cabal-devel@haskell.org;
| Michael Snoyman (michael@fpcomplete.com)
| Subject: Re: Cabal and GHC
|
| On Thu, 22 Nov 2012, Simon Peyton-Jones wrote:
|
| > The solution is obvious: we should make it possible to install
| > yesod-platform-2.7 twice, one version depending on data-default-0.4
| > and one version depending on data-default-0.5.

Excerpts from Simon Peyton-Jones's message of Thu Nov 22 00:32:27 -0800 2012:
Now, if Cabal can figure out a plan based on an empty database, it can deliver on that plan even in a non-empty database, without messing up any existing installations.
This is an interesting invariant, weaker than the more obvious one: "Cabal should do the same plan no matter what the state of the database is", which is in tension with "don't install more than you have to."

Edward

| > Now, if Cabal can figure out a plan based on an empty database, it can
| > deliver on that plan even in a non-empty database, without messing up
| > any existing installations.
|
| This is an interesting invariant, weaker than the more obvious one:
| "Cabal should do the same plan no matter what the state of the database
| is" which is in tension with "don't install more than you have to."

Well, I didn't say it should *ignore* the database. The current database might influence its plan. For example if, after installing both yesod-platform and Fay, Cabal is asked to install a package P which depends on yesod-platform and on data-default 0.4-0.8, then Cabal could pick the already-installed version of yesod-platform depending on data-default-0.5, rather than installing yet another version depending on data-default-0.8.

So the invariant I suggest is:
* If it'd work in an empty database, it should work in any non-empty one.
* Installing X should never break the existing installation of Y.

Simon
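P.S. To pin the two invariants down a little, here is a rough sketch in terms of an abstract install function. The types are placeholders for the sake of illustration; none of this is Cabal's real API.

    module SolverInvariants where

    import Data.Maybe (isJust)

    -- Placeholder model; nothing here is Cabal's real API.
    data InstalledPkg = InstalledPkg { ipkgId :: String, ipkgDepIds :: [String] }
      deriving (Eq, Show)

    type Db      = [InstalledPkg]               -- the installed-package database
    type Target  = String                       -- what the user asked to install
    type Install = Db -> Target -> Maybe Db     -- database after installing; Nothing = no plan

    -- Invariant 1: if installing the target works against the empty database,
    -- it must also work against any non-empty database.
    worksAnywhere :: Install -> Db -> Target -> Bool
    worksAnywhere install db target =
      not (isJust (install [] target)) || isJust (install db target)

    -- Invariant 2: installing never disturbs what was already there; every
    -- previously installed instance survives, unchanged, in the new database.
    nonDestructive :: Install -> Db -> Target -> Bool
    nonDestructive install db target =
      case install db target of
        Nothing  -> True
        Just db' -> all (`elem` db') db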

It seems that allowing multiple installed versions would get you ambiguities, though... In the simplest case, if I have two versions of package foo installed, and I install a package bar that is compatible with both, which one should cabal choose? Presumably the latest, I guess?

In the general case, what we have is a DAG of packages. What is installed is a subset of the DAG, with specific version numbers for each package. Assuming no trickery, what is installed is consistent - a single version for each package, such that each connected DAG component is consistent with the packages' requirements.

The "really fun" stuff in cabal is when two different disconnected DAG components become connected, due to a package requiring something from each of the components. Even if the original two components were consistent, the combined DAG need not be. This is when you get the wailing and gnashing of teeth.

If you allow only one version of a package to be installed, this forces the user to re-install a whole bunch of "innocent bystander" packages to restore consistency to the *whole* now-connected larger DAG component (*not* just the part that the new package depends on). While this is a PITA, it is at least well defined (assuming one only upgrades packages, never downgrades them). AFAIK, cabal doesn't do this right now, and IMVHO this is a source of a lot of the pain using it.

BTW, the same problem occurs when one un-installs a package, or upgrades/downgrades to a different package version, etc.; it should be easy to un/re-install all affected packages, but AFAIK cabal doesn't make it easy. For that matter, it should be easy to un-install an ("application") package and all ("library") packages that were only installed to support it and aren't used by any other package ("apt" does that, and it is a "very cool" feature).

At any rate, if we allow multiple versions of a package to be installed, then you might have several viable candidates for installing a new package, each re-using some of the existing packages and forcing the installation of newer/older versions of some others. The simplest thing to do would be to only install newer versions of existing packages, which would give similar results to the single-installed-version case. So you need to solve the same automation problem as for the single-installed-version case, plus add the ability to manage multiple versions. The same considerations hold for uninstalling a package version.

For this reason, I think that before we hop on the multiple-installed-versions wagon, it is worth taking a long hard look at automating *all* aspects of automatic transitive closure of *all* the current cabal operations. This is what most package managers do (Debian's "apt", etc.) and they seem to handle the "DLL hell" pretty well.

Whatever the solution is, I agree cabal does need fixing...

Oren Ben-Kiki
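P.S. To pin down the consistency condition I mean, a rough model (the Pkg type and the check are invented for illustration, nothing to do with Cabal's internals):

    module PlanConsistency where

    import qualified Data.Map as Map

    -- Invented model, only to state the condition precisely.
    data Pkg = Pkg { pkgName :: String, pkgVersion :: [Int], pkgDeps :: [Pkg] }

    -- A set of roots is consistent (in the single-installed-version sense)
    -- when no package name occurs at two different versions anywhere in the
    -- combined transitive closure.  Connecting two previously separate DAG
    -- components is exactly the point at which this can start to fail.
    consistent :: [Pkg] -> Bool
    consistent roots = go Map.empty (concatMap closure roots)
      where
        closure p = p : concatMap closure (pkgDeps p)
        go _    []       = True
        go seen (p : ps) =
          case Map.lookup (pkgName p) seen of
            Just v | v /= pkgVersion p -> False   -- same name, two versions: conflict
            _ -> go (Map.insert (pkgName p) (pkgVersion p) seen) ps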

Simon Peyton-Jones wrote:
Well, I didn't say it should *ignore* the database. The current database might influence its plan. For example if, after installing both yesod-platform and Fay, Cabal is asked to install a package P which depends on yesod-platform and on data-default 0.4-0.8, then Cabal could pick the already-installed version of yesod-platform depending on data-default-0.5, rather than installing yet another version depending on data-default-0.8.
So the invariant I suggest is:
* If it'd work in an empty database, it should work in any non-empty one.
* Installing X should never break the existing installation of Y.
This makes a lot of sense to me. I think it is also time to admit that we are not the first to try to solve this problem. For example, Scala has the sbt tool: http://www.scala-sbt.org

I am sure it is not perfect, but it seems much more flexible and versatile than Cabal. Interestingly, the Scala guys were smart enough to avoid re-inventing the wheel and are using Apache's Ivy http://ant.apache.org/ivy/ to do their dependency management.

Manuel

On Thursday, November 22, 2012, Manuel M T Chakravarty wrote:
Simon Peyton-Jones wrote:

So the invariant I suggest is:
* If it'd work in an empty database, it should work in any non-empty one.
* Installing X should never break the existing installation of Y.
This makes a lot of sense to me.
We've been working on this for some time, and this property is sometimes known as hermetic builds. The first approximation will be sandboxing (as it is relatively easy to implement). Long term we want a write-only, Nix-like package store.
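As a rough sketch of what "write-only" buys you (purely illustrative, not the actual design):

    module WriteOnlyStore where

    import qualified Data.Map as Map

    -- Illustrative only; not the real store layout.
    type InstanceId = String                       -- e.g. "yesod-platform-2.7-<hash>"
    data Unit = Unit { unitPkgId :: String, unitDepIds :: [InstanceId] }

    type Store = Map.Map InstanceId Unit

    -- A write-only store: installing either adds a new entry under a fresh id
    -- or finds an identical entry already present.  Nothing is ever mutated
    -- or removed, so installing one package can never break another.
    install :: InstanceId -> Unit -> Store -> Store
    install iid unit store
      | iid `Map.member` store = store                  -- already built: just reuse it
      | otherwise              = Map.insert iid unit store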

"Cabal should do the same plan no matter what the state of the database is" which is in tension with "don't install more than you have to."
This I like, except instead of "cabal", it should be something new. With a combination of curators and a synchronized release schedule, I see a system like this running very smoothly, and you could even bridge the gap between both ideals. A social solution will be much more effective than a technical solution. I still agree with Michael's post that it would be best to let cabal keep doing its thing, and instead create a new, more convenient tool on top of it with this added social layer of curation and synchronization. Perhaps the best solution would be to add the functionality Simon suggests, as well as adopting the hopelessly complex Nix-style package management (referring to Brandon's comments), and then giving sense to it with a social layer on top. -- Dan Burton

On Thu, Nov 22, 2012 at 10:32 AM, Simon Peyton-Jones wrote:
Michael Snoyman wrote a blog post about Solving Cabal Hell, which came my way via the peerless Haskell Weekly News. http://www.yesodweb.com/blog/2012/11/solving-cabal-hell
But I'd be unlikely to return there so I'm posting this to the libraries and cabal-devel lists.
I'm not deep in Cabal lore, but it seems to me that there is a fairly easy way to do a Lot Better than we are now. Let's go back to "Identifying the Problem". Michael doesn't say, but I'm guessing that Fay depends on yesod-platform. Very well, so Cabal sees
I'm installing Fay, which depends on yesod-platform (which in turn depends on data-default) and which also depends directly on data-default.
If the package database was empty, Cabal would try to figure out a version of data-default that is acceptable to both yesod-platform and to Fay. And that's what we want!
Suppose that Cabal figures out that yesod-platform-2.7 and data-default-0.4 would work. Then, in an empty package database, it could just go ahead and install those. But in your example, the database isn't empty; we have already installed yesod-platform-2.7, depending on data-default-0.5. And (here's the rub) you can only have yesod-platform-2.7 installed once.
One solution might be to 1. *un*install yesod-platform-2.7 (depending on data-default-0.5), and 2. *re*install it (depending on data-default-0.4). But that would break any programs that use both yesod-platform-2.7 and data-default-0.5.
The solution is obvious: we should make it possible to install yesod-platform-2.7 twice, one version depending on data-default-0.4 and one version depending on data-default-0.5.
Now, if Cabal can figure out a plan based on an empty database, it can deliver on that plan even in a non-empty database, without messing up any existing installations.
Of course, if Fay depends exclusively on data-default-0.4, and yesod-platform depends exclusively on data-default-0.5, then they genuinely are incompatible, and Cabal can and should say so. That's not Cabal Hell. That's just saying that the package authors have *specified* that they are incompatible.
Summary. I may be way off beam here, but I think we can easily make things way better than they are. By "easily" I mean with just design and implementation work. That's not free, and no one has any time.... but it's well within reach. I'd even been wondering about trying to crowd-fund this development so that Well Typed can do it. But let's first see if it'd solve the problem.
There must be a wiki page somewhere that articulates the challenge, points to blog posts about it, and discusses solutions.
Simon
Hi Simon,

Having GHC support multiple copies of the same package/version would certainly be a step in the right direction. It still wouldn't completely address the goals I'm getting at in my blog post. What I'm really aiming for overall is to provide users with a stable, vetted set of packages that are known to work well together. But for those of us who will still be using plain Hackage, this improvement would be very helpful.

Michael

Having GHC support multiple copies of the same package/version would certainly be a step in the right direction. It still wouldn't completely address the goals I'm getting at in my blog post. What I'm really aiming for overall is to provide users with a stable, vetted set of packages that are known to work well together. But for those of us who will still be using plain Hackage, this improvement would be very helpful.
Indeed. I've always thought of the Haskell Platform as doing the "stable set" part, though there must be plenty of space between HP at one end and Hackage at the other, and that's what you are exploring I think.
Regardless, being able to install multiple instantiations of the same package/version is desirable for the "stable set" story, so that more than one stable set is possible; and so that trying to use a package outside the stable set doesn't mess up your stable-set installation.
The question is: who will do the work?
Simon
From: michael.snoyman@gmail.com [mailto:michael.snoyman@gmail.com] On Behalf Of Michael Snoyman
Sent: 22 November 2012 10:40
To: Simon Peyton-Jones
Cc: Haskell Libraries (libraries@haskell.org); cabal-devel@haskell.org
Subject: Re: Cabal and GHC

2012/11/22 Simon Peyton-Jones
Having GHC support multiple copies of the same package/version would certainly be a step in the right direction. It still wouldn't completely address the goals I'm getting at in my blog post. What I'm really aiming for overall is to provide users with a stable, vetted set of packages that are known to work well together. But for those of us who will still be using plain Hackage, this improvement would be very helpful.
Indeed. I’ve always thought of the Haskell Platform as doing the “stable set” part, though there must be plenty of space between HP at one end and Hackage at the other, and that’s what you are exploring I think.
Regardless, being able to install multiple instantiations of the same package/version is desirable for the “stable set” story, so that more than one stable set is possible; and so that trying to use a package outside the stable set doesn’t mess up your stable-set installation.
The question is: who will do the work?
Duncan said on reddit that you mentioned Kickstarter to him as a possible venue (http://www.reddit.com/r/haskell/comments/12e3a0/the_good_the_bad_and_the_ugl...). He mentioned a minimum of around $50k. I think it would be interesting to gauge interest on the haskell mailing list or on reddit to see if such an amount would be possible.

Cheers,
Thu

On 22 November 2012 10:39, Michael Snoyman wrote:
On Thu, Nov 22, 2012 at 10:32 AM, Simon Peyton-Jones wrote:

Michael Snoyman wrote a blog post about Solving Cabal Hell, which came my way via the peerless Haskell Weekly News. http://www.yesodweb.com/blog/2012/11/solving-cabal-hell
The solution is obvious: we should make it possible to install yesod-platform-2.7 twice,
Having GHC support multiple copies of the same package/version would certainly be a step in the right direction. It still wouldn't completely address the goals I'm getting at in my blog post. What I'm really aiming for overall is to provide users with a stable, vetted set of packages that are known to work well together. But for those of us who will still be using plain Hackage, this improvement would be very helpful.
Right, and we need both. Your suggestion is very sensible. It's similar to the suggestion people have made before about having "testing" and "stable" sets of packages (these suggestions often come from people using debian terminology), and managing things like a distribution. In both cases the point is to provide something to end users where we've set up the available packages such that we know most things work together. (This is why debian users don't run into problems most of the time: volunteers have done a huge amount of work behind the scenes to eliminate conflicting versions.)

So yes, we need that, and it will require a fair bit of work, and we need to distribute that amongst many people. I think eventually the best mechanism to do it will be to use hackage and use tagging, but we can certainly experiment now by using separate hackage instances.

But we also need to address the problems of developers in the thick of things, who must work with the latest and greatest unstable versions, before people have got round to smoothing out all the conflicting dependencies and patching packages so they can all use the same versions of common dependencies. That's where the nix-style approach that we've been advocating for years comes in. That's what Philipp Schuster's GSoC this summer was all about. That's aiming to do exactly what Simon is talking about here. It's about allowing sets of packages to be installed simultaneously that have inconsistent sets of dependencies. There's a slight overlap with sandboxing, but the way I see it, the nix approach is the right underlying implementation mechanism and then sandboxing is more of a UI issue.

Duncan

On 22/11/2012 11:00, Duncan Coutts wrote:

That's where the nix-style approach that we've been advocating for years comes in. That's what Philipp Schuster's GSoC this summer was all about. That's aiming to do exactly what Simon is talking about here. It's about allowing sets of packages to be installed simultaneously that have inconsistent sets of dependencies. There's a slight overlap with sandboxing, but the way I see it, the nix approach is the right underlying implementation mechanism and then sandboxing is more of a UI issue.
I completely agree with Duncan. Finishing the implementation of this would solve many of the problems that people are collectively calling "cabal hell", and would achieve exactly what Simon suggested at the beginning of this thread.

For people who want to learn more about this, we have a wiki page with lots of information about the design (which is sadly in need of a bit of reorganisation, though): http://hackage.haskell.org/trac/ghc/wiki/Commentary/GSoCMultipleInstances

It's really not as hard as it seems; the main tricky bit is deciding how to name various things (see the section entitled "Hashes and Identifiers").

Cheers, Simon
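P.S. For a flavour of the naming question, a toy sketch (not the scheme described on the wiki page) of deriving an instance name from a package id and the instance names of the dependencies it was built against:

    module InstanceNames where

    import Data.Char (ord)
    import Data.List (foldl')
    import Numeric (showHex)

    -- A toy stand-in for a real digest function; illustration only.
    toyHash :: String -> String
    toyHash = flip showHex ""
            . foldl' (\h c -> (h * 33 + fromIntegral (ord c)) `mod` (2 ^ 61 - 1))
                     (5381 :: Integer)

    -- The rough idea: an installed instance is named by its package id, its
    -- flags, and the instance names of the exact dependencies it was built
    -- against, so the same package-version built against different dependency
    -- instances gets a different name and the two can coexist.
    instanceName :: String -> [String] -> [String] -> String
    instanceName pkgId flags depInstanceNames =
      pkgId ++ "-" ++ toyHash (unwords (pkgId : flags ++ depInstanceNames))

    -- e.g. two coexisting builds of yesod-platform-2.7:
    againstDD04, againstDD05 :: String
    againstDD04 = instanceName "yesod-platform-2.7" [] ["data-default-0.4-aaaa"]
    againstDD05 = instanceName "yesod-platform-2.7" [] ["data-default-0.5-bbbb"]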

On Thu, Nov 22, 2012 at 3:32 AM, Simon Peyton-Jones wrote:
The solution is obvious: we should make it possible to install yesod-platform-2.7 twice, one version depending on data-default-0.4 and one version depending on data-default-0.5.
This sounds great until you have lots of packages installed, with all of them doing this kind of thing; you have many packages present multiple times with different (and ultimately nested) cross-dependencies, and a nightmare on the accounting front. And the user is watching already-installed packages being constantly reinstalled because there's a little gap *somewhere* in the dependency web.

I am wondering if we're trying to solve the problem in the wrong way. The core of the problem is that various things get baked into libraries in the name of optimization, such that a given binary library has dependencies on precise versions of other libraries; it's not like C where anything supporting the ABI can use the same .a/.lib or .so/.dll/.dylib. Perhaps what we need is some kind of half-compiled state which leaves the compiler able to do those optimizations at link time. (This may admittedly end up meaning whole-program compilation and thereby giving up on binary libraries completely.)

--
brandon s allbery kf8nh                          sine nomine associates
allbery.b@gmail.com                              ballbery@sinenomine.net
unix/linux, openafs, kerberos, infrastructure    http://sinenomine.net

On Thu, 22 Nov 2012, Brandon Allbery wrote:
I am wondering if we're trying to solve the problem in the wrong way. The core of the problem is that various things get baked into libraries in the name of optimization, such that a given binary library has dependencies on precise versions of other libraries; it's not like C where anything supporting the ABI can use the same .a/.lib or .so/.dll/.dylib.
C supports inlining. This should cause the same kind of problems, shouldn't it?

On Thu, Nov 22, 2012 at 8:41 PM, Henning Thielemann wrote:
On Thu, 22 Nov 2012, Brandon Allbery wrote:
I am wondering if we're trying to solve the problem in the wrong way. The core of the problem is that various things get baked into libraries in the name of optimization, such that a given binary library has dependencies on precise versions of other libraries; it's not like C where anything supporting the ABI can use the same .a/.lib or .so/.dll/.dylib.
C supports inlining. This should cause the same kind of problems, shouldn't it?
Indeed, and in Qt for example, which has a strict binary compatibility guarantee, making a public function inline means you effectively cannot change it until the next major release. (In this case "making it inline" and "including the definition in the header file" are effectively synonymous; one requires the other.)

The problem is that in Haskell inlining is a lot more important for performance.
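For a contrived illustration on the Haskell side (a made-up module, nothing from a real library):

    module InliningSketch (def) where

    -- A made-up stand-in for a library function; illustration only.
    -- Because of the INLINE pragma, GHC records the full definition (its
    -- "unfolding") in this module's .hi interface file, and a client compiled
    -- with optimisation copies that definition into its own object code.
    -- If a later release changes the definition, the client keeps running the
    -- old code until it is rebuilt; this is why the client effectively depends
    -- on this exact build of the library rather than on a stable ABI.
    {-# INLINE def #-}
    def :: Int
    def = 0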
-- Your ship was destroyed in a monadic eruption.

On 22/11/2012 19:54, Gábor Lehel wrote:
On Thu, Nov 22, 2012 at 8:41 PM, Henning Thielemann wrote:

On Thu, 22 Nov 2012, Brandon Allbery wrote:
I am wondering if we're trying to solve the problem in the wrong way. The core of the problem is that various things get baked into libraries in the name of optimization, such that a given binary library has dependencies on precise versions of other libraries; it's not like C where anything supporting the ABI can use the same .a/.lib or .so/.dll/.dylib.
C supports inlining. This should cause the same kind of problems, shouldn't it?
Indeed, and in Qt for example, which has a strict binary compatibility guarantee, making a public function inline means you effectively cannot change it until the next major release. (In this case "making it inline" and "including the definition in the header file" are effectively synonymous; one requires the other.)
The problem is that in Haskell inlining is a lot more important for performance.
I would like to see GHC support fixed ABIs, and the work I did with ABI hashing in GHC was aiming towards exactly that.

For fixed ABIs you would need to have the user explicitly declare every inline function, and then hash the definitions as part of the ABI. (Also do something about strictness and arity, and other cross-module optimisation hints.) It might be painful, but it could be optional, and the gains are quite nice: the ability to upgrade a library in-place without recompiling everything that depends on it. Especially now that we're moving towards shared libraries, this would become more useful.

I'm sure it's not going to happen soon, but it would be an interesting project for someone (probably larger than a GSoC project, though).

Cheers, Simon
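P.S. Just to make the shape of the idea concrete, a thought experiment (nothing like GHC's actual fingerprinting code):

    module AbiSketch where

    import Data.Char (ord)
    import Data.List (foldl', sort)

    -- Roughly the ingredients described above, as plain data; invented types.
    data Export = Export
      { expName       :: String
      , expType       :: String          -- rendered type signature
      , expInlineBody :: Maybe String    -- Just src only if declared inline
      }

    -- Toy fingerprint; a real implementation would use a proper digest.
    fingerprint :: String -> Integer
    fingerprint = foldl' (\h c -> h * 33 + fromIntegral (ord c)) 5381

    -- The ABI hash covers the names and types of everything exported, plus
    -- the bodies of the functions the author *chose* to expose for inlining.
    -- Anything not declared inline can change freely without changing the ABI.
    abiHash :: [Export] -> Integer
    abiHash = fingerprint . concat . sort . map render
      where
        render e = expName e ++ "::" ++ expType e
                   ++ maybe "" ("=" ++) (expInlineBody e)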

On 23 November 2012 21:06, Simon Marlow wrote:
On 22/11/2012 19:54, Gábor Lehel wrote:
On Thu, Nov 22, 2012 at 8:41 PM, Henning Thielemann wrote:

On Thu, 22 Nov 2012, Brandon Allbery wrote:
I am wondering if we're trying to solve the problem in the wrong way. The core of the problem is that various things get baked into libraries in the name of optimization, such that a given binary library has dependencies on precise versions of other libraries; it's not like C where anything supporting the ABI can use the same .a/.lib or .so/.dll/.dylib.
Right, I tend to agree. Multiple installs of packages make me feel uncomfortable and sleep less well at night... :)
I would like to see GHC support fixed ABIs, and the work I did with ABI hashing in GHC was aiming towards exactly that.
For fixed ABIs you would need to have the user explicitly declare every inline function, and then hash the definitions as part of the ABI. (also do something about strictness and arity, and other cross-module optimisation hints). It might be painful, but it could be optional, and the gains are quite nice: the ability to upgrade a library in-place without recompiling everything that depends on it. Especially now that we're moving towards shared libraries, this would become more useful.
Yes that would be really wonderful!!

The current ABI situation is really very painful for packagers and beyond IMHO. To me it is actually the largest current problem with ghc.

Jens

| > I would like to see GHC support fixed ABIs, and the work I did with
| > ABI hashing in GHC was aiming towards exactly that.
| >
| > For fixed ABIs you would need to have the user explicitly declare
| > every inline function, and then hash the definitions as part of the
| > ABI. (Also do something about strictness and arity, and other
| > cross-module optimisation hints.) It might be painful, but it could
| > be optional, and the gains are quite nice: the ability to upgrade a
| > library in-place without recompiling everything that depends on it.
| > Especially now that we're moving towards shared libraries, this would
| > become more useful.
|
| Yes that would be really wonderful!!
|
| The current ABI situation is really very painful for packagers and
| beyond IMHO. To me it is actually the largest current problem with ghc.

Indeed. Offering fixed ABIs is something that Simon and I have often discussed, but never acted on. Why not?

* There is significant design work to be done. Exactly how does the programmer specify which inlinings, strictness, arity etc appear in the ABI? What if the implementation of a function changes so that its strictness or arity really is different?

* There is significant implementation work to do. Apart from anything else, we presumably do not want to inhibit cross-module inlining etc *within* a package. Currently GHC uses the same M.hi files *within* a package as *between* packages. So maybe we need to strip down these .hi files somehow for use cross-package, or generate a one-per-package portmanteau .hi file? That's real work.

* There will be a performance penalty for reducing cross-package inlining. We don't know how much, and it's hard to find out without doing the work of the earlier two bullets.

So, if we want fixed ABIs then lots of people are going to have to help, including helping to lead!

Simon
participants (14)

- Brandon Allbery
- Dan Burton
- Duncan Coutts
- Edward Z. Yang
- Gábor Lehel
- Henning Thielemann
- Jens Petersen
- Johan Tibell
- Manuel M T Chakravarty
- Michael Snoyman
- Oren Ben-Kiki
- Simon Marlow
- Simon Peyton-Jones
- Vo Minh Thu