An Easy Solution to PVP Bounds and Cabal Hell

Recently there has been some discussion about how we can fix the problem of “Cabal Hell”. Some people advocate restrictive upper bounds, to prevent packages from being broken by new updates. Some other people point out that too-restrictive bounds can lead to bad install plans, since some packages might want newer versions of some dependencies, and others older versions. Still other people say that we can _retroactively_ fix upper bounds (in either direction) by modifying cabal files using the new features in Hackage. Some people think this is a terrible idea because it looks like we are mutating things, and this confuses hashes. In turn, these people support either Nix or a Nix-like approach by which packages have hashes that encompass the full versions of all their transitive dependencies. With that in hand, we can cache builds and mix-and-match to build the precise environment we want for each package while reducing redundant computation. However, the cache of all the various binary combinations may still grow large! And none of this even begins to address the dreaded “diamond dependency” problem. Here is a chart of some such solutions: http://www.well-typed.com/blog/aux/images/cabal-hell/cabal-hell-solutions.pn...

One way to look at a particular build is in an n-dimensional state space (of the Hilbert sort) determined by all its dependencies, not least the compiler itself. The solver acts as a particle traversing this space. But this description is too simple. Our state of dependencies and constraints itself varies and grows over time. So another approach is to think of the dependencies as a transitive graph, where each node may vary along a time axis, and as they slide along the axis, this in turn affects their children. We now have not just one Hilbert space, but a collection of them related by branching trees as authors locally modify the dependencies of their packages.

If we keep this simple model in mind, it is easy to see why everyone is always having these debates. Some people want to fix the graph, and others want to simplify it. Some people want an immutable store, and some people want to rebase. But we can’t “really” rebase in our current model.

Bearing in mind our model of a space-time continuum of Hackage dependencies, the solution emerges: enforce immutability, but allow retroactive mutation. For instance, suppose Fred tries to install package Foo on Friday evening, but discovers that it depends on version 1.0 of Bar (released the previous Friday), which in turn depends on version 0.5 of Baz, while Foo also depends on version 0.8 of Baz. So Fred branches Bar and changes the dependency, which in turn informs Betty that there is also a 1.0 of Bar with different dependencies, and we have forked our package timeline. On getting this message on Monday, Betty can merge by pushing with --force-rewrites; this goes back in the timeline and makes it so that Bar retroactively had the right dependencies, and now Fred, as of the previous Friday, no longer has this problem. (That way he still has the weekend.)

Now the failed build is cut off temporarily into a cycle in the package timeline that is disconnected from the rewrite. We stash it with “hackage stash” until Monday, at which time the dependency graph is 100 percent equalized and primed for new patches. At this point we unstash Foo as of Friday, and it is replaced by the Foo from the new timeline. Friday Fred needs to remain stashed lest he run into himself. The longer he can be avoided by his Monday self the better.
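(For concreteness, the Foo/Bar/Baz situation is the familiar diamond conflict. Below is a minimal Haskell sketch of why no install plan exists, using only the made-up package names and versions from the story; none of them are real Hackage packages.)

    -- Hypothetical sketch of the Foo/Bar/Baz diamond from the story above.
    import Data.List (intersect)

    type Version = [Int]

    -- Foo build-depends on Baz == 0.8; Bar-1.0 build-depends on Baz == 0.5.
    -- Enumerated lists stand in for real version ranges.
    fooNeedsBaz, barNeedsBaz :: [Version]
    fooNeedsBaz = [[0,8]]
    barNeedsBaz = [[0,5]]

    -- A single install plan must pick one Baz that satisfies both
    -- constraints, i.e. a version in the intersection, which here is empty.
    commonBaz :: [Version]
    commonBaz = fooNeedsBaz `intersect` barNeedsBaz

    main :: IO ()
    main = putStrLn $ case commonBaz of
      [] -> "No install plan: Foo and Bar disagree about Baz."
      vs -> "Plan possible with Baz versions: " ++ show vs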
Future work could include bots that automate pruning of artifacts from redundant branches. If this description was too abrupt, here is a diagram with a fuller description of the workflow: http://bit.ly/15IIGac

I know there are some new ideas to take in here, and there is a little technical work necessary to make it feasible, but in my opinion, if you can understand the current Cabal situation, and you can understand how git and darcs work, then you should be able to understand this too. Hopefully by this time next year, we’ll be able to say that our problems with Cabal have been truly wiped from our collective memory.

HTH, HAND.

Gershom

Hi Gershom,
On Tue, Mar 31, 2015 at 11:22 PM, Gershom B wrote:
One way to look at a particular build is in an n-dimensional state space (of the Hilbert sort) determined by all its dependencies, not least the compiler itself. The solver acts as a particle traversing this space. But this description is too simple. Our state of dependencies and constraints itself varies and grows over time. So another approach is to think of the dependencies as a transitive graph, where each node may vary along a time axis, and as they slide along the axis, this in turn affects their children. We now have not just one Hilbert space, but a collection of them related by branching trees as authors locally modify the dependencies of their packages.
I don't know if you intended this to be a satirical remark about the package upper bounds, but I think you really hit the nail on the head. When we ask package authors to put upper bounds on their dependencies, what we are really doing is asking them to propagate information about _future_ incompatibilities back into the present. As none of us has access to future information [1], the upper bounds on our dependencies amount to a collection of bad guesses [2].

I think we should consider taking a more empirical approach [3]. One way (but certainly not the only way) to approach this would be to require packages to be uploaded to Hackage with a kind of "build-certificate," certifying that a successful build plan was possible given the state of Hackage at a particular timestamp. This allows us to infer minimal upper bounds for all of the package's transitive dependencies. Once the build-certificate is authenticated, Hackage only needs to store the latest timestamp of a successful build, so the overhead is very low. In this scheme, author-specified upper bounds would be relegated to ruling out known incompatibilities with already-released versions of dependencies.

Of course, build failures will still occur. By using anonymous build-reporting to track the timestamp of the earliest failed build of a package, we can automatically infer the _true_ upper bounds on its dependencies. If contradictory reports occur, they can be resolved by the trustees or the package's maintainers.

Just some food for thought. I hope the timing of my e-mail will not discourage anyone from taking my suggestions seriously.

[1]. If I am mistaken and you think you do have access to future information, please respond privately; I have some questions for you about the stock market.

[2]. If you've never had an upper bounds problem on a package you maintain, I'm happy for you, but there is mounting evidence that as a community, we are very bad guessers, on average.

[3]. The PVP is orthogonal to this. It is a convenient set of assumptions and a reasonably good set of norms; nothing more.

--
Thomas Tuegel
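(To make the shape of the proposal above concrete, here is a rough Haskell sketch of what such a build-certificate record might carry. This is an illustration only: nothing below exists in Hackage or cabal-install, and every name in it is hypothetical.)

    -- Rough sketch of a hypothetical "build certificate" record.
    -- Nothing here is an existing Hackage or cabal-install API.
    module BuildCertificateSketch where

    import Data.Time (UTCTime)

    type PackageName = String
    type Version     = [Int]

    data BuildCertificate = BuildCertificate
      { certPackage   :: (PackageName, Version)    -- package that was built
      , certTimestamp :: UTCTime                   -- state of Hackage it built against
      , certPlan      :: [(PackageName, Version)]  -- resolved transitive dependencies
      } deriving Show

    -- Hackage would only need to remember the newest successful timestamp
    -- per package, so the storage overhead stays small.
    latestSuccess :: [BuildCertificate] -> Maybe UTCTime
    latestSuccess [] = Nothing
    latestSuccess cs = Just (maximum (map certTimestamp cs))

(The anonymous build-report half of the idea would then amount to recording, per package, the earliest such timestamp at which a build failed.)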

On Wed, Apr 1, 2015 at 10:53 AM, Thomas Tuegel wrote:
[2]. If you've never had an upper bounds problem on a package you maintain, I'm happy for you, but there is mounting evidence that as a community, we are very bad guessers, on average.
I think an upper bound with an easy way to "slip" it (--allow-newer, as already implemented) is the best we can do. There's already a lot of evidence that guessing wrong in the other direction can and does break large chunks of the ecosystem, as much as certain developers would prefer to ignore the fact and/or arrange that everyone other than them has to deal with the breakage.

--
brandon s allbery kf8nh
sine nomine associates
allbery.b@gmail.com
ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad
http://sinenomine.net
participants (3)

- Brandon Allbery
- Gershom B
- Thomas Tuegel