
Duncan, Mark, Simon

Here's another cry of pain: http://fluffynukeit.com/2013/02/reflections-after-a-hard-day-in-haskell-gui-...

"Cabal is a frustratingly constraining tool. Far too frequently I encountered packages that, when trying to install, would say installing this package will break a dozen others. If not that, then I often would be notified that the dependencies could not be resolved."

What is frustrating is that we KNOW how to fix this, don't we? (Allow multiple installations of package P-3.2.5, each depending on different versions of its dependencies.) We just need to liberate enough effort to do it.

Indeed, more people seem to be joining in with GHC/Cabal these days. How hard would it be to write a detailed description of the implementation changes needed to support side-by-side installations, and project-manage a group to do it?

Simon

What I find very confusing is that the two sequences

    cabal install wurbel

and

    cabal unpack wurbel
    cd wurbel-0.0
    cabal install

give *radically* different results. The first one usually fails
when I have a "hand-patched" package (already successfully installed)
that wurbel depends on. The second one works.
My interpretation is that the former looks at the transitive
dependency tree and thus ignores my fixes that have led to my current
cabal world. The latter only considers direct dependencies, which is
sufficient to resolve everything, since the necessary packages are
present, and the problematic dependencies (from hackage) get dropped.
What we need for the former is a flag that says:
"do not transitively chase dependencies of already installed packages"
This would greatly enhance the cabal experience for people who want to
try packages with HEAD GHC and thus may accelerate the adoption rate
of new GHC releases w.r.t. hackage.
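(A toy sketch of the two behaviours described above, in Python; the package names and the resolver model are made up for illustration and are not cabal's actual solver:)

```python
# Toy model: a dependency graph maps each package name to its direct deps.

def transitive_deps(graph, root):
    """Chase the full transitive dependency tree (what 'cabal install' does)."""
    seen = set()
    def go(pkg):
        if pkg in seen:
            return
        seen.add(pkg)
        for dep in graph.get(pkg, []):
            go(dep)
    go(root)
    return seen

def shallow_deps(graph, root, installed):
    """Stop chasing at already-installed packages (the proposed flag)."""
    seen = set()
    def go(pkg, is_root):
        if pkg in seen:
            return
        seen.add(pkg)
        if pkg in installed and not is_root:
            return  # hand-patched/installed: trust it, do not chase further
        for dep in graph.get(pkg, []):
            go(dep, False)
    go(root, True)
    return seen
```

With a hand-patched "patched" already installed, the shallow resolver never looks at its (problematic) hackage dependencies, while the transitive one does.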
Just some feedback while at it.
Cheers,
Gabor
On 3/1/13, Simon Peyton-Jones
Duncan, Mark, Simon
Here's another cry of pain: http://fluffynukeit.com/2013/02/reflections-after-a-hard-day-in-haskell-gui-...
"Cabal is a frustratingly constraining tool. Far too frequently I encountered packages that, when trying to install, would say installing this package will break a dozen others. If not that, then I often would be notified that the dependencies could not be resolved. "
What is frustrating is that we KNOW how to fix this, don't we? (Allow multiple installations of package P-3.2.5, each depending on different versions of its dependencies.) We just need to liberate enough effort to do it.
Indeed, more people seem to be joining in with GHC/Cabal these days. How hard would it be to write a detailed description of the implementation changes needed to support side-by-side installations, and project-manage a group to do it?
Simon

I'll chip in with my cries. Yesterday I spent 6 hours trying to track down a build problem in GHC HEAD. It turned out to be caused by having two versions of the binary package, which were installed to satisfy dependencies and silently broke my installation. I spent another hour today fixing things.

I would like cabal to prevent such things from ever happening, the same way that Linux rpm/deb managers keep packages on the system in a consistent state.

Janek

On Fri, Mar 01, 2013 at 03:02:41PM +0100, Jan Stolarek wrote:
fixing things. I would like cabal to prevent such things from ever happening, the same way that Linux rpm/deb managers keep packages on the system in a consistent state.
There's one big difference here: rpm/dpkg are only used to install things by the system administrator. But in the case of Cabal, a user could install 'mypackage' (in their user package database) and the next day the sysadmin could install a different instance of 'mypackage' in the global database.

Thanks
Ian

There's one big difference here: rpm/dpkg are only used to install things by the system administrator. But in the case of Cabal, a user could install 'mypackage' (in their user package database) and the next day the sysadmin could install a different instance of 'mypackage' in the global database. Then we must come up with a way of handling such a situation. The first idea that comes to my head is that by default cabal would only use one database: either the global one managed by the system administrator or the local user database. The user should be allowed to override the default setting and use both package databases (as it is now) with no consistency guarantees.
Janek

On Fri, Mar 01, 2013 at 05:13:58PM +0100, Jan Stolarek wrote:
There's one big difference here: rpm/dpkg are only used to install things by the system administrator. But in the case of Cabal, a user could install 'mypackage' (in their user package database) and the next day the sysadmin could install a different instance of 'mypackage' in the global database. Then we must come up with a way of handling such a situation. The first idea that comes to my head is that by default cabal would only use one database: either the global one managed by the system administrator or the local user database.
Well, that basically means you can't use the local one, as base is in the global one. Even if you made it a 3-database system:

* the 'ghc' database, containing base, directory, etc.
* the 'system' database, containing any packages from Debian (for example)
* the 'user' database, containing things you install

where you have the choice of (ghc + system) or (ghc + user), then that means that you can only use packages from your OS if every single package you want to use is packaged by the OS.

You could imagine changing things so that packages installed by OS packages aren't actually visible, and there's some way to add them to your user database (provided that would keep everything consistent). Perhaps 'cabal install foo' would first check to see if there is a suitable 'global' foo that it can just register in its database. It would be a more clunky workflow, but perhaps better than the status quo.

Thanks
Ian
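(A toy sketch of the database-stack idea above, in Python; the database contents and the `lookup` helper are made up for illustration and are not GHC's actual package-db format:)

```python
# Toy model of a layered package database. A database maps package names
# to versions; the user picks one stack, (ghc + system) or (ghc + user),
# so the two different 'text' instances can never be mixed in one stack.
GHC_DB    = {"base": "4.6.0.0", "directory": "1.2.0.0"}
SYSTEM_DB = {"text": "0.11.2.3"}                  # e.g. packages from Debian
USER_DB   = {"text": "0.11.3.1", "lens": "3.9"}   # things you install yourself

def lookup(stack, pkg):
    """Search a database stack; later databases shadow earlier ones."""
    found = None
    for db in stack:
        if pkg in db:
            found = db[pkg]
    return found
```

For example, `lookup([GHC_DB, SYSTEM_DB], "text")` and `lookup([GHC_DB, USER_DB], "text")` resolve to different instances, and a package present only in the unused stack is simply invisible.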

On 1 March 2013 14:15, Ian Lynagh
On Fri, Mar 01, 2013 at 03:02:41PM +0100, Jan Stolarek wrote:
fixing things. I would like cabal to prevent such things from ever happening, the same way that Linux rpm/deb managers keep packages on the system in a consistent state.
There's one big difference here: rpm/dpkg are only used to install things by the system administrator. But in the case of Cabal, a user could install 'mypackage' (in their user package database) and the next day the sysadmin could install a different instance of 'mypackage' in the global database.
I thought that "cabal install" should be viewed as installing an instance of the requested package by recompiling the whole transitive closure of dependencies from scratch, in a sort of NixOS-like way. Given this view, Cabal's reuse of already compiled and installed packages is purely an optimization that can prevent it from recompiling some things if it is absolutely certain that doing so is unnecessary. The problem then is just that Cabal is currently brokenly unable to handle multiple instances of an installed package with the same name and version.

In this view, the existence of local and global databases is straightforward: packages should always be installed in the most-accessible DB to which you have write permissions (for maximum sharing) and should be sourced from whichever is convenient when they are required. There are two complicating factors:

1. Some packages cannot be recompiled by the user (such as base), which breaks the mental model a bit. This is probably not too important.

2. In this view, does "cabal install mylibrary-1.1" actually do anything useful? The very next program you write that tries to link against mylibrary-1.1 may end up requiring a differently-compiled version because of its own dependency constraints. Of course, "cabal install myexe-1.1" is perfectly useful and well defined, and it should be the case that if "cabal install my-dep-1 my-dep-2 ... my-dep-N" immediately precedes "cabal build" of a package with dependencies (my-dep-i) then compilation of that package should proceed without requiring any dependencies to be recompiled.

It seems to me that the ideal mental model for "cabal install mylibrary-1.1" is that it appends to a global mapping from package name to version; these are essentially the packages that are available when you do "ghc -package mylibrary" and when using ghci. Cabal's promise should be that it adds the requested package to the global mapping and then recompiles *everything* on your system as necessary in order to make it possible for every package in that global mapping to be imported simultaneously into a GHCi session.

This seems like a vaguely sensible model of how things *should* work to me, unless I've overlooked some horrible complication. I know that Duncan is pretty keen on Nix so the above plan may even be his final intention. But of course, saying all that is one thing, but finding the time to implement it quite another...

Max
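(A toy sketch of the "recompile everything as necessary" promise above, in Python; the dependency graph and function are made up for illustration. When one package's selected instance changes, everything that transitively depends on it must be rebuilt:)

```python
# Toy model: deps maps each package to its direct dependencies. Given the
# set of packages whose selected instance changed, compute every package
# that (transitively) depends on one of them and so must be rebuilt to
# keep the global mapping consistent.

def rebuild_set(deps, changed):
    dirty = set(changed)
    grew = True
    while grew:  # fixed-point iteration over reverse dependencies
        grew = False
        for pkg, ds in deps.items():
            if pkg not in dirty and any(d in dirty for d in ds):
                dirty.add(pkg)
                grew = True
    return dirty - set(changed)
```

So if "text" is replaced by a differently-compiled instance, both "lens" (which depends on it) and "app" (which depends on "lens") land in the rebuild set, while unrelated packages are untouched.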

On Fri, Mar 1, 2013 at 10:24 AM, Max Bolingbroke wrote:
I thought that "cabal install" should be viewed as installing an
instance of the requested package by recompiling the whole transitive
closure of dependencies from scratch, in a sort of NixOS-like way.
[...]
This seems like a vaguely sensible model of how things *should* work
to me, unless I've overlooked some horrible complication. I know that
Duncan is pretty keen on Nix so the above plan may even be his
final intention. But of course, saying all that is one thing, but
finding the time to implement it quite another...

This is the model I've been arguing for in e.g.
http://blog.johantibell.com/2012/03/cabal-of-my-dreams.html
It's the only model I believe scales to e.g. executables that depend on
thousands of packages (which happens for us at work). At that number of
dependencies, building needs to be hermetic. "cabal install <lib>" should just
be a convenience thing you can use if you e.g. want to poke around a library
using ghci or need to have the library available when you're offline.

On Fri, Mar 01, 2013 at 10:33:39AM -0800, Johan Tibell wrote:
It's the only model I believe scales to e.g. executables that depend on thousands of packages
Debian has approximately 30,000 packages (although admittedly I don't know how many are libraries), and only needs a single version of each package.

Having a single version of each package (with Hackage using a system similar to Debian's releases and 'testing' to define the sets of package versions) would make life a lot easier:

* Library maintainers don't need to worry so much about keeping packages working with old versions of their dependencies.
* Authors know that they can use any 2 packages together, and not have to worry about one of those packages depending on foo 1.* and the other depending on foo 2.*.
* The intractable problem of testing all combinations of versions of dependencies, to ensure that packages really do build in all the circumstances that they claim they do, disappears.

Thanks
Ian
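(A back-of-the-envelope illustration of why that combination-testing problem is intractable; the numbers are made up for illustration:)

```python
# With only 20 dependencies and 5 released versions of each, the number of
# distinct build configurations a maintainer would have to test is already
# astronomical, while a curated single-version set needs exactly one build.
n_deps, n_versions = 20, 5
combinations = n_versions ** n_deps  # 5^20 = 95,367,431,640,625
```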

Debian has a large team curating the packages.
On Mar 1, 2013 6:56 PM, "Ian Lynagh"
On Fri, Mar 01, 2013 at 10:33:39AM -0800, Johan Tibell wrote:
It's the only model I believe scales to e.g. executables that depend on thousands of packages
Debian has approximately 30,000 packages (although admittedly I don't know how many are libraries), and only needs a single version of each package. [...]

On 01/03/13 18:24, Max Bolingbroke wrote:
On 1 March 2013 14:15, Ian Lynagh wrote:
On Fri, Mar 01, 2013 at 03:02:41PM +0100, Jan Stolarek wrote:
fixing things. I would like cabal to prevent such things from ever happening, the same way that Linux rpm/deb managers keep packages on the system in a consistent state.
There's one big difference here: rpm/dpkg are only used to install things by the system administrator. But in the case of Cabal, a user could install 'mypackage' (in their user package database) and the next day the sysadmin could install a different instance of 'mypackage' in the global database.
I thought that "cabal install" should be viewed as installing an instance of the requested package by recompiling the whole transitive closure of dependencies from scratch, in a sort of NixOS-like way. Given this view, Cabal's reuse of already compiled and installed packages is purely an optimization that can prevent it from recompiling some things if it is absolutely certain that doing so is unnecessary. The problem then is just that Cabal is currently brokenly unable to handle multiple instances of an installed package with the same name and version.
Cabal comes under fire a lot, so I'd like to point out that it's not just Cabal that can't handle this right now, GHC can't either :-) And various people have been thinking a lot about how to fix it, there was even a SoC project last year to tackle it. The design notes are here: http://hackage.haskell.org/trac/ghc/wiki/Commentary/GSoCMultipleInstances
In this view, the existence of local and global databases is straightforward: packages should always be installed in the most-accessible DB to which you have write permissions (for maximum sharing) and should be sourced from whichever is convenient when they are required.
Right - when the DB is semantically just a cache, it doesn't matter whether stuff is installed in the global or local database. All those problems just go away.
It seems to me that the ideal mental model for "cabal install mylibrary-1.1" is that it appends to a global mapping from package name to version which are essentially the packages that are available when you do "ghc -package mylibrary" and when using ghci. Cabals promise should be that it adds the requested package to the global mapping and then recompiles *everything* on your system as necessary in order to make it possible for every package in that global mapping to be imported simultaneously into a GHCi session.
The new library that you just asked to be installed might be incompatible with some other libraries that you asked to be installed, and yet you want to be able to use them both with GHCi (just not at the same time). I don't think we should prevent the user from doing that.

So whether "cabal install foo-1.0" should store some state somewhere that says the user prefers foo-1.0 over other versions of foo is an interesting question (see the section "Simplistic Dependency Resolution" on the wiki page for some other thoughts on this). One stance is that "cabal install foo-1.0" should do nothing except populate the cache; that is, it is semantically a no-op. To have it modify some state breaks this nice no-op notion.

Cheers,
Simon

Thanks for the GSoCMultipleInstances link: it is very informative! It seems that there is a consensus already on what needs to be done here: GHC and Cabal must support multiple package instances with the same name and version (package curation and development sandboxing have their value above and beyond this too). And there also seems to be a general design of how this needs to be done. Assuming that a package instance is identified by {PackageName}-{Version}-{InstanceId}, here are some specific comments:

** What are the precise inputs to generating {InstanceId}? This is a key question and the rest of the design will flow from it.

** When developing a package or multiple packages there is no point in keeping track of multiple instances (i.e. don't install). Cabal sandboxing or a local package db where {InstanceId} is a constant is enough. Cabal will, however, need to find their other package instance dependencies in the user db or system db.
[GSoCMultipleInstance] There are three identifiers:
[GSoCMultipleInstance] XXXX: the identifier appended to the installation directory so that installed packages do not clash with each other
[GSoCMultipleInstance] YYYY: the InstalledPackageId, which is an identifier used to uniquely identify a package in the package database.
[GSoCMultipleInstance] ZZZZ: the ABI hash derived by GHC after compiling the package

** It would be nice to reduce the complexity here and strive for a single {InstanceId} that, together with {PackageName} and {Version}, is used throughout (libs, package.conf.d, etc.)
[GSoCMultipleInstance] "we need to distinguish between two packages that have identical ABIs but different behaviour (e.g. a bug was fixed)"

** This is why the package version {Version} exists. If a bug was fixed, a proper release process must increase the package version, and the unique hash/id should not try to fix this.
[GSoCMultipleInstance] "We define a new Cabal Hash that hashes the compilation inputs (the LocalBuildInfo and the contents of the source files)"

** I am not sure why hashing the sources here is important: an added space character could render a different hash but the object file could be exactly the same.

** There is a paragraph later in the document that describes what could be the motivation here: installing unreleased packages (a clean install vs a dirty install).
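(A hypothetical sketch of hashing compilation inputs into an {InstanceId}, in Python; the function and its inputs are made up for illustration and are not the actual Cabal hash described on the wiki. The idea is Nix-style: the id is derived from the package id, its flags, and the specific instance ids of its chosen dependencies:)

```python
import hashlib

def instance_id(name, version, flags, dep_instance_ids):
    """Derive a deterministic instance id from the compilation inputs."""
    h = hashlib.sha256()
    h.update(f"{name}-{version}".encode())
    for flag in sorted(flags):            # sorted: flag order must not matter
        h.update(flag.encode())
    for dep in sorted(dep_instance_ids):  # deps identified by *instance*, not version
        h.update(dep.encode())
    return h.hexdigest()[:8]
```

The same inputs always give the same id, while changing a flag or swapping in a differently-built dependency instance gives a distinct id, so side-by-side installs never clash.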
[GSoCMultipleInstance] "ZZZZ is recorded in the package database as a new field abi-hash. When two packages have identical ZZZZs then they are interface-compatible, and the user might in the future want to change a particular dependency to use a different package but the the same ZZZZ. We do not want to make this change automatically, because even when two packages have identical ZZZZs, they may have different behaviour (e.g. bugfixes)." ** It is not clear to me in what cases will this be useful. If my .cabal defines that I depend on a version 1.2.3 (or a range) this assumes these dependencies are interface compatible and the InstallPlan should be able to pick one that makes most sense (same for bug fixes). I don't deny that this may be an interesting requirement, but sounds like secondary to me. ** I am a bit confused by who will be responsible for generating this {InstanceId}: Cabal or GHC? My initial thought was that GHC should be responsible for defining the required inputs and generating the appropriate {InstanceId} specially since it needs to be able to traverse package dependencies for linking/ghci. However, maybe this is not an issue since the package DB will simply be a DAG with specific {InstanceId} pointers between nodes/dependencies?
[GSoCMultipleInstance] The best tool for determining suitable package instances to use as build inputs is cabal-install. However, in practice there will be many situations where users will probably not have the full cabal-install functionality available:
[GSoCMultipleInstance] invoking GHCi from the command line,
[GSoCMultipleInstance] invoking GHC directly from the command line,
[GSoCMultipleInstance] invoking the configure phase of Cabal (without using cabal-install).

** If the package DB stores a graph of {PackageName}-{Version}-{InstanceId} packages connected to other specific package instances (e.g. the mypkg-1.0-1234 package instance depends on the yourpkg-1.1-9876 package instance), navigating this DAG is straightforward, and I don't see why cabal-install would be needed here. Maybe the issue is selecting the first package instance based on a given {PackageName}-{Version} or just {PackageName}? Maybe the design here should make sure that there are some minimal attributes that GHC/GHCi can query to decide which initial package instance to pick.
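(A toy sketch of that DAG navigation, in Python; the database entries reuse the hypothetical instances from the comment above, and the representation is made up for illustration. Once a starting instance is chosen, walking the closure needs no solver at all:)

```python
# Toy package DB keyed by {PackageName}-{Version}-{InstanceId}. Each entry
# points at the *specific instances* of its dependencies, so GHC/GHCi can
# walk the DAG without cabal-install.
DB = {
    "mypkg-1.0-1234":   ["yourpkg-1.1-9876"],
    "yourpkg-1.1-9876": ["base-4.6-0000"],
    "yourpkg-1.1-5555": ["base-4.6-0000"],  # second instance, same name+version
    "base-4.6-0000":    [],
}

def closure(db, inst):
    """All instances needed to link against `inst`."""
    out, stack = set(), [inst]
    while stack:
        i = stack.pop()
        if i not in out:
            out.add(i)
            stack.extend(db[i])
    return out
```

Note the closure of mypkg-1.0-1234 picks up exactly yourpkg-1.1-9876, never the sibling instance yourpkg-1.1-5555; the only open question is the initial name-to-instance selection.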

| I thought that "cabal install" should be viewed as installing an
| instance of the requested package by recompiling the whole transitive
| closure of dependencies from scratch, in a sort of NixOS-like way.
| Given this view, Cabal's reuse of already compiled and installed
| packages is purely an optimization that can prevent it from
| recompiling some things if it is absolutely certain that doing so is
| unnecessary. The problem then is just that Cabal is currently brokenly
| unable to handle multiple instances of an installed package with the
| same name and version.

I believe that what you describe is precisely The Glorious Plan:
http://hackage.haskell.org/trac/ghc/wiki/Commentary/GSoCMultipleInstances

It's just that no one has time to do it. That's why I was raising it (again) to see if anyone has any bright ideas for un-gluing this particular log-jam.

Simon

Somebody claiming to be Simon Peyton-Jones wrote:
"Far too frequently I encountered packages that, when trying to install, would say installing this package will break a dozen others."
I also get this message sometimes, but I never consider it a problem. I just add all the packages that would be broken to the command line as well, and that informs the constraint solver a bit more (and rebuilds some things) and then it works. I've actually wished for a switch that just does this, but cut-n-pasting the packages it tells me about works fine.

-- Stephen Paul Weber, @singpolyma See http://singpolyma.net for how I prefer to be contacted
participants (10)

- Administrator
- Don Stewart
- Gabor Greif
- Ian Lynagh
- Jan Stolarek
- Johan Tibell
- Max Bolingbroke
- Simon Marlow
- Simon Peyton-Jones
- Stephen Paul Weber