Changing the -package dependency resolution algorithm

Right now, GHC has a very complex and hard to explain algorithm for picking packages from the package database when you give it a pile of -package/-package-id/-{hide,ignore,trust,distrust}-package flags. Roughly, it currently does something like this. 1. Concatenate all of the package databases into a giant list, with system packages first and then user packages later, removing duplicate entries with the same installed package ID (preferring later packages). Call this PACKAGES. 2. Take all the -package-id flags and compute their transitive closure (call this set P). 3. Calculate the set of shadowed packages--the set of installed packages for which there exists another package with the same package ID (foo-0.1) which is in P or is later in the package stack. 4. Calculate the set of ignored packages---the set of packages which match all -ignore-package flags 5. Filter out shadowed and ignored packages from the list of packages, calling the result ALL_PACKAGES. 6. Calculate the set of broken packages---the set of packages not contained in the least fixed point of the relation that takes a set of packages, and adds all packages whose dependencies are satisfied, from ALL_PACKAGES. 7. Process the package flags in order, operating on PACKAGES (not ALL_PACKAGES). For -package/-hide-package, take the package with the *latest* version that matches the flag and are not broken (since this includes shadowed packages, the result is unique) and toggle it to be exposed/unexposed, and hide all the other packages. For -trust-package/-distrust-package, toggle the trusted bit for all instances in the database. 8. If we have exposed multiple versions of the same package name, hide all the older versions What a mouthful! I have no idea, given a set of flags, how this works. So here is an alternate proposal for an alternate way of handling these flags *when starting from an empty database* (e.g. -hide-all-packages) Suppose we maintain a set of selected packages, which starts off empty. Process each package flag in turn. For -package and -package-id, get the set of installed packages which match the flag. (If multiple package names apply, process each in turn as if it were a separate flag.) Compute the transitive closure of dependencies for all of them, and filter out all choices which have dependencies which are inconsistent with the current set of selected packages. Consistency without multi-instances is a mapping of a package name to an installed package. If there is still more than one choice, tiebreak by version, which database and time of install. (The latter tiebreak should not be necessary until we allow multiple instances of a package with the same package ID.) For -hide-package, get the set of packages which match and hide them all; for -ignore-package, hide the transitive closure of dependencies of it. For -trust,distrust-package, toggle for all matching packages as before. Here are some differences in behavior between this and the previous scheme: - It's no longer valid to indirectly depend on two different versions of the same package. Most of the time, users didn't want this anyway. Note that the current scheme prevents directly depending on two different versions by shadowing the old ones. - We can easily extend it to handle multi-instances by relaxing the consistency check. - It assumes *-hide-all-packages* at the beginning. This scheme probably works less well without that: now we need some consistent view of the database to start with. What do people think? Cheers, Edward

The background here is our (Edward + me) beliefs that * The current situation is complicated, and completely un-documented * Its functionality is not used. The dominant modes of use are - Cabal: hide-all-packages and then say exactly which ones to expose - Users: use -package (but not -package-id etc) in very simple-minded way So the cost/benefit ratio is extremely poor. Better to implement something extremely simple (for the common case), relying on Cabal's super solver for planning a good solution to more complex situations. Simon | -----Original Message----- | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of | Edward Z.Yang | Sent: 24 July 2014 15:57 | To: ghc-devs | Subject: Changing the -package dependency resolution algorithm | | Right now, GHC has a very complex and hard to explain algorithm for | picking packages from the package database when you give it a pile of - | package/-package-id/-{hide,ignore,trust,distrust}-package flags. | Roughly, it currently does something like this. | | 1. Concatenate all of the package databases into a giant list, with | system packages first and then user packages later, removing duplicate | entries with the same installed package ID (preferring later packages). | Call this PACKAGES. | | 2. Take all the -package-id flags and compute their transitive closure | (call this set P). | | 3. Calculate the set of shadowed packages--the set of installed | packages for which there exists another package with the same package | ID | (foo-0.1) which is in P or is later in the package stack. | | 4. Calculate the set of ignored packages---the set of packages which | match all -ignore-package flags | | 5. Filter out shadowed and ignored packages from the list of packages, | calling the result ALL_PACKAGES. | | 6. Calculate the set of broken packages---the set of packages not | contained in the least fixed point of the relation that takes a set of | packages, and adds all packages whose dependencies are satisfied, from | ALL_PACKAGES. | | 7. Process the package flags in order, operating on PACKAGES (not | ALL_PACKAGES). For -package/-hide-package, take the package with the | *latest* version that matches the flag and are not broken (since this | includes shadowed packages, the result is unique) and toggle it to be | exposed/unexposed, and hide all the other packages. For -trust- | package/-distrust-package, toggle the trusted bit for all instances in | the database. | | 8. If we have exposed multiple versions of the same package name, hide | all the older versions | | What a mouthful! I have no idea, given a set of flags, how this works. | So here is an alternate proposal for an alternate way of handling these | flags *when starting from an empty database* (e.g. -hide-all-packages) | | Suppose we maintain a set of selected packages, which starts off empty. | Process each package flag in turn. | | For -package and -package-id, get the set of installed packages which | match the flag. (If multiple package names apply, process each in turn | as if it were a separate flag.) Compute the transitive closure of | dependencies for all of them, and filter out all choices which have | dependencies which are inconsistent with the current set of selected | packages. Consistency without multi-instances is a mapping of a | package name to an installed package. If there is still more than one | choice, tiebreak by version, which database and time of install. (The | latter tiebreak should not be necessary until we allow multiple | instances of a package with the same package ID.) | | For -hide-package, get the set of packages which match and hide them | all; for -ignore-package, hide the transitive closure of dependencies | of it. | For -trust,distrust-package, toggle for all matching packages as | before. | | Here are some differences in behavior between this and the previous | scheme: | | - It's no longer valid to indirectly depend on two different versions | of the same package. Most of the time, users didn't want this | anyway. | Note that the current scheme prevents directly depending on two | different versions by shadowing the old ones. | | - We can easily extend it to handle multi-instances by relaxing the | consistency check. | | - It assumes *-hide-all-packages* at the beginning. This scheme | probably works less well without that: now we need some consistent | view of the database to start with. | | What do people think? | | Cheers, | Edward | _______________________________________________ | ghc-devs mailing list | ghc-devs@haskell.org | http://www.haskell.org/mailman/listinfo/ghc-devs

Excerpts from Edward Z. Yang's message of 2014-07-24 15:57:05 +0100:
- It assumes *-hide-all-packages* at the beginning. This scheme probably works less well without that: now we need some consistent view of the database to start with.
Actually, thinking about this, this dovetails nicely with the "package environments" work the IHG is sponsoring. The idea behind a package environment is you specify some set of installed package IDs, which serves as the visible slice of the package database which is used for compilation. Then, ghc called without any arguments is simply using the *default* package environment. Furthermore, when users install packages, they may or may not decide to add the package to their global environment, and they can be informed if the package is inconsistent with a package that already is in their environment (mismatched dependencies). A user can also request to upgrade a package in their environment, and Cabal could calculate how all the other packages in the environment would need to be upgraded in order to keep the environment consistent, and run this plan for the user. Cheers, Edward

How would this work with ghci? If I'm understanding correctly, the
proposal means users could no longer do:
$ ghci SomeFile.hs
and have it work without manually specifying all -package flags. Did I
miss something?
I think it would work in conjuction with the package environments stuff,
provided that were available on all platforms ghc supports.
John L.
On Thu, Jul 24, 2014 at 11:12 PM, Edward Z. Yang
Excerpts from Edward Z. Yang's message of 2014-07-24 15:57:05 +0100:
- It assumes *-hide-all-packages* at the beginning. This scheme probably works less well without that: now we need some consistent view of the database to start with.
Actually, thinking about this, this dovetails nicely with the "package environments" work the IHG is sponsoring. The idea behind a package environment is you specify some set of installed package IDs, which serves as the visible slice of the package database which is used for compilation. Then, ghc called without any arguments is simply using the *default* package environment.
Furthermore, when users install packages, they may or may not decide to add the package to their global environment, and they can be informed if the package is inconsistent with a package that already is in their environment (mismatched dependencies). A user can also request to upgrade a package in their environment, and Cabal could calculate how all the other packages in the environment would need to be upgraded in order to keep the environment consistent, and run this plan for the user.
Cheers, Edward _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

Good question. I think package environments are the right answer here: GHCi should come preloaded with some special global package environment. Edward Excerpts from John Lato's message of 2014-07-25 00:52:12 +0100:
How would this work with ghci? If I'm understanding correctly, the proposal means users could no longer do:
$ ghci SomeFile.hs
and have it work without manually specifying all -package flags. Did I miss something?
I think it would work in conjuction with the package environments stuff, provided that were available on all platforms ghc supports.
John L.
On Thu, Jul 24, 2014 at 11:12 PM, Edward Z. Yang
wrote: Excerpts from Edward Z. Yang's message of 2014-07-24 15:57:05 +0100:
- It assumes *-hide-all-packages* at the beginning. This scheme probably works less well without that: now we need some consistent view of the database to start with.
Actually, thinking about this, this dovetails nicely with the "package environments" work the IHG is sponsoring. The idea behind a package environment is you specify some set of installed package IDs, which serves as the visible slice of the package database which is used for compilation. Then, ghc called without any arguments is simply using the *default* package environment.
Furthermore, when users install packages, they may or may not decide to add the package to their global environment, and they can be informed if the package is inconsistent with a package that already is in their environment (mismatched dependencies). A user can also request to upgrade a package in their environment, and Cabal could calculate how all the other packages in the environment would need to be upgraded in order to keep the environment consistent, and run this plan for the user.
Cheers, Edward _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

Sorry for not replying to this earlier. I think after our meeting the other day we agreed to keep the existing algorithm but document it better. Here's a stab at describing the spec: 1. compute the set of valid packages, where a package is valid if - all its dependencies are valid - it is not named in an -ignore-package flag, - it is not shadowed by a package with the same PackageId in another Package DB, unless it is in the transitive closure of a package named in a -package-id flag 2. for each flag: -package P: expose P, and hide all other versions of P -package-id P: expose P, and hide all other versions of P -hide-package P: hide P 3. if there are multiple exposed packages with the same name, hide all but the most recent version. We need to rethink the shadowing behaviour. It is designed to handle the case where we have the same PackageId (name + version) in two different DBs (e.g. global and local). Shadowing takes the topmost one of these (e.g. local, or rightmost -package-db flag). We can relax this requirement so long as the InstalledPackageIds are different, but what if the InstalledPackageIds are the same? Right now that's OK, because identical InstalledPackageIds implies identical ABIs, but if we change that so that InstalledPackageId is derived from the source and not the ABI, we would not be able to assume that two identical InstalledPackageIds are compatible. Cheers, Simon On 24/07/2014 15:57, Edward Z. Yang wrote:
Right now, GHC has a very complex and hard to explain algorithm for picking packages from the package database when you give it a pile of -package/-package-id/-{hide,ignore,trust,distrust}-package flags. Roughly, it currently does something like this.
1. Concatenate all of the package databases into a giant list, with system packages first and then user packages later, removing duplicate entries with the same installed package ID (preferring later packages). Call this PACKAGES.
2. Take all the -package-id flags and compute their transitive closure (call this set P).
3. Calculate the set of shadowed packages--the set of installed packages for which there exists another package with the same package ID (foo-0.1) which is in P or is later in the package stack.
4. Calculate the set of ignored packages---the set of packages which match all -ignore-package flags
5. Filter out shadowed and ignored packages from the list of packages, calling the result ALL_PACKAGES.
6. Calculate the set of broken packages---the set of packages not contained in the least fixed point of the relation that takes a set of packages, and adds all packages whose dependencies are satisfied, from ALL_PACKAGES.
7. Process the package flags in order, operating on PACKAGES (not ALL_PACKAGES). For -package/-hide-package, take the package with the *latest* version that matches the flag and are not broken (since this includes shadowed packages, the result is unique) and toggle it to be exposed/unexposed, and hide all the other packages. For -trust-package/-distrust-package, toggle the trusted bit for all instances in the database.
8. If we have exposed multiple versions of the same package name, hide all the older versions
What a mouthful! I have no idea, given a set of flags, how this works. So here is an alternate proposal for an alternate way of handling these flags *when starting from an empty database* (e.g. -hide-all-packages)
Suppose we maintain a set of selected packages, which starts off empty. Process each package flag in turn.
For -package and -package-id, get the set of installed packages which match the flag. (If multiple package names apply, process each in turn as if it were a separate flag.) Compute the transitive closure of dependencies for all of them, and filter out all choices which have dependencies which are inconsistent with the current set of selected packages. Consistency without multi-instances is a mapping of a package name to an installed package. If there is still more than one choice, tiebreak by version, which database and time of install. (The latter tiebreak should not be necessary until we allow multiple instances of a package with the same package ID.)
For -hide-package, get the set of packages which match and hide them all; for -ignore-package, hide the transitive closure of dependencies of it. For -trust,distrust-package, toggle for all matching packages as before.
Here are some differences in behavior between this and the previous scheme:
- It's no longer valid to indirectly depend on two different versions of the same package. Most of the time, users didn't want this anyway. Note that the current scheme prevents directly depending on two different versions by shadowing the old ones.
- We can easily extend it to handle multi-instances by relaxing the consistency check.
- It assumes *-hide-all-packages* at the beginning. This scheme probably works less well without that: now we need some consistent view of the database to start with.
What do people think?
Cheers, Edward _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

We need to rethink the shadowing behaviour. It is designed to handle the case where we have the same PackageId (name + version) in two different DBs (e.g. global and local). Shadowing takes the topmost one of these (e.g. local, or rightmost -package-db flag). We can relax this requirement so long as the InstalledPackageIds are different, but what if the InstalledPackageIds are the same? Right now that's OK, because identical InstalledPackageIds implies identical ABIs, but if we change that so that InstalledPackageId is derived from the source and not the ABI, we would not be able to assume that two identical InstalledPackageIds are compatible.
I talked to Duncan about this, and he's asserted that under a Nix-like model, equal InstalledPackageIds really would imply identical ABIs. I think this would be pretty hard to achieve reliably (and if we get it wrong, segfault); Simon, perhaps you would know more about this. Duncan (and shout if I understand wrong) is also keen on abolishing the shadowing algorithm completely when we have package environments and making a package environment mandatory (defaulting to a global package environment when nothing available.) The reason for this is that in a Nix model, we mostly abolish package database stacks and have a single global package database, which all packages are chucked into. In this case, the current shadowing really has no idea how to pick between two packages which shadow each other in the same database. Edward

On 31/07/2014 15:31, Edward Z. Yang wrote:
We need to rethink the shadowing behaviour. It is designed to handle the case where we have the same PackageId (name + version) in two different DBs (e.g. global and local). Shadowing takes the topmost one of these (e.g. local, or rightmost -package-db flag). We can relax this requirement so long as the InstalledPackageIds are different, but what if the InstalledPackageIds are the same? Right now that's OK, because identical InstalledPackageIds implies identical ABIs, but if we change that so that InstalledPackageId is derived from the source and not the ABI, we would not be able to assume that two identical InstalledPackageIds are compatible.
I talked to Duncan about this, and he's asserted that under a Nix-like model, equal InstalledPackageIds really would imply identical ABIs. I think this would be pretty hard to achieve reliably (and if we get it wrong, segfault); Simon, perhaps you would know more about this.
Right now this is ok, because InstalledPackageIds contain the ABI, but the proposal we were going to follow (I believe) was to have InstalledPackageIds contain a hash of the source code. Since GHC compilation is (still) not deterministic, identical source hashes does not imply identical ABIs. It would be ok if the InstalledPackageId was Hash(Source,ABI), though. (or was there some reason we had to know the InstalledPackageId before compilation?)
Duncan (and shout if I understand wrong) is also keen on abolishing the shadowing algorithm completely when we have package environments and making a package environment mandatory (defaulting to a global package environment when nothing available.) The reason for this is that in a Nix model, we mostly abolish package database stacks and have a single global package database, which all packages are chucked into. In this case, the current shadowing really has no idea how to pick between two packages which shadow each other in the same database.
Yes, and I agree with Duncan, provided identical InstalledPackageId implies identical ABI. Cheers, Simon
participants (4)
-
Edward Z. Yang
-
John Lato
-
Simon Marlow
-
Simon Peyton Jones