Suggestion for resolving the Cabal/GHC dependency problems

All,
I was discussing this with Yuri earlier and I had an idea that I think may resolve our problems.
Firstly, what are the problems:
1. ghc devs and users grumble because the ghc library depends on Cabal, making it hard to use the ghc lib with a later Cabal.
2. ghc devs grumble generally that Cabal seems quite big but they only need small parts of it.
3. Cabal devs complain that they cannot add useful dependencies (like a parser with error messages) because ghc depends on Cabal.
Secondly, let us recall why it is that ghc does use Cabal, and where:
1. it's used by ghc-pkg to read/write the external representation of installed package files (external rep is defined by Cabal spec, and implemented in the Cabal lib)
2. it's used by ghc to read the ghc package database files/dirs. These databases use the same external representation, and ghc & ghc-pkg use the InstalledPackageInfo type internally (InstalledPackageInfo is defined in the Cabal lib).
3. it's used by the ghc build system to help with building all the libraries that ship with ghc. I believe that this part uses more of the build system part of Cabal, not just the types and external formats.
4. ghc comes with Cabal pre-installed so that users can run Setup.hs scripts to install other packages. This was part of the original Cabal design: that all compilers would use the installed package info format defined by Cabal, and all compilers would ship Cabal to users so the Setup.hs mechanism will work.
Now, as far as I know, nobody is suggesting that ghc stop shipping Cabal, nor that it stop using it as part of the build system.
The problems all centre around use number 2, where the ghc library package depends on Cabal. Number 1 isn't really a problem because ghc-pkg is an executable.
So my suggestion is quite simple, eliminate the dependency in case 2 above, but keep it in the other three cases. Specifically:
* ghc will use a new internal type to represent info coming from the ghc-pkg databases, ie not InstalledPackageInfo. This can be smaller as ghc doesn't care about the metadata.
* The InstalledPackageInfo and the current need for ghc to read its external representation is the main reason the ghc lib depends on Cabal. Other dependencies should be minor and easy to remove.
* ghc and ghc-pkg will agree on a new on-disk representation of the installed package info.
* ghc-pkg will continue to depend on Cabal, it will continue to use the types and parsers defined by Cabal to read/write the InstalledPackageInfo. It will translate from InstalledPackageInfo into the on-disk representation that ghc & ghc-pkg share.
So what might the on-disk representation for the ghc-pkg databases look like? Currently they use the external format of InstalledPackageInfo because this is convenient using Cabal.
One simple option is just to store both formats for all packages. Another option would be that ghc never reads package dbs where the cache is out of date. Then it only ever reads the cache and never has to look at the other files. In principle the cache should never be out of date: there are two options for updating the db, calling ghc-pkg, or putting the file directly and calling ghc-pkg recache (distros often use the latter as it is simpler for them). In either case the db cache will be up to date. (In fact calling it a cache is not really correct.)
So this is a better solution than the one previously proposed to split out some small part of Cabal, because in this proposal, ghc doesn't depend on Cabal at all, not even some smaller common lib.
It's also better from the point of view of the Cabal folks because it does not involve splitting Cabal in unnatural ways. The Cabal folks do want to split the Cabal lib, but not in a way that is especially helpful to ghc. This suggestion is orthogonal to any Cabal lib splits.
Further, if only ghc-pkg and the ghc build system depend on Cabal, then it is easier for Cabal to add more dependencies, since they do not have to be installed with ghc (due to the ghc lib depending on them). In particular the Cabal folks would like to use a proper parser and have suggested adding dependencies on parsec, mtl and transformers. If only ghc-pkg depends on Cabal, then these dependencies only need to be used at build time, and do not have to be installed (which also means they don't have to be kept quite so up to date).
Note that this would not address SPJ's complaint that the start of building ghc involves building 60+ modules of Cabal. The ghc-cabal tool still uses Cabal and I am not suggesting changing that now. It's plausible that when the Cabal lib is split that the ghc-cabal tool could depend on just the smaller of the two (someone would need to look at how much functionality from the "Simple" build system it uses). I don't see that this is a big priority however.
Duncan
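The translation step proposed above (ghc-pkg reads the full record via Cabal, then hands ghc a smaller record without the metadata) might look roughly like the following sketch. Everything here is an assumption made for illustration: FullPackageInfo is a simplified stand-in and not Cabal's actual InstalledPackageInfo, and GhcPackageInfo, toGhcPackageInfo, and all field names are hypothetical.

```haskell
import Data.List (intercalate)

-- Hypothetical stand-in for the full record that ghc-pkg would read via Cabal.
-- This is NOT Cabal's InstalledPackageInfo; the fields are illustrative only.
data FullPackageInfo = FullPackageInfo
  { fullName        :: String
  , fullVersion     :: [Int]
  , fullMaintainer  :: String     -- metadata the compiler does not need
  , fullDescription :: String     -- metadata the compiler does not need
  , fullExposed     :: [String]   -- exposed module names
  , fullLibraryDirs :: [FilePath]
  , fullDepends     :: [String]   -- ids of the packages this one depends on
  } deriving (Show, Eq)

-- Hypothetical slim record: only what the compiler needs to find and link code.
data GhcPackageInfo = GhcPackageInfo
  { pkgId          :: String
  , pkgExposed     :: [String]
  , pkgLibraryDirs :: [FilePath]
  , pkgDepends     :: [String]
  } deriving (Show, Eq)

-- Render a version such as [0,5,0,0] as "0.5.0.0".
showVersion :: [Int] -> String
showVersion = intercalate "." . map show

-- Translation performed on the ghc-pkg side; the metadata fields are dropped.
toGhcPackageInfo :: FullPackageInfo -> GhcPackageInfo
toGhcPackageInfo p = GhcPackageInfo
  { pkgId          = fullName p ++ "-" ++ showVersion (fullVersion p)
  , pkgExposed     = fullExposed p
  , pkgLibraryDirs = fullLibraryDirs p
  , pkgDepends     = fullDepends p
  }

-- Made-up example entry, used only to exercise the translation.
examplePackage :: FullPackageInfo
examplePackage = FullPackageInfo
  { fullName        = "containers"
  , fullVersion     = [0, 5, 0, 0]
  , fullMaintainer  = "someone@example.org"
  , fullDescription = "Assorted container types"
  , fullExposed     = ["Data.Map", "Data.Set"]
  , fullLibraryDirs = ["/usr/lib/ghc/containers"]
  , fullDepends     = ["base-4.6.0.1"]
  }
```

Loading this in GHCi and evaluating toGhcPackageInfo examplePackage shows the slimmed record. In this arrangement only the code that produces FullPackageInfo would need Cabal; the slim type and its consumers would not.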

On Wed, 2013-09-11 at 17:28 +0100, Duncan Coutts wrote:
Further, if only ghc-pkg and the ghc build system depend on Cabal, then it is easier for Cabal to add more dependencies, since they do not have to be installed with ghc (due to the ghc lib depending on them). In particular the Cabal folks would like to use a proper parser and have suggested adding dependencies on parsec, mtl and transformers. If only ghc-pkg depends on Cabal, then these dependencies only need to be used at build time, and do not have to be installed (which also means they don't have to be kept quite so up to date).
Actually, this is not quite right. Since ghc would still ship Cabal (but not depend on it), it would also ship its dependencies including parsec, mtl and transformers. So they would need to be up to date and installed, it's just that ghc itself would not depend on them.
If that's really inconvenient, it's plausible to have a minimal set which is just the things ghc depends on, so long as what gets shipped to users is the useful set, including Cabal.
Duncan

On Wed, Sep 11, 2013 at 12:19 PM, Duncan Coutts <duncan.coutts@googlemail.com> wrote:
Actually, this is not quite right. Since ghc would still ship Cabal (but not depend on it), it would also ship its dependencies including parsec, mtl and transformers. So they would need to be up to date and installed, it's just that ghc itself would not depend on them.
If that's really inconvenient, it's plausible to have a minimal set which is just the things ghc depends on, so long as what gets shipped to users is the useful set, including Cabal.
I don't quite like how GHC's dependencies leak out to the rest of the world. It makes it impossible for us to decide what version of those libraries we want to ship in the platform. I guess we don't have a good technical solution to this problem though.

Wasn't there an effort to have a mini private variant of attoparsec for the parser combinator deps?
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

On 11/09/13 17:28, Duncan Coutts wrote:
So what might the on-disk representation for the ghc-pkg databases look like? Currently they use the external format of InstalledPackageInfo because this is convenient using Cabal.
One simple option is just to store both formats for all packages. Another option would be that ghc never reads package dbs where the cache is out of date. Then it only ever reads the cache and never has to look at the other files. In principle the cache should never be out of date: there are two options for updating the db, calling ghc-pkg, or putting the file directly and calling ghc-pkg recache (distros often use the latter as it is simpler for them). In either case the db cache will be up to date. (In fact calling it a cache is not really correct.)
GHC currently always reads the binary cache, even if it is out of date (I just checked). However, it still also supports the legacy format of package databases using the Read instance of InstalledPackageInfo. I'm not sure whether this is still used at all.
We certainly could make another type similar to InstalledPackageInfo, derive Binary for it, and use that as the package database format. I think you're right that it's probably easier to do this than to split out InstalledPackageInfo from Cabal. We would need to make a small package for this that would be shared by ghc-pkg and GHC.
Cheers,
Simon
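As a rough illustration of the "derive Binary for it" idea, a shared record type could get its serialisation from the binary package's Generic-based default instance. This is only a sketch under assumptions: PackageEntry, its fields, serialiseDb and deserialiseDb are hypothetical names, and the actual format that ghc and ghc-pkg would agree on is not specified in this thread.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString.Lazy as BL
import GHC.Generics (Generic)

-- Hypothetical record that a small package shared by ghc and ghc-pkg could export.
data PackageEntry = PackageEntry
  { entryId          :: String
  , entryExposed     :: [String]   -- exposed module names
  , entryLibraryDirs :: [FilePath]
  , entryDepends     :: [String]   -- ids of the packages this one depends on
  } deriving (Show, Eq, Generic)

-- An empty instance body selects the Generic-based default encoding.
instance Binary PackageEntry

-- What ghc-pkg would write out as the binary database (sketch).
serialiseDb :: [PackageEntry] -> BL.ByteString
serialiseDb = encode

-- What ghc would read back. Note that decode calls error on malformed input;
-- Data.Binary's decodeOrFail is the variant that reports failure as a value.
deserialiseDb :: BL.ByteString -> [PackageEntry]
deserialiseDb = decode

-- Made-up database contents, used only to exercise the round trip.
exampleDb :: [PackageEntry]
exampleDb =
  [ PackageEntry "base-4.6.0.1" ["Prelude", "Data.List"] ["/usr/lib/ghc/base"] []
  , PackageEntry "containers-0.5.0.0" ["Data.Map"] ["/usr/lib/ghc/containers"] ["base-4.6.0.1"]
  ]
```

In GHCi, deserialiseDb (serialiseDb exampleDb) == exampleDb checks the round trip. A type like this depends only on base, binary and bytestring, so neither side would need Cabal just to read or write the database.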
participants (4)
- Carter Schonwald
- Duncan Coutts
- Johan Tibell
- Simon Marlow