Request for feedback on spec/proposal for distributing package collections via hackage

Hi folks,
I'd like to get feedback on a spec/proposal for distributing package collections via hackage. This is currently somewhere beyond vapourware but certainly not a fait accompli and hopefully it is at an appropriate point to get feedback.
The basic idea is that:
* package collections are useful (IMHO, one of the top two solutions to dependency hell, alongside nix-style package management); and
* just as we distribute packages via hackage, we should also be able to easily distribute package collections.
One would then use them with tools like cabal and stack. Distributing via hackage (both in the sense of the format/protocol and in the sense of the central community hackage instance) seems natural, and allows taking advantage of much of the infrastructure we already have for packages, such as:
* existing user accounts and management infrastructure on the hackage website
* allowing anyone to host collections on their own servers, just as they can host their own package archives currently (either as static file sets or with smart servers)
* low barrier for distribution, potentially encouraging more collections to be created, potentially covering more use cases
* security infrastructure (currently in alpha)
* automatic mirroring (currently in alpha)
Two obvious examples are stackage-lts and stackage-nightly, but if we lower the barrier for distribution then there may well be many more. For example, the existing Linux distros put a lot of effort into selecting and maintaining package collections, and some of these collections could be distributed via hackage. In fact several Linux distributions already use Hackage's "distro" feature to advertise which versions of packages are provided by that distro. One can also imagine special-purpose collections, and there are probably cases we've not thought of yet.
Package collections are different things from packages, not like "meta packages" that one gets in some package systems.
A package collection at its simplest is just a set of source package identifiers (i.e. name-version pairs). Like packages, package collections have names and versions and are immutable once distributed.
The intention is that users can configure their tool to use collection(s), either by nailing down a specific collection version or by not specifying one, in which case it would default to the latest version of the named collection. (But the specific behaviour is up to the tool.)
Use cases:
* versioned collections. For some collections the policy by which they're defined naturally uses meaningful versions.
* daily collections. These can have a date-form version number imposed on them.
* "live" "rolling" collections. These could have a simple monotonically increasing version with no particular meaning attached. For such collections, clients might be configured to use the latest (by not specifying a version), but it's always possible to pick a specific revision.
* special-purpose collections. Not necessarily collections aiming to cover a large number of common packages, but aiming to cover some application area, or related stack of packages (e.g. some of the web frameworks).
* negative collections. Collections of packages you may specifically want to avoid (e.g. deprecated by their authors, or known-broken). Using such collections would rely on clients that can be configured to treat them negatively.
Specifics:
A package collection specifies a set of source package ids (id being name-version pair). It also optionally specifies a (partial) flag assignment for any package name.
The collection does not specify how tools should treat them. That is, a collection does not specify if it should be treated as a strong or a soft constraint, inclusive or exclusive, positive or negative. Such things are completely up to the client's policy and configuration. Similarly for flag assignments, collections do not specify whether tools should interpret these as strong or soft constraints.
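To make the Specifics concrete, here is a minimal Python sketch of the data a collection carries. The class names, field names and the `permits` helper with its `negative` switch are assumptions made for illustration; nothing in the proposal fixes them. What it tries to show is that the positive or negative reading lives in the client, not in the collection.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

Version = Tuple[int, ...]


@dataclass(frozen=True)
class PackageId:
    """A source package id: a name-version pair."""
    name: str
    version: Version


@dataclass(frozen=True)
class Collection:
    """A named, versioned set of package ids plus an optional partial flag assignment."""
    name: str
    version: Version
    members: FrozenSet[PackageId]
    # Partial flag assignment: package name -> {flag name: on/off}.
    flags: Dict[str, Dict[str, bool]] = field(default_factory=dict)


def permits(collection: Collection, candidate: PackageId, negative: bool = False) -> bool:
    """Hypothetical client-side policy; the collection itself carries no such setting."""
    listed = candidate in collection.members
    # A negative reading admits everything that is NOT listed.
    return (not listed) if negative else listed
```

The same `Collection` value can be handed to `permits` with either setting, which is the separation between data and policy described above.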
Syntax:
Package collection names and versions exactly follow those of package names (but they live in a different namespace). For example, "stackage-lts-2.9", or "deprecated-343" (the latter being a "rolling" collection with a meaningless monotonically increasing version).
A collection distributed in the archive format is just a text file with one entry per line, such as:
  foo-1.0
  foo-1.1
  bar >= 3 && < 4
  bar +this -that
So each line can be one of:
* a simple package id
* a package version range, using Cabal version range syntax
* a package name with a flag assignment, + for on, - for off
The interpretation of the above is that:
* both foo-1.0 and foo-1.1 are in the collection (i.e. union not intersection)
* all versions of bar from 3 up to, but not including, 4 are in the collection
* the package bar has flag 'this' as True, and flag 'that' as False
Of course for some collections the policy is that only one version of any package is included, but this is a policy question and the format itself does not impose this constraint.
Hackage archive format:
Collection files live under a different prefix from package tarballs (but are still considered part of the archive) and are named after the collection id. The collection files are not compressed (but of course HTTP clients and servers can negotiate transport compression). The collection files are neither included nor listed in the existing 00-index.tar.gz, but there's other JSON-format metadata for a client to enumerate the available collections and versions. And as with package tarballs, a client that wants a specific collection version can construct the URL and fetch it directly.
Security:
The hackage security system that's currently in alpha testing can easily be extended to cover collections, similarly to how it covers package tarballs.
Misc notes:
There is no requirement that a hackage-format repo containing collections be closed. That is, the collections may refer to packages not in that archive.
This could be useful for private hackage repos that host a small number of private packages, but also host collections that refer both to the private packages and public ones from the community central hackage. The resolution of package names is done by the clients, and some clients may be configured to union/overlay multiple repos.
On the other hand, for the central community hackage it may be sensible to enforce a policy that the collections it distributes be closed (i.e. refer only to packages distributed via hackage).
Questions:
Is this sufficiently flexible to fully cover the obvious use cases? Are there any interesting use cases that are excluded?
Anything else?
Duncan

Hello Duncan,
In my eyes, this proposal looks like some sort of generalization of Stackage; and one further use case is "special purpose" collection. My big question: how composable are these collections really? I can't put two collections with conflicting versions together (or can I? Do I union?); and is there any point to having a collection without versions in it? (If Cabal syntax is extended to support depending on collections as well as packages, yes?)
The classic use-case for package collections is deployment settings, a la Stack, or even Cargo lockfiles / Bundler Gemfile.lock (versioned collections). In all these use-cases package collections are treated as non-compositional things.
http://doc.crates.io/guide.html
http://bundler.io/v1.7/rationale.html#checking-your-code-into-version-contro...
Libraries (compositional) do NOT publish lockfiles: only executables (non-compositional) DO.
Re the file format, it seems fine; suitable for the lockfile use-case and the Stackage use-case. Less sure about the unioning semantics.
Edward
Excerpts from Duncan Coutts's message of 2015-07-14 05:52:46 -0700:
> Hi folks,
> I'd like to get feedback on a spec/proposal for distributing package collections via hackage.
> [...]

On Tue, 2015-07-14 at 12:02 -0700, Edward Z. Yang wrote:
> Hello Duncan,
> In my eyes, this proposal looks like some sort of generalization of Stackage; and one further use case is "special purpose" collection. My big question: how composable are these collections really? I can't put two collections with conflicting versions together (or can I? Do I union?);
You're right that some use cases of collections only make sense if clients can reasonably flexibly combine collections, e.g. set ops like union, intersection and inversion or difference. I think in principle taking collection unions or intersections makes sense. For example, in stack, as I understand it, you can add a local version of something that is also in the collection. This is of course a union. So extending "narrow" collections by unioning them with extra stuff makes sense. And suppose we had other "wide" collections that didn't nail things down to just one version, then taking an intersection with some other wide or narrow collection makes sense. So in principle, these set-like operations make sense. The code we've got in-progress for cabal-install allows exactly that. But none of that is essential to make use of the general purpose collections like stackage.
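As a rough illustration of those set-like operations, here is a small Python sketch that treats a collection as a plain set of (name, version) pairs. The collection contents and the helper names are made up for the example, and it is not the in-progress cabal-install code. It also only covers collections that enumerate exact package ids; entries given as version ranges would first have to be resolved against a package index.

```python
from typing import FrozenSet, Tuple

PackageId = Tuple[str, str]            # (name, version), e.g. ("foo", "1.0")
Members = FrozenSet[PackageId]


def union(a: Members, b: Members) -> Members:
    """Everything in either collection."""
    return a | b


def intersection(a: Members, b: Members) -> Members:
    """Only the ids present in both collections."""
    return a & b


def difference(a: Members, b: Members) -> Members:
    """Ids of a that are not in b, e.g. subtracting a negative collection."""
    return a - b


# Hypothetical data: a "narrow" collection pins one version per package,
# a "wide" one admits several versions of the same package.
narrow = frozenset({("foo", "1.0"), ("bar", "3.2")})
local = frozenset({("foo", "1.1")})
wide = frozenset({("foo", "1.0"), ("foo", "1.1"), ("bar", "3.2"), ("baz", "0.4")})
deprecated = frozenset({("baz", "0.4")})

extended = union(narrow, local)        # a local foo-1.1 alongside the pinned foo-1.0
pinned = intersection(wide, narrow)    # the wide collection narrowed to the pinned ids
cleaned = difference(wide, deprecated) # the wide collection minus deprecated ids
```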
> and is there any point to having a collection without versions in it?
To be clear, every collection instance has a version. It's just that for some of them -- live or rolling collections -- there is no particular meaning to the version. So this thing about not specifying a version is completely client side and doesn't affect the spec at all. It's just worth pointing out as a use case. As an example, we might have collections that are automagically defined and updated based on some property of the packages. Those are unlikely to have a meaningful version number.
> (If Cabal syntax is extended to support depending on collections as well as packages, yes?)
I don't think you're confused here, but just to clarify: packages do not talk about collections. Tools like cabal/stack can be configured to use collections. And yes, where currently to configure cabal-install to use a collection you have to explicitly list every member as a constraint in a cabal.config file (see the cabal.config files distributed from the stackage website), the extension in cabal-install is to support these collections as a first class thing, by name-version or just name (and with an expression language to combine them).
> The classic use-case for package collections is deployment settings, ala Stack, or even Cargo lockfiles / Bundler Gemfile.lock (versioned collections). In all these use-cases package collections are treated as non-compositional things.
> http://doc.crates.io/guide.html
> http://bundler.io/v1.7/rationale.html#checking-your-code-into-version-contro...
> Libraries (compositional) do NOT publish lockfiles: only executables (non-compositional) DO.
> Re the file format, it seems fine; suitable for the lockfile use-case and the Stackage use-case. Less sure about the unioning semantics.
Right, collections and frozen settings are similar, and for the latter composition doesn't make a lot of sense. We have the latter already of course, in the form of "cabal freeze" cabal.config files, and similarly for stack with .yml config files (which can be based off of pre-defined collections). It's likely that cabal freeze will switch to use this collection notation as it's somewhat more intentional (than the big sets of raw constraints).
Duncan

> and is there any point to having a collection without versions in it? (If Cabal syntax is extended to support depending on collections as well as packages, yes?)
So I think another use-case for collections, besides "version-locked",
is sets of "blessed" packages. So we might want a collection for
"verified compatible with windows 2008" or "platform packages" or
"works with nhc" to be collections.
And in those cases, specifying unique versions doesn't seem to make
much sense. And one could imagine taking intersections -- i.e.
"uhc-compatible _and_ in stackage LTS".
In general I like this proposal as very minimal in terms of just
providing and collecting data, and allowing that data to then be used
by clients in various ways.
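One way a client might combine a "blessed" collection that lists only package names with a version-pinned one is sketched below in Python. The collection contents and the `restrict_to_names` helper are hypothetical, chosen only to mirror the "uhc-compatible and in stackage LTS" example.

```python
from typing import FrozenSet, Tuple

PackageId = Tuple[str, str]  # (name, version)


def restrict_to_names(pinned: FrozenSet[PackageId], names: FrozenSet[str]) -> FrozenSet[PackageId]:
    """Keep the pinned ids whose package name also appears in a name-only collection."""
    return frozenset(pid for pid in pinned if pid[0] in names)


# Hypothetical contents: a pinned collection and a name-only "blessed" one.
stackage_lts = frozenset({("foo", "1.0"), ("bar", "3.2"), ("baz", "0.4")})
uhc_compatible = frozenset({"foo", "baz", "qux"})

both = restrict_to_names(stackage_lts, uhc_compatible)
```

Here `both` keeps foo-1.0 and baz-0.4: the versions come from the pinned collection and the membership test from the blessed one.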
--gershom

On Tue, 2015-07-14 at 13:52 +0100, Duncan Coutts wrote:
> Syntax:
> Package collection names and versions exactly follow those of package names (but they live in a different namespace). For example, "stackage-lts-2.9", or "deprecated-343" (the latter being a "rolling" collection with a meaningless monotonically increasing version).
> A collection distributed in the archive format is just a text file with one entry per line, such as:
>   foo-1.0
>   foo-1.1
>   bar >= 3 && < 4
>   bar +this -that
> So each line can be one of:
> * a simple package id
> * a package version range, using Cabal version range syntax
> * a package name with a flag assignment, + for on, - for off
Oops, one thing I forgot to mention is another entry syntax:
  baz
That is, a package name with no version or range at all. This is shorthand for the version range style with no version constraint (.cabal files have a slightly odd syntax for that, "baz -any"). This is actually useful if you want to define a negative collection, e.g. all the packages that are deprecated (as a whole, not just single versions).
Duncan
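With the bare-name form included there are four kinds of entry line. A minimal Python sketch of a line classifier follows, assuming whitespace-separated tokens as in the example file. The `Entry` type, the function names and the regular expressions are illustrative choices, not part of the spec, and the version range is kept as raw text rather than parsed with Cabal's range grammar.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class Entry(NamedTuple):
    kind: str                                  # "id", "range", "flags" or "name"
    name: str
    version: Optional[str] = None              # set for kind == "id"
    version_range: Optional[str] = None        # raw range text for kind == "range"
    flags: Optional[Dict[str, bool]] = None    # set for kind == "flags"


# A package id ends in "-<dotted digits>"; names may themselves contain hyphens.
_ID = re.compile(r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)$")
# A flag token is "+flag" (on) or "-flag" (off).
_FLAG = re.compile(r"^[+-][A-Za-z0-9_-]+$")


def parse_entry(line: str) -> Entry:
    """Classify one collection line as id, range, flag assignment or bare name."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty entry")
    head, rest = tokens[0], tokens[1:]
    if not rest:
        m = _ID.match(head)
        if m:
            return Entry("id", m["name"], version=m["version"])
        return Entry("name", head)             # bare name: no version constraint
    if all(_FLAG.match(t) for t in rest):
        return Entry("flags", head, flags={t[1:]: t[0] == "+" for t in rest})
    return Entry("range", head, version_range=" ".join(rest))


def parse_collection(text: str) -> List[Entry]:
    """Parse a whole collection file, one entry per non-blank line."""
    return [parse_entry(line) for line in text.splitlines() if line.strip()]
```

Note that this sketch would read the .cabal-style "baz -any" as a flag assignment rather than as an unconstrained range; whether that spelling is allowed in collection files is not stated here, so it is left unhandled.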
participants (3): Duncan Coutts, Edward Z. Yang, Gershom B