
On Fri, 2010-11-19 at 11:16 -0600, Antoine Latter wrote:
On Fri, Nov 19, 2010 at 7:01 AM, Duncan Coutts wrote:
On Fri, 2010-11-19 at 12:27 +0000, Duncan Coutts wrote:
Matt and I also discussed making the 00-index.tar.gz into a RESTful format by adding proper URLs for package tarballs.
Indeed we could go further and use a single general format for describing or distributing bundles of packages.
[..]
Opinions?
I'd like to restart discussion on this topic. I think it'd be really useful to have a single format worked out that covers all these cases. Otherwise we'll end up with multiple special-case formats that are less flexible overall.
It feels like an abuse of tar-files to me - if we want to have a set of meta-data about the location of resources in a package repository, I think it would be better to come up with a file format that has the information we want directly and then serve it up.
Putting URLs in tar symlink entries is a bit of an abuse, but using tar as a container format is perfectly reasonable (people do the same with zip all the time). We already use tar, it is extensible, and it is a standard format, so there are tools to help inspect or debug it.
This hypothetical cabal-repository.description file would be pointed at by a user's .cabal/conf, and the config file would describe either what resources the repo makes available or how to discover what resources it makes available.
You mean the description file (not the ~/.cabal/config file) would include or link to the resources that the repo makes available. In that case we're talking about the same thing, the only issue is the format of this package collection resource and what info it contains.
So for a small repo, this file could contain a listing of package ids and where the tar-ball/package descriptions are.
I think that's also what I suggested (but using the tar format).
We could even have a special case for local or file-share hosted repositories - the presence of an empty repo description file would imply that the contents of the repo is every tar, tar.gz or directory containing a .cabal file in the top level.
I'd rather not have a special case like that. We can make that use case convenient with tools that add a package to a collection.
A larger repository would point to another file which contains a collection of packages and their meta-data. One of the resources could be "here's where to find a tarball containing the package descriptions of every package I know how to serve" to support the current model of solving dependencies based. In this scenario the 'repo description' files would exactly be a REST description of the contents of Hackage Server.
Why the indirection via another file? I don't see why small vs large is important here. We just point to the package collection / index either as a local file or a URL.
It's the same information as what you'd wanted to put in the index tarball, and we might even want to make it so that the repo config file can live in the tarball and address resources in the tarball it is hosted in (so I can deploy a local cabal repo by dropping a tarball into a fileshare).
I'm not quite sure I follow. You're talking about a repo being a fileshare with multiple files in a dir, or a single tarball with everything in it? Using a tarball format would indeed allow either, since the index can link to package tarballs by reference (relative or absolute URL) or include them by value.
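To make the by-reference versus by-value distinction concrete, here is a minimal Python sketch. The names `ByReference`, `ByValue` and `locate`, and the example.org URLs, are hypothetical and only for illustration; nothing here is an existing cabal or Hackage API. It assumes relative references are resolved against the location of the index itself.

```python
from dataclasses import dataclass
from typing import Union
from urllib.parse import urljoin


@dataclass
class ByReference:
    """The index only points at the package tarball (hypothetical type)."""
    url: str  # relative or absolute URL


@dataclass
class ByValue:
    """The index carries the package tarball itself (hypothetical type)."""
    data: bytes


Entry = Union[ByReference, ByValue]


def locate(index_url: str, entry: Entry) -> str:
    """Describe where the package tarball for an index entry comes from."""
    if isinstance(entry, ByValue):
        # Nothing to fetch: the bytes are already inside the index.
        return "inline (%d bytes)" % len(entry.data)
    # Relative references resolve against the index location;
    # absolute URLs pass through urljoin unchanged.
    return urljoin(index_url, entry.url)
```

With an index at `http://example.org/repo/00-index.tar.gz`, a relative reference `foo-1.0/foo-1.0.tar.gz` would resolve next to the index, while an absolute URL on another host would be used as given.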
But slipstreaming metadata into soft-links in a tarball feels weird, and since we need client changes to make it work we may as well do it right.
If you don't like the symlink idea, just use blah.url files in the tarball instead. They would contain the url as a single line of text. Or instead of a symlink or an ordinary file, a special file entry (the tar format has some file types reserved for user rather than system purposes).
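As a rough sketch of the blah.url variant, the following Python builds an index tarball in memory and reads the URLs back out. The helper names (`build_index`, `read_locations`), the entry layout and the example.org URL are assumptions made up for illustration, not an agreed format.

```python
import io
import tarfile
from typing import Dict


def build_index(entries: Dict[str, bytes]) -> bytes:
    """Pack the given name -> content pairs into an uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_locations(index_bytes: bytes) -> Dict[str, str]:
    """Map each 'X.url' entry to the URL on its first line, keyed by 'X'."""
    locations = {}
    with tarfile.open(fileobj=io.BytesIO(index_bytes), mode="r") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".url")):
                continue  # ordinary entries, e.g. .cabal files, are skipped here
            lines = tar.extractfile(member).read().decode("utf-8").splitlines()
            if lines:  # ignore empty .url files
                locations[member.name[: -len(".url")]] = lines[0].strip()
    return locations


# Hypothetical layout: a package description plus a pointer to its tarball.
index = build_index({
    "foo/1.0/foo.cabal": b"name: foo\n",
    "foo/1.0/foo-1.0.tar.gz.url": b"http://example.org/packages/foo-1.0.tar.gz\n",
})
locations = read_locations(index)
```

A client that does not know about .url entries would simply see them as small text files, which keeps the archive inspectable with ordinary tar tools.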
Does this sort of approach sound sensible? I don't mind fleshing it out more as a start.
I'm not sure I really understand the difference: whether there is a difference in content/meaning or just a difference in the format.
Duncan