
Hi folks,

The index tar-ball on Hackage has an odd naming convention. Package descriptions are given paths of the form:

    ./$pkg/$version/$pkg.cabal

including the leading "./". I'm guessing that this is done as a method of distinguishing non-package meta-data.

Is this a convention we need to preserve?

Thanks,
Antoine

On Thu, 2010-11-18 at 19:46 -0600, Antoine Latter wrote:
Hi folks,
The index tar-ball on Hackage has an odd naming convention. Package descriptions are given paths of the form:
./$pkg/$version/$pkg.cabal
including the leading "./". I'm guessing that this is done as a method of distinguishing non-package meta-data.
Is this a convention we need to preserve?
The .cabal extension is essential. Tools are required to ignore file extensions they do not understand, which provides a bit of forwards compatibility.

In theory the file path should not be significant. However, the current cabal-install code does rely on the name and version directories: it uses them to find the package id without having to parse the .cabal file. This is bad and fragile, but basically you cannot change that layout for the moment.

I would like to move to a model where the file name may be meaningful but the path is not significant, and where the proper way to find the package id is to parse the file. I'd like to change cabal-install so that it generates its own fast cache on each "cabal update", rather than reading the index.tar every time. This would mean we could pay the expense of parsing all the .cabal files and thus could do it properly.

Matt and I also discussed making the 00-index.tar.gz into a RESTful format by adding proper URLs for package tarballs. Currently clients have to know the URL structure of the server: given a package id taken from the index, they construct a URL $root/pkg-ver/pkg-ver.tar.gz. As we all know, forcing clients to construct URLs is bad (inflexible etc).

To extend the format to contain URLs we were thinking of making use of the tar format's support for symlinks. The symlink content can be interpreted as a URL, either relative or absolute, e.g.:

    foo-1.0.tar.gz -> /package/foo-1.0/foo-1.0.tar.gz

or

    foo-1.0.tar.gz -> http://hackage.haskell.org/package/foo-1.0/foo-1.0.tar.gz

That is, the index contains a bunch of .cabal files, and also a bunch of .tar.gz symlinks. Like URLs in HTML, these are interpreted relative to the URL of the index.tar.gz itself. So if we got the index.tar.gz from, say:

    http://hackage.haskell.org/index.tar.gz

then a relative URL like /package/foo-1.0/foo-1.0.tar.gz is interpreted as

    http://hackage.haskell.org/package/foo-1.0/foo-1.0.tar.gz

This is totally standard URL convention; the only odd thing is using tarball symlinks as URLs, though it seems like a pretty natural generalisation. It works fine if you unpack the tarball with ordinary tar programs, it just makes broken symlinks.

So note that the name of the tarball entry "foo-1.0.tar.gz" is significant beyond the extension: the name "foo-1.0" is the key in the package id -> URL mapping.

Duncan
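To make that resolution rule concrete, here is a minimal sketch, assuming the network-uri package (where relativeTo :: URI -> URI -> URI resolves a reference against a base); the name resolveEntry is invented for illustration:

    import Network.URI (URI, parseURI, parseURIReference, relativeTo)

    -- Resolve a symlink target (a relative or absolute URL) against
    -- the URL the index itself was fetched from.
    resolveEntry :: URI -> String -> Maybe URI
    resolveEntry indexURI target =
      (`relativeTo` indexURI) <$> parseURIReference target

    -- ghci> let Just base = parseURI "http://hackage.haskell.org/index.tar.gz"
    -- ghci> resolveEntry base "/package/foo-1.0/foo-1.0.tar.gz"
    -- Just http://hackage.haskell.org/package/foo-1.0/foo-1.0.tar.gz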

On Fri, 2010-11-19 at 12:27 +0000, Duncan Coutts wrote:
Matt and I also discussed making the 00-index.tar.gz into a RESTful format by adding proper URLs for package tarballs.
Indeed we could go further and use a single general format for describing or distributing bundles of packages.

Use case: local build trees
---------------------------

A bunch of related packages (e.g. gtk2hs, happstack-* etc) unpacked locally:

    /home/me/prgs/myproj/foo/          -- top of source tree for foo
    /home/me/prgs/myproj/foo/foo.cabal
    /home/me/prgs/myproj/bar/
    /home/me/prgs/myproj/bar/bar.cabal

Now we can have an index.tar containing symlinks to .cabal files!

    /home/me/prgs/myproj/index.tar: containing
      foo.cabal -> foo/foo.cabal
      bar.cabal -> bar/bar.cabal

So these are not copies of the .cabal files; these really are symlinks to the local .cabal files (but inside the tarball). I guess we need some extra index entry to point to the location of the source tree, though it's not a .tar.gz kind of entry.

Now, just as we can have symlinks (really URLs) inside the tarball, we could also have full file contents there too. Next use case...

Use case: distribution bundles
------------------------------

Shipping a bunch of source packages as a single file:

    some-name.tar: containing
      foo.cabal
      foo-1.0.tar.gz
      bar.cabal
      bar-1.0.tar.gz

So now, instead of symlinks/URLs to separate tarballs, the whole file contents is right there. We have a hackage-like index plus the package tarballs.

We might have to have a different naming convention than simply blah.tar for these indexes, otherwise cabal install might not know how to interpret "cabal install foo.tar": should it interpret foo.tar as an index or as a single package?

Opinions?

Duncan

On Fri, Nov 19, 2010 at 02:44:39PM +0100, Tillmann Rendel wrote:
Duncan Coutts wrote:
[...] symlinks [...]
How would this interact with the absence of symlinks on Windows?
Note that NTFS has supported all kinds of links, both symbolic and hard, since Vista, so I guess you're referring to exotic filesystems like FAT32, or to slow-to-adopt environments.

--
Lars Viklund | zao@acc.umu.se

On Fri, 2010-11-19 at 14:44 +0100, Tillmann Rendel wrote:
Duncan Coutts wrote:
[...] symlinks [...]
Opinions?
How would this interact with the absence of symlinks on Windows?
Not a problem at all. The index tarballs are never unpacked to files on disk; we read the tar file directly using the tar package.

Duncan
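As a rough sketch of what reading the index directly looks like with the tar package's Codec.Archive.Tar API (listSymlinks is an invented name, and the exact signatures vary a little between tar versions):

    import qualified Codec.Archive.Tar as Tar
    import qualified Codec.Archive.Tar.Entry as Tar
    import qualified Data.ByteString.Lazy as BS

    -- Print every symlink entry in an index tarball as "name -> target",
    -- without ever unpacking anything to disk.
    listSymlinks :: FilePath -> IO ()
    listSymlinks file = do
      bytes <- BS.readFile file
      let entries = Tar.foldEntries (:) [] (error . show) (Tar.read bytes)
      sequence_
        [ putStrLn (Tar.entryPath e ++ " -> " ++ Tar.fromLinkTarget t)
        | e <- entries
        , Tar.SymbolicLink t <- [Tar.entryContent e] ]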

On Fri, Nov 19, 2010 at 7:01 AM, Duncan Coutts wrote:
On Fri, 2010-11-19 at 12:27 +0000, Duncan Coutts wrote:
Matt and I also discussed making the 00-index.tar.gz into a RESTful format by adding proper URLs for package tarballs.
Indeed we could go further and use a single general format for describing or distributing bundles of packages.
Use case: local build trees
---------------------------
A bunch of related packages (e.g. gtk2hs, happstack-* etc) unpacked locally.
    /home/me/prgs/myproj/foo/          -- top of source tree for foo
    /home/me/prgs/myproj/foo/foo.cabal
    /home/me/prgs/myproj/bar/
    /home/me/prgs/myproj/bar/bar.cabal
Now we can have an index.tar containing symlinks to .cabal files!
    /home/me/prgs/myproj/index.tar: containing
      foo.cabal -> foo/foo.cabal
      bar.cabal -> bar/bar.cabal
So these are not copies of the .cabal files; these really are symlinks to the local .cabal files (but inside the tarball). I guess we need some extra index entry to point to the location of the source tree, though it's not a .tar.gz kind of entry.
Now just as we can have symlinks (or really URLs) inside the tarball, we could also have full file contents there too. Next use case...
Use case: distribution bundles
------------------------------
Shipping a bunch of source packages as a single file
    some-name.tar: containing
      foo.cabal
      foo-1.0.tar.gz
      bar.cabal
      bar-1.0.tar.gz
So now instead of symlinks/URLs to separate tarballs, the whole file contents is right there. We have a hackage-like index plus the file tarballs.
We might have to have a different naming convention than simply blah.tar for these indexes, otherwise cabal install might not know how to interpret "cabal install foo.tar": should it interpret foo.tar as an index or as a single package?
Opinions?
It feels like an abuse of tar-files to me. If we want a set of meta-data about the location of resources in a package repository, I think it would be better to come up with a file format that has the information we want directly, and then serve it up.

This hypothetical cabal-repository.description file would be pointed at by a user's .cabal/config, and the file would describe either what resources the repo makes available or how to discover what resources it makes available.

So for a small repo, this file could contain a listing of package ids and where the tar-balls/package descriptions are. We could even have a special case for local or file-share hosted repositories: the presence of an empty repo description file would imply that the contents of the repo is every tar, tar.gz or directory containing a .cabal file in the top level.

A larger repository would point to another file which contains a collection of packages and their meta-data. One of the resources could be "here's where to find a tarball containing the package descriptions of every package I know how to serve", to support the current model of solving dependencies against a full local index. In this scenario the 'repo description' files would be exactly a REST description of the contents of Hackage Server.

It's the same information as what you'd wanted to put in the index tarball, and we might even want to make it so that the repo description file can live in a tarball and address resources in the tarball it is hosted in (so I can deploy a local cabal repo by dropping a tarball into a fileshare).

But slipstreaming metadata into soft-links in a tarball feels weird, and since we need client changes to make it work we may as well do it right.

Does this sort of approach sound sensible? I don't mind fleshing it out more as a start.

Antoine
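For illustration only, such a description file might look something like this; the field names and layout are invented here, not an actual proposed syntax:

    -- repo description (hypothetical sketch)
    package: foo-1.0
      cabal:   packages/foo/1.0/foo.cabal
      tarball: packages/foo/1.0/foo-1.0.tar.gz
    package: bar-2.1
      cabal:   http://example.com/bar/bar.cabal
      tarball: http://example.com/bar/bar-2.1.tar.gz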

On Fri, 2010-11-19 at 11:16 -0600, Antoine Latter wrote:
On Fri, Nov 19, 2010 at 7:01 AM, Duncan Coutts wrote:
On Fri, 2010-11-19 at 12:27 +0000, Duncan Coutts wrote:
Matt and I also discussed making the 00-index.tar.gz into a RESTful format by adding proper URLs for package tarballs.
Indeed we could go further and use a single general format for describing or distributing bundles of packages.
[..]
Opinions?
I'd like to restart discussion on this topic. I think it'd be really useful to have a single format worked out that covers all these cases. Otherwise we'll end up with multiple special-case formats that are less flexible overall.
It feels like an abuse of tar-files to me - if we want to have a set of meta-data about the location of resources in a package repository, I think it would be better to come up with a file format that has the information we want directly and then serve it up.
Putting URLs in tar symlink entries is a bit of an abuse, but using tar as a container format is perfectly reasonable (people do the same with zip all the time). We already use tar; it is extensible and a standard format, so there are tools to help inspect or debug it.
This hypothetical cabal-repository.description file would be pointed at by a user's .cabal/conf, and the config file would describe either what resources the repo makes available or how to discover what resources it makes available.
You mean the description file (not the ~/.cabal/config file) would include or link to the resources that the repo makes available. In that case we're talking about the same thing; the only issue is the format of this package collection resource and what info it contains.
So for a small repo, this file could contain a listing of package ids and where the tar-ball/package descriptions are.
I think that's also what I suggested (but using the tar format).
We could even have a special case for local or file-share hosted repositories - the presence of an empty repo description file would imply that the contents of the repo is every tar, tar.gz or directory containing a .cabal file in the top level.
I'd rather not have a special case like that. We can make that use case convenient with tools that add a package to a collection.
A larger repository would point to another file which contains a collection of packages and their meta-data. One of the resources could be "here's where to find a tarball containing the package descriptions of every package I know how to serve", to support the current model of solving dependencies against a full local index. In this scenario the 'repo description' files would be exactly a REST description of the contents of Hackage Server.
Why the indirection via another file? I don't see why small vs large is important here. We just point to the package collection / index either as a local file or a URL.
It's the same information as what you'd wanted to put in the index tarball, and we might even want to make it so that the repo description file can live in a tarball and address resources in the tarball it is hosted in (so I can deploy a local cabal repo by dropping a tarball into a fileshare).
I'm not quite sure I follow. You're talking about a repo being a fileshare with multiple files in a dir, or a single tarball with everything in it? Using a tarball format would indeed allow either, since the index can link to package tarballs by reference (relative or absolute URL) or include them by value.
But slipstreaming metadata into soft-links in a tarball feels weird, and since we need client changes to make it work we may as well do it right.
If you don't like the symlink idea, just use blah.url files in the tarball instead. They would contain the url as a single line of text. Or instead of a symlink or an ordinary file, a special file entry (the tar format has some file types reserved for user rather than system purposes).
Does this sort of approach sound sensible? I don't mind fleshing it out more as a start.
I'm not sure I really understand the difference: whether there is a difference in content/meaning or just a difference in the format.

Duncan

On Sun, May 29, 2011 at 11:13 AM, Duncan Coutts wrote:
On Fri, 2010-11-19 at 11:16 -0600, Antoine Latter wrote:
I'm not sure I really understand the difference: whether there is a difference in content/meaning or just a difference in the format.
Oh my, what an old thread. I'll try and resurrect my state of mind at the time.

I think my main concern was, as you said, a difference in format, not a difference in substance. I also might have thrown in a good amount of over-engineering as well.

What it comes down to is that embedding relative URLs (or even absolute URLs) in a tar-file feels like an odd thing to do: I don't see what advantage it has over a flat text file, and I can no longer create/consume the tar-file with standard tools.

But maybe this doesn't matter. Can we re-state what goals we're trying to get to, and what problems we're trying to solve? Going back into this thread I'm not even sure what I was talking about.

Are we trying to come up with a master plan for allowing cabal-install to interact with diverse sources of packages-which-may-be-installed data? I'm imagining the following use cases:

1. hackage.haskell.org
2. a network share/file system path with a collection of packages
3. an internet url with a collection of packages
4. an internet url for a single package
5. a tarball with a collection of packages
6. a tarball with a single package
7. an untarred folder containing a package (as in 'cabal install' in my dev directory)

With the ability to specify some of these in the .cabal/config or at the command line as appropriate. There's going to be some overlap between these cases, almost certainly. Am I missing any important cases? Are any of these cases unimportant?

The next question would be how much effort we require of the provider of a specific case. So for numbers 4 & 5, is the output of 'cabal sdist' good enough? For numbers 2 & 3, will I be able to just place package tgz files into a particular folder structure, or will I need to produce an index file?

What are other folks doing? I don't know much about ruby gems. Microsoft's new 'NuGet' packages support tossing packages in a directory and then telling Visual Studio to look there (they also support pointing the tools at an ATOM feed, which was interesting).

Antoine

On 29 May 2011 19:46, Antoine Latter wrote:
On Sun, May 29, 2011 at 11:13 AM, Duncan Coutts wrote:
On Fri, 2010-11-19 at 11:16 -0600, Antoine Latter wrote:
I'm not sure I really understand the difference: whether there is a difference in content/meaning or just a difference in the format.
Oh my, what an old thread. I'll try and resurrect my state of mind at the time.
Sorry :-)
I think my main concern was, as you said, a difference in format not a difference in substance. I also might have thrown in a good amount of over-engineering as well.
What it comes down to is that embedding relative URLs (or even absolute URLs) in a tar-file feels like an odd thing to do: I don't see what advantage it has over a flat text file, and I can no longer create/consume the tar-file with standard tools.
But maybe this doesn't matter - can we re-state what goals we're trying to get to, and what problems we're trying to solve? Going back into this thread I'm not even sure what I was talking about.
Hah! :-) I'll restate my thoughts.
Are we trying to come up with a master plan of allowing cabal-install to interact with diverse sources of packages-which-may-be-installed data?
Yes.
I'm imagining the following use cases:
1. hackage.haskell.org
2. a network share/file system path with a collection of packages
3. an internet url with a collection of packages
Yes.
4. an internet url for a single package
That we can do now, because it's a single package rather than really a collection:

    cabal install http://example.com/~me/foo-1.0.tar.gz
5. a tarball with a collection of packages
Yes, distributing a whole bunch of packages in a single file.
6. a tarball with a single package
We can also do that now:

    cabal install ./foo-1.0.tar.gz
7. an untarred folder containing a package (as in 'cabal install' in my dev directory)
Yes.
With the ability to specify some of these in the .cabal/config or at the command line as appropriate. There's going to be some overlap between these cases, almost certainly.
Yes. The policy is up for grabs, the important point here is mechanism and format.
Am I missing any important cases? Are any of these cases unimportant?
Another important use case: the "cabal-dev" use case, a local unpacked package together with a bunch of other local source packages, either local dirs or local or remote tarballs. This is basically when you want a special local package environment for this specific package.

A closely related and overlapping use case is having a project that consists of multiple packages; e.g. gtk2hs consists of gtk2hs-buildtools, glib, cairo, pango and gtk. Devs hacking on this want to build them all in one batch. Technically you can do this now, but it's not convenient. I'd have to say:

    gtk2hs$ cabal install gtk2hs-buildtools/ glib/ cairo/ pango/ gtk/

What we want there is a simple index that contains them all, and that cabal-install then uses by default when we build/install in this directory. Or something like that.
The next question would be how much effort we require of the provider of a specific case. So for numbers 4 & 5, is the output of 'cabal sdist' good enough? For numbers 2 & 3, will I be able to just place package tgz files into a particular folder structure, or will I need to produce an index file?
For the single package cases, yes, we don't need an index and we can already do these cases.

My thought about the UI is that we always have an index, so no pure directory collections. I'd add a "cabal index" command with subcommands for adding, removing and listing the contents of the collection. There would be some options when adding, to choose the kind of entry.
What are other folks doing? I don't know much about ruby gems. Microsoft's new 'NuGet' packages support tossing packages in a directory and then telling Visual Studio to look there (they also support pointing the tools at an ATOM feed, which was interesting).
Ah, that's interesting. I've also been thinking about incremental updates of the hackage index; I think we can do this with a tar-based format.

We're not precluding cabal-install supporting a pure directory style, but having a specific collection resource is necessary in most use cases, particularly the remote http cases. If we get the UI right then we probably don't need the pure directory style, since it'd just be a matter of "cp" vs "cabal index add".

Ok, you've mostly covered it, but to try and present it all in one go, here's what I think we need:

We need a way to describe collections of Cabal packages. These collections should either link to packages or include them by value. Optionally, for improved performance, the .cabal file for packages can be included. The format should be usable in a REST context, that is, it should support locating packages via a URL.

For each package in the index we need:

* a link to the package (either tarball or local directory), OR the package tarball by value (rather than a link)
* optionally, a .cabal file for the package

We need a format that has forwards compatibility, so that in future we can allow other optional attributes/metadata for the package, e.g. digital signatures, or other collection-global information.

Using proper URLs (absolute, or relative to the location of the collection itself) gives a good deal of flexibility. The current hackage archive format has implicit links, which means the layout of the archive is fixed and all the packages must be provided directly on the same http server. Using URLs allows a flexible archive layout and allows "shallow" or "mirror" archives that redirect to other servers for all or some packages.

In addition to hackage/archive-style use cases, the other major use case is on local machines, to create special source package environments. This is just a mapping from source package id to its implementation as a source package. This is useful for multi-package projects, or for building some package with special local versions of dependencies. The key distinguishing feature of these package environments is that they are local to some project directory rather than registered globally in the ~/.cabal/config.

The motivation for including package tarballs by value is that it allows distributing multi-package systems/projects as a single file, and is a convenient way of making snapshots of packages without having to stash them specially in some local directory.

My suggestion for getting this kind of flexible format is to reuse and abuse the tar format. The tar format is a collection of files; we can encode different kinds of entries in file extensions. To encode URL links, my idea was to abuse the tar symlink support and say that symlinks are really just URLs. Relative links are already URLs; the abuse is to suggest allowing absolute URLs too, like http://example.com/~me/foo-1.0.tar.gz. The advantage of this approach is that each kind of entry (tarball, .cabal file etc) can be either included by value as a file or included as a link. If we had to encode links as .url files we would lose that ability.

Instead of using symlinks it is also possible to add new tar entry types. Standard tools will either ignore custom types on unpacking or treat them as ordinary files. Standard tools will obviously not create custom tar entries, though they will add symlinks.

Here is an example convention for names and meanings of tar entries:

1. foo-1.0.tar.gz
2. foo-1.0.cabal
3. foo-1.0

1 & 2 can be a file entry or they can be a symlink/URL, while 3 can only be a symlink. For example:

* foo-1.0.tar.gz -> packages/foo/1.0/foo-1.0.tar.gz
* foo-1.0.tar.gz -> http://code.haskell.org/~me/foo-1.0.tar.gz
* foo-1.0 -> foo-1.0/
* foo-1.0 -> ../deps/foo-1.0/

The links are interpreted as ordinary URLs, possibly relative to the location of the collection itself. For example, if we got this index.tar.gz from http://hackage.haskell.org/index.tar.gz then the link packages/foo-1.0.tar.gz gives us http://hackage.haskell.org/packages/foo-1.0.tar.gz

Links to directories are only valid for local cases, because we do not support remote unpacked packages (there's no reasonable way to enumerate the contents).

For these relative URLs one can use standard tar tools to construct the index. For absolute URLs it is in fact still possible, by making broken symlinks that point to non-existent files:

    $ ln -s http://code.haskell.org/~me/foo-1.0.tar.gz foo-1.0.tar.gz

The tar tool will happily include such broken symlinks in the tar file. We could instead use a custom tar entry type for URLs, but we would lose this ability.

For a user interface I was thinking of something along the lines of:

    cabal index init   [indexfile]
    cabal index add    [indexfile] [--copy] [--link] [targets]
    cabal index list   [indexfile]
    cabal index remove [indexfile] [pkgname]

The --copy and --link flags for "index add" distinguish between adding a snapshot copy of a tarball to the index and linking to the local tarball, which may be updated later. We may also want to distinguish between a volatile local tarball and a stable one; in the latter case we can include a cached copy of the .cabal file. I'm not sure if there's a sensible default for --copy vs --link, or whether we should force people to choose, as in "cabal index add-copy" vs "cabal index add-link".

Duncan
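As a sketch, here is how such an index with a link entry could be constructed programmatically with the tar package instead of ln -s plus the tar tool. The entry name and link target are examples only, and toLinkTarget may reject strings that are not valid relative paths, so absolute http URLs might still need the broken-symlink trick:

    import qualified Codec.Archive.Tar as Tar
    import qualified Codec.Archive.Tar.Entry as Tar
    import qualified Data.ByteString.Lazy as BS

    -- Write an index containing a single foo-1.0.tar.gz entry that is
    -- a symlink (i.e. a relative URL) rather than a file by value.
    main :: IO ()
    main =
      case (Tar.toTarPath False "foo-1.0.tar.gz",
            Tar.toLinkTarget "packages/foo/1.0/foo-1.0.tar.gz") of
        (Right path, Just target) ->
          BS.writeFile "00-index.tar"
            (Tar.write [Tar.simpleEntry path (Tar.SymbolicLink target)])
        _ -> putStrLn "invalid entry name or link target"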

Hello,
I have recently been reading the source code of Cabal. I found
the index command and I found this thread. However, when I run a
recent Cabal, it seems the index command is not available:
:; /Library/Haskell/ghc-7.4.2/lib/cabal-install-1.16.0.2/bin/cabal index
cabal: unrecognised command: index (try --help)
I am curious as to what the present level of support is for
local repositories, in the form of a directory full of sdist
tarballs.
--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B

On Thu, Nov 18, 2010 at 07:46:33PM -0600, Antoine Latter wrote:
The index tar-ball on Hackage has an odd naming convention. Package descriptions are given paths of the form:
./$pkg/$version/$pkg.cabal
including the leading "./". I'm guessing that this is done as a method of distinguishing non-package meta-data.
Is this a convention we need to preserve?
I've removed the leading "./"; let's see if it breaks anything.

On Fri, 2010-11-19 at 13:46 +0000, Ross Paterson wrote:
On Thu, Nov 18, 2010 at 07:46:33PM -0600, Antoine Latter wrote:
The index tar-ball on Hackage has an odd naming convention. Package descriptions are given paths of the form:
./$pkg/$version/$pkg.cabal
including the leading "./". I'm guessing that this is done as a method of distinguishing non-package meta-data.
Is this a convention we need to preserve?
I've removed the leading "./"; let's see if it breaks anything.
I expect it'll be fine. cabal-install uses:

    case Tar.entryContent entry of
      Tar.NormalFile content _
        | takeExtension fileName == ".cabal" ->
            case splitDirectories (normalise fileName) of
              [pkgname, vers, _] -> ...

and

    splitDirectories (normalise "./$pkg/$version/$pkg.cabal")
      = ["$pkg", "$version", "$pkg.cabal"]

Duncan
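A self-contained sketch of that check (pkgIdFromPath is an invented name; the real cabal-install code differs in details):

    import System.FilePath (normalise, splitDirectories, takeExtension)

    -- Extract the package name and version from an index entry path.
    -- Both "./foo/1.0/foo.cabal" and "foo/1.0/foo.cabal" give
    -- Just ("foo","1.0"), since normalise drops the leading "./".
    pkgIdFromPath :: FilePath -> Maybe (String, String)
    pkgIdFromPath fileName
      | takeExtension fileName == ".cabal"
      , [pkgname, vers, _] <- splitDirectories (normalise fileName)
      = Just (pkgname, vers)
      | otherwise = Nothing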