
Hello,
I have recently been reading the source code of Cabal. I found
the index command and I found this thread. However, when I run a
recent Cabal, it seems the index command is not available:
:; /Library/Haskell/ghc-7.4.2/lib/cabal-install-1.16.0.2/bin/cabal index
cabal: unrecognised command: index (try --help)
I am curious as to what the present level of support is for
local repositories, in the form of a directory full of sdist
tarballs.
--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B
2011/5/29 Duncan Coutts:
On 29 May 2011 19:46, Antoine Latter wrote:
On Sun, May 29, 2011 at 11:13 AM, Duncan Coutts wrote:
On Fri, 2010-11-19 at 11:16 -0600, Antoine Latter wrote:
I'm not sure I really understand the difference. Whether there is a difference in content/meaning or just a difference in the format.
Oh my, what an old thread. I'll try and resurrect my state of mind at the time.
Sorry :-)
I think my main concern was, as you said, a difference in format not a difference in substance. I also might have thrown in a good amount of over-engineering as well.
What it comes down to is that embedding relative URLs (or even absolute URLs) in a tar-file feels like an odd thing to do - I don't see what advantage it has over a flat text file, and I can no longer create/consume the tar-file with standard tools.
But maybe this doesn't matter - can we re-state what goals we're trying to get to, and what problems we're trying to solve? Going back into this thread I'm not even sure what I was talking about.
Hah! :-)
I'll restate my thoughts.
Are we trying to come up with a master plan of allowing cabal-install to interact with diverse sources of packages-which-may-be-installed data?
Yes.
I'm imagining the following use cases:
1. hackage.haskell.org
2. a network share/file system path with a collection of packages
3. an internet url with a collection of packages
Yes.
4. an internet url for a single package
That we can do now, because it's a single package rather than really a collection.
cabal install http://example.com/~me/foo-1.0.tar.gz
5. a tarball with a collection of packages
Yes, distributing a whole bunch of packages in a single file.
6. a tarball with a single package
We can also do that now:
cabal install ./foo-1.0.tar.gz
7. an untarred folder containing a package (as in 'cabal install' in my dev directory)
Yes.
With the ability to specify some of these in the .cabal/config or at the command line as appropriate. There's going to be some overlap between these cases, almost certainly.
Yes. The policy is up for grabs, the important point here is mechanism and format.
Am I missing any important cases? Are any of these cases unimportant?
Another important use case: the "cabal-dev" use case, a local unpacked package together with a bunch of other local source packages (local dirs, or local or remote tarballs). This is basically for when you want a special local package environment for this specific package.
A closely related and overlapping use case is having a project that consists of multiple packages, e.g. gtk2hs consists of gtk2hs-buildtools, glib, cairo, pango and gtk. Devs hacking on this want to build them all in one batch. Technically you can do this now, but it's not convenient. I'd have to say:
gtk2hs$ cabal install gtk2hs-buildtools/ glib/ cairo/ pango/ gtk/
What we want there is a simple index that contains them all and that cabal-install then uses by default when we build/install in this directory. Or something like that.
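Under the tar-based index scheme discussed further down this thread, such a project-local index could be assembled with nothing but standard tools. A minimal sketch (the directory names and 0.12.0 versions are made up for illustration):

```shell
# A sketch of a project-local index for a multi-package project,
# built with standard tools only (hypothetical names and versions).
mkdir -p demo/glib demo/cairo

# Each index entry is a symlink named after the package-id,
# pointing at the unpacked package directory.
ln -sf glib/ demo/glib-0.12.0
ln -sf cairo/ demo/cairo-0.12.0

# The index itself is just a tar file of those symlink entries.
tar -cf project-index.tar -C demo glib-0.12.0 cairo-0.12.0
tar -tf project-index.tar   # lists glib-0.12.0 and cairo-0.12.0
```

A tool like cabal-install could then discover this index in the project directory and treat the linked directories as the default install targets.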
The next question would be how much effort do we require of the provider of a specific case? So for numbers 4 & 5, is the output of 'cabal sdist' good enough? For numbers 2 & 3, will I be able to just place package tgz files into a particular folder structure, or will I need to produce an index file?
For the single-package cases, yes: we don't need an index, and we can already handle those.
My thought about the UI is that we always have an index, so no pure directory collections. I'd add a "cabal index" command with subcommands for adding, removing and listing the collection. There would be some options when you add to choose the kind of entry.
What are other folks doing? I don't know much about ruby gems. Microsoft's new 'NuGet' package system supports tossing packages in a directory and then telling Visual Studio to look there (they also support pointing the tools at an ATOM feed, which was interesting).
Ah that's interesting. I've also been thinking about incremental updates of the hackage index. I think we can do this with a tar based format.
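One property that makes a tar-based index amenable to incremental updates is that tar archives are append-only: new entries go at the end without rewriting earlier ones, so a client already holding a prefix of the index only needs to fetch the new tail. A small illustration with standard tools (the file names here are made up; note that append mode does not work on compressed archives):

```shell
# An uncompressed tar index can be extended in place: append mode (-r)
# adds new entries at the end without rewriting existing ones.
echo 'name: foo' > foo-1.0.cabal
tar -cf 00-index.tar foo-1.0.cabal

# Later: a new release is appended without touching existing entries.
echo 'name: bar' > bar-2.0.cabal
tar -rf 00-index.tar bar-2.0.cabal

tar -tf 00-index.tar   # lists foo-1.0.cabal then bar-2.0.cabal
```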
We're not precluding cabal-install supporting a pure directory style, but having a specific collection resource is necessary in most use cases, particularly the http remote cases. If we get the UI right then we probably don't need the pure directory style since it'd just be a matter of "cp" vs "cabal index add".
Ok, you've mostly covered it, but to try and present it all in one go, here's what I think we need:
We need a way to describe collections of Cabal packages. These collections should either link to packages or include them by value. Optionally, for improved performance the .cabal file for packages can be included. The format should be usable in a REST context, that is it should support locating packages via a URL.
For each package in the index we need:
* A link to the package (either tarball or local directory), OR the package tarball by value (rather than a link)
* optionally a .cabal file for the package
We need a format that has forwards compatibility so that in future we can allow other optional attributes/metadata for the package, e.g. digital signatures, or other collection-global information.
Using proper URLs (absolute or relative to the location of the collection itself) gives a good deal of flexibility. The current hackage archive format has implicit links which means the layout of the archive is fixed and it requires that all the packages are provided directly on the same http server. Using URLs allows a flexible archive layout and allows "shallow" or "mirror" archives that redirect to other servers for all or some packages.
In addition to hackage/archive-style use cases, the other major use case is on local machines to create special source package environments. This is just a mapping of source package id to its implementation as a source package. This is useful for multi-package projects, or building some package with special local versions of dependencies. The key distinguishing feature of these package environments is that they are local to some project directory rather than registered globally in the ~/.cabal/config.
The motivation for including package tarballs by value is that it allows distributing multi-package systems/projects as a single file, or as a convenient way of making snapshots of packages without having to stash them specially in some local directory.
My suggestion for getting this kind of flexible format is to reuse and abuse the tar format. The tar format is a collection of files. We can encode different kinds of entries using file extensions.
To encode URL links my idea was to abuse the tar symlink support and say that symlinks are really just URLs. Relative links are already URLs; the abuse is to suggest allowing absolute URLs also, like http://example.com/~me/foo-1.0.tar.gz. The advantage of this approach is that each kind of entry (tarball, .cabal file, etc.) can be either included by value as a file or included as a link. If we have to encode links as .url files then we lose that ability.
Instead of using symlinks it is also possible to add new tar entry types. Standard tools will either ignore custom types on unpacking or treat them as ordinary files. Standard tools will obviously not create custom tar entries, though they will add symlinks.
Here is an example convention for names and meanings of tar entries
1. foo-1.0.tar.gz
2. foo-1.0.cabal
3. foo-1.0
1 & 2 can be a file entry or they can be a symlink/url, while 3 can only be a symlink. For example:
* foo-1.0.tar.gz -> packages/foo/1.0/foo-1.0.tar.gz
* foo-1.0.tar.gz -> http://code.haskell.org/~me/foo-1.0.tar.gz
* foo-1.0 -> foo-1.0/
* foo-1.0 -> ../deps/foo-1.0/
The links are interpreted as ordinary URLs, possibly relative to the location of the collection itself. For example if we got this index.tar.gz from http://hackage.haskell.org/index.tar.gz then the link packages/foo-1.0.tar.gz gives us http://hackage.haskell.org/packages/foo-1.0.tar.gz
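This is just ordinary relative-URL resolution against the index's own URL. For a plain link with no "." or ".." segments it amounts to replacing the last path component of the index URL, which can be illustrated with a one-liner (a simplified sketch, not a full RFC 3986 resolver):

```shell
# Resolving a relative index link against the index's own URL
# (simplified: handles only links with no "." or ".." segments).
base=http://hackage.haskell.org/index.tar.gz
rel=packages/foo-1.0.tar.gz

# Drop the base URL's last path component, then append the link.
echo "${base%/*}/$rel"
# -> http://hackage.haskell.org/packages/foo-1.0.tar.gz
```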
Links to directories are only valid for local cases because we do not support remote unpacked packages (because there's no reasonable way to enumerate the contents).
For these relative URLs one can use standard tar tools to construct the index. For absolute URLs it is in fact still possible by making broken symlinks that point to non-existent files like:
$ ln -s http://code.haskell.org/~me/foo-1.0.tar.gz foo-1.0.tar.gz
and the tar tool will happily include such broken symlinks into the tar file.
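Putting the two link styles together, an index mixing a relative link and an absolute-URL link can be built with nothing but ln and tar. The bar-1.0 entry and its URL below are hypothetical; note that the absolute-URL entry is a deliberately dangling symlink:

```shell
# Building an index that mixes both link styles with plain ln and tar.
mkdir -p idx
# Relative link: points within the archive layout.
ln -sf packages/foo/1.0/foo-1.0.tar.gz idx/foo-1.0.tar.gz
# Absolute link: a dangling symlink whose target string is really a URL.
ln -sf http://code.haskell.org/~me/bar-1.0.tar.gz idx/bar-1.0.tar.gz

# tar records each symlink's target string verbatim, dangling or not.
tar -cf index.tar -C idx foo-1.0.tar.gz bar-1.0.tar.gz
tar -tvf index.tar   # shows each entry with its "-> target"
```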
We could instead use a custom tar entry type for URLs but we would lose this ability.
For a user interface I was thinking of something along the lines of:
cabal index init [indexfile]
cabal index add [indexfile] [--copy] [--link] [targets]
cabal index list [indexfile]
cabal index remove [indexfile] [pkgname]
The --copy and --link flags for index add are to distinguish between adding a snapshot copy of a tarball to the index or linking to the local tarball which may be updated later. We may also want to distinguish between a volatile local tarball and a stable one. In the latter case we can include a cached copy of the .cabal file. I'm not sure if there's a sensible default for --copy vs --link or whether we should force people to choose like "cabal index add-copy vs add-link".
Duncan
_______________________________________________
cabal-devel mailing list
cabal-devel@haskell.org
http://www.haskell.org/mailman/listinfo/cabal-devel