nice new hackage urls

Yay! :-)

Thu Jun 11 16:40:34 BST 2009 Ross Paterson
* shorter package URL
hunk ./Locations.hs 61
-pkgScriptURL = "/cgi-bin/hackage-scripts/package"
+pkgScriptURL = "/package"

On Sat, 2009-06-13 at 22:50 +0000, Duncan Coutts wrote:
Yay! :-)
Thu Jun 11 16:40:34 BST 2009 Ross Paterson
* shorter package URL
hunk ./Locations.hs 61
-pkgScriptURL = "/cgi-bin/hackage-scripts/package"
+pkgScriptURL = "/package"

And I see the old urls work too, presumably using the magic of apache url rewriting.
So, I'd like to update cabal-install to use these new URLs in the next minor release. It's good since it'll also match the URLs for the new hackage server, so it no longer needs a special case in the code in cabal-install.

This made me recall that one place we do still have a special case for the current central hackage server is when we specify the upload and check POST URLs. Currently hackage uses:

/cgi-bin/hackage-scripts/check-pkg
/cgi-bin/hackage-scripts/protected/upload-pkg

At the moment, cabal-install just hard codes these URLs when it notices that we're using the main hackage server. When we're using any other server (assumed to be the new hackage-server impl) then we use $serverroot/upload. I'm not sure this is ideal.

So I suggest we find a common standard URL, eg:

$serverroot/packages/upload
$serverroot/packages/check

and make the current server use that as an allowable alias for its existing cgi scripts, and adjust cabal-install and the new hackage server to do the same.

Why these URL names?

* /package/ is for the individual packages, everything under this is a package name (with optional number)
* /packages/ is for resources related to the collection itself, as opposed to individual members of the collection, so things like the collection index etc.

Sound sane?

(In theory, "well known" URL schemes are not the right RESTful style and we should have some resource that points to the various sub-services / resources)

Duncan
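To make the intent concrete, here is a minimal Haskell sketch of what a client could do once such aliases exist; the helper names are invented for illustration and are not cabal-install's actual API. With every server answering the same relative paths, the hackage.haskell.org special case can simply go away.

  -- Sketch only: invented helpers, assuming the proposed
  -- /packages/upload and /packages/check aliases exist on every server.
  type ServerRoot = String

  uploadURL, checkURL :: ServerRoot -> String
  uploadURL root = root ++ "/packages/upload"
  checkURL  root = root ++ "/packages/check"

  main :: IO ()
  main = mapM_ putStrLn
    [ uploadURL "http://hackage.haskell.org"  -- .../packages/upload
    , checkURL  "http://hackage.haskell.org"  -- .../packages/check
    ]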

On Sat, Jul 04, 2009 at 06:01:13PM +0100, Duncan Coutts wrote:
This made me recall that one place we do still have a special case for the current central hackage server is when we specify the upload and check POST URLs. Currently hackage uses:
/cgi-bin/hackage-scripts/check-pkg
/cgi-bin/hackage-scripts/protected/upload-pkg
At the moment, cabal-install just hard codes these URLs when it notices that we're using the main hackage server. When we're using any other server (assumed to be the new hackage-server impl) then we use $serverroot/upload. I'm not sure this is ideal.
So I suggest we find a common standard URL, eg:
$serverroot/packages/upload
$serverroot/packages/check
and make the current server use that as an allowable alias for its existing cgi scripts, and adjust cabal-install and the new hackage server to do the same.
SimonM has added the aliases. Any word on when you'll be ready to take over?

On Tue, 2009-07-07 at 08:41 +0100, Ross Paterson wrote:
On Sat, Jul 04, 2009 at 06:01:13PM +0100, Duncan Coutts wrote:
This made me recall that one place we do still have a special case for the current central hackage server is when we specify the upload and check POST URLs. Currently hackage uses:
/cgi-bin/hackage-scripts/check-pkg
/cgi-bin/hackage-scripts/protected/upload-pkg
At the moment, cabal-install just hard codes these URLs when it notices that we're using the main hackage server. When we're using any other server (assumed to be the new hackage-server impl) then we use $serverroot/upload. I'm not sure this is ideal.
So I suggest we find a common standard URL, eg:
$serverroot/packages/upload
$serverroot/packages/check
and make the current server use that as an allowable alias for its existing cgi scripts, and adjust cabal-install and the new hackage server to do the same.
SimonM has added the aliases.
Thanks both. I now realise we need a bit more :-)

So the issue is that we'd like to be able to specify a server by a single URL and be able to find everything else relative to that. Ideally we would do that by some discovery mechanism rather than hard coding relative URLs into the clients, but in the mean time...

So we want all interesting URLs to be relative to one root. Currently hackage has two roots '/' and '/packages/archive/'. We find the index and package tarballs in the second and for other things we've been moving more towards the main root '/' with things like '/package/foo'.

Currently in the cabal-install config file we use:

remote-repo: hackage.haskell.org:http://hackage.haskell.org/packages/archive

and then lookup the 00-index.tar.gz and all the package tarballs relative to that. So if we try to unify it around a single root '/' then here's my suggestion:

/packages/00-index.tar.gz
/package/foo-1.0.tar.gz

Again, the principle is that /packages/* are attributes of the collection as a whole (indexes, stats, reports), where /package/* are elements of the collection (ie packages).
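A similarly minimal sketch of that proposed layout, again with invented helper names rather than real cabal-install code, just to show how everything would hang off one root:

  -- Sketch only: /packages/* for collection-wide resources,
  -- /package/* for individual packages, both relative to one root.
  indexURL :: String -> String
  indexURL root = root ++ "/packages/00-index.tar.gz"

  packageTarballURL :: String -> String -> String -> String
  packageTarballURL root name version =
    root ++ "/package/" ++ name ++ "-" ++ version ++ ".tar.gz"

  main :: IO ()
  main = do
    putStrLn (indexURL "http://hackage.haskell.org")
    putStrLn (packageTarballURL "http://hackage.haskell.org" "foo" "1.0")
    -- http://hackage.haskell.org/packages/00-index.tar.gz
    -- http://hackage.haskell.org/package/foo-1.0.tar.gz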
Any word on when you'll be ready to take over?
There have been more contributors recently, with work on authentication, which is one of the last two features needed for parity. The other is doc uploads and downloads. I hope we'll be able to polish things off at the next hackathon and get at least a testing instance running on a persistent basis, so we can start evaluating migration.

Duncan

On Tue, Jul 07, 2009 at 12:57:16PM +0100, Duncan Coutts wrote:
Currently in the cabal-install config file we use:
remote-repo: hackage.haskell.org:http://hackage.haskell.org/packages/archive
and then lookup the 00-index.tar.gz and all the package tarballs relative to that. So if we try to unify it around a single root '/' then here's my suggestion:
/packages/00-index.tar.gz
/package/foo-1.0.tar.gz
Again, the principle is that /packages/* are attributes of the collection as a whole (indexes, stats, reports), where /package/* are elements of the collection (ie packages).
I've moved 00-index.tar.gz, pkg-list.html, recent.html and recent.rss up to /packages/, but it would be a bit of a pain to try to overlay the static package-specific data over the dynamically generated package pages.

On Tue, 2009-07-07 at 13:57 +0100, Ross Paterson wrote:
On Tue, Jul 07, 2009 at 12:57:16PM +0100, Duncan Coutts wrote:
Currently in the cabal-install config file we use:
remote-repo: hackage.haskell.org:http://hackage.haskell.org/packages/archive
and then lookup the 00-index.tar.gz and all the package tarballs relative to that. So if we try to unify it around a single root '/' then here's my suggestion:
/packages/00-index.tar.gz
/package/foo-1.0.tar.gz
Again, the principle is that /packages/* are attributes of the collection as a whole (indexes, stats, reports), where /package/* are elements of the collection (ie packages).
I've moved 00-index.tar.gz, pkg-list.html, recent.html and recent.rss up to /packages/,
Great.
but it would be a bit of a pain to try to overlay the static package-specific data over the dynamically generated package pages.
I think we can do it with something like the following in a suitable .htaccess file:

RewriteRule ^/package/([A-Za-z0-9-]*)-([0-9.]*)\.tar\.gz$ /packages/archive/$1/$2/$1-$2.tar.gz

Duncan
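As a worked example (assuming the archive keeps its current /packages/archive/$name/$version/ layout and mod_rewrite is enabled for that location), that rule would turn a request for /package/foo-1.0.tar.gz into /packages/archive/foo/1.0/foo-1.0.tar.gz.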

On Tue, Jul 07, 2009 at 06:38:44PM +0100, Duncan Coutts wrote:
I think we can do it with something like the following in a suitable .htaccess file:
RewriteRule ^/package/([A-Za-z0-9-]*)-([0-9.]*)\.tar\.gz$ /packages/archive/$1/$2/$1-$2.tar.gz
I guess it can be done, but it feels a bit weird to have two collections of different sorts of things mixed together at the same logical location.

I now realise we need a bit more :-)
So the issue is that we'd like to be able to specify a server by a single URL and be able to find everything else relative to that. Ideally we would do that by some discovery mechanism rather than hard coding relative URLs into the clients, but in the mean time...
From what you've said my imagination makes me think of a page at $URL/jumptable that gives a list of hard coded services and their location on the server. For example, $URL/jumptable might return:
----------------------
package   package/$pkg
index     packages/00-index.tar.gz
upload    upload/
check     check/
accounts  account/
----------------------

Is this the discovery mechanism you had in mind? It allows server(s) to move things around and cabal-install-like tools to learn about the locations dynamically - basically replacing the N hard coded URLs with one and requiring some parsing. Perhaps you have thought of something more elegant?

Thomas
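If the response body really were that simple, one service name and one relative path per line, a client would need almost no machinery to consume it. A rough, purely illustrative Haskell sketch assuming that whitespace-separated format, using nothing beyond the Prelude:

  -- Sketch only: parse "name path" lines into an association list,
  -- ignoring anything that is not exactly two fields (such as the
  -- dashed separator lines).
  parseJumpTable :: String -> [(String, String)]
  parseJumpTable body =
    [ (name, path) | [name, path] <- map words (lines body) ]

  serviceURL :: String -> [(String, String)] -> String -> Maybe String
  serviceURL root table name =
    fmap (\path -> root ++ "/" ++ path) (lookup name table)

  main :: IO ()
  main = print (serviceURL "http://hackage.haskell.org"
                           (parseJumpTable example) "index")
    where
      example = unlines
        [ "package  package/$pkg"
        , "index    packages/00-index.tar.gz"
        , "upload   upload/"
        ]
  -- prints: Just "http://hackage.haskell.org/packages/00-index.tar.gz"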

On Tue, Jul 7, 2009 at 10:00 PM, Thomas DuBuisson <thomas.dubuisson@gmail.com> wrote:
I now realise we need a bit more :-)
So the issue is that we'd like to be able to specify a server by a single URL and be able to find everything else relative to that. Ideally we would do that by some discovery mechanism rather than hard coding relative URLs into the clients, but in the mean time...
From what you've said my imagination makes me think of a page at $URL/jumptable that gives a list of hard coded services and their location on the server. For example, $URL/jumptable might return:
To be RESTful this should just be $URL to avoid forcing servers to have a resource called jumptable.

-- Johan

On Tue, 2009-07-07 at 22:52 +0200, Johan Tibell wrote:
On Tue, Jul 7, 2009 at 10:00 PM, Thomas DuBuisson wrote:
I now realise we need a bit more :-)
So the issue is that we'd like to be able to specify a server by a single URL and be able to find everything else relative to that. Ideally we would do that by some discovery mechanism rather than hard coding relative URLs into the clients, but in the mean time...
From what you've said my imagination makes me think of a page at $URL/jumptable that gives a list of hard coded services and their location on the server. For example, $URL/jumptable might return:
To be RESTful this should just be $URL to avoid forcing servers to have a resource called jumptable.
What do real REST designs really do in this kind of situation?

For the parts of sites intended to be consumed by humans that's easy, you use index.html and that provides links humans can choose to follow.

For sites where automated and somewhat-coupled clients (ie not totally generic clients like caches, web spiders etc) are expecting certain services (it is that expectation that is the coupling), how do they discover the urls for the services they are (or might be) expecting?

Do people really concoct little text or xml files giving name -> url mappings? Is there some common standard format for doing that?

Duncan

On Wed, Jul 8, 2009 at 3:09 AM, Duncan Coutts wrote:
On Tue, 2009-07-07 at 22:52 +0200, Johan Tibell wrote:
To be RESTfull this should just be $URL to avoid forcing servers to have a resource called jumptable.
What do real REST designs really do in this kind of situation? For the parts of sites intended to be consumed by humans that's easy, you use index.html and that provides links humans can choose to follow.
For sites where automated and somewhat-coupled clients (ie not totally generic clients like caches, web spiders etc) are expecting certain services (it is that expectation that is the coupling), how do they discover the urls for the services they are (or might be) expecting?
Do people really concoct little text or xml files giving name -> url mappings? Is there some common standard format for doing that?
I don't know of a standard format. You could indeed use XML (or perhaps JSON). By letting the server specify its URL scheme (instead of relying on out-of-band knowledge about resource locations) it can be more flexible.

-- Johan

On Wed, 2009-07-08 at 14:17 +0200, Johan Tibell wrote:
Do people really concoct little text or xml files giving name -> url mappings? Is there some common standard format for doing that?
I don't know of a standard format. You could indeed use XML (or perhaps JSON). By letting the server specify its URL scheme (instead of relying on out-of-band knowledge about resource locations) it can be more flexible.
I was worried you might say that :-). If possible I'd rather not have to put an XML or JSON parser into cabal-install.

Duncan
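For what it's worth, a flat name-to-path listing along the lines Thomas sketched earlier could be read with nothing more than lines and words from the Prelude (as in the sketch after his message), so a plain-text format would sidestep the question of adding an XML or JSON parser entirely.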
participants (5)
- Duncan Coutts
- Johan Tibell
- Ross Paterson
- Simon Michael
- Thomas DuBuisson