A little bit OT: Google as a global directory/resolver for cabalized packages?

If one tries to search Google:

http://www.google.com/search?hl=en&lr=&c2coff=1&q=exposed-modules+version+filetype%3Acabal&btnG=Search

it returns URLs of all Google-indexed .cabal files (currently 7; just this few?).

So if one makes such a request from a program, then parses the resulting HTML, extracting <a> tags whose links end with ".cabal", the result is a directory of .cabal file locations suitable for further processing. Similarly, to find all locations of a particular package, the package name needs to be added to the query string.

Using the Google API might be an alternative, but it returns only up to 10 results (per their docs), which might be enough to find all mirrors of a single package, but not the whole list of available packages.

Publishing in such a directory might be simple: just place one's .cabal file in a place where Google can index it.

One weakness: too easy to spam.

Could this be a (cheap) way to set up package name -> URL resolution?

-- 
Dimitry Golubovsky
Anywhere on the Web
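
The query-and-extract idea in this message could be sketched roughly as follows. This is only an illustration under assumptions: the function and class names are hypothetical, the query terms are the ones from the URL above, and a real results page may wrap links in redirect URLs that would need unwrapping first. No network request is made here; the HTML is assumed to have been fetched already.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urlparse


def build_query_url(package=None):
    """Build a search URL for .cabal files, optionally narrowed to one package.

    The search terms come from the original post; the helper itself is hypothetical.
    """
    terms = ["exposed-modules", "version", "filetype:cabal"]
    if package:
        # Adding the package name narrows the search to that package.
        terms.insert(0, package)
    return "http://www.google.com/search?" + urlencode({"hl": "en", "q": " ".join(terms)})


class CabalLinkParser(HTMLParser):
    """Collect href values of <a> tags whose URL path ends with ".cabal"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and urlparse(href).path.endswith(".cabal"):
            self.links.append(href)


def extract_cabal_links(html):
    """Return all .cabal links found in an HTML page, in document order."""
    parser = CabalLinkParser()
    parser.feed(html)
    return parser.links
```

Calling `extract_cabal_links` on each fetched results page would give the list of candidate .cabal locations; whether those files are genuine is a separate question (see the spam concern).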

Dimitry Golubovsky wrote:

> If one tries to search Google:
> it returns URLs of all Google-indexed .cabal files (currently 7- just this few?)

That's certainly not all of them. Just in the fptools tree we have:

Cabal.cabal GLUT.cabal HGL.cabal HUnit.cabal HaXml.cabal
OpenAL.cabal OpenGL.cabal QuickCheck.cabal Win32.cabal X11.cabal
arrows.cabal base.cabal fgl.cabal src.cabal haskell98.cabal
monadLib.cabal mtl.cabal network.cabal parsec.cabal unix.cabal

> So if one makes such a request from a program, then parses the resulting HTML extracting <a> tags containing links ending with ".cabal", it would result in a directory of .cabal files locations suitable for further processing. Similarly, if all locations of a particular package need to be found, package name needs to be added to the query string.
>
> Using Google API might be an alternative, but it returns only up to 10 results (per their docs) which might be enough to find all mirrors of a single package, but not the whole list of available packages.
>
> Publishing in such a directory might be simple: just place one's .cabal file in a place where Google can index it.
>
> One weakness: too easy to spam.

We could possibly overcome that by using the cryptographic signing process similar to what I implemented for apt-secure in Debian. I've already talked to Lemmih a bit about this and he's working on it.

Not sure if cabal-get really wants to troll google, though :)

peace,

isaac

Isaac,

Isaac Jones wrote:

>> it returns URLs of all Google-indexed .cabal files (currently 7- just this few?)
>
> That's certainly not all of them. Just in the fptools tree we have:
>
> Cabal.cabal GLUT.cabal

[skip]

Well, I think package maintainers might wonder why Google does not see them (unless hidden intentionally). I picked keywords (i.e. Cabal tags) which I believe every package should have. Unfortunately, just "filetype:cabal" does not work in Google.

>> Using Google API might be an alternative, but it returns only up to 10 results (per their docs) which might be enough to find all mirrors of a single package, but not the whole list of available packages.
[Commenting on myself] BTW, one of the search parameters of the Google API is "start: Zero-based index of the first desired result". So if we get the first 10 URLs and then set start=10, 20, 30, etc., then perhaps it will be like clicking 1, 2, 3, etc. at the bottom of the Google results page.
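
The paging idea could look roughly like this. It is a sketch only: `fetch_page` is a hypothetical callable standing in for whatever actually issues the API request with a given `start` value and returns the list of result URLs, and the page size of 10 is taken from the limit mentioned above.

```python
def collect_all(fetch_page, page_size=10, max_pages=100):
    """Gather results across pages by stepping the zero-based start index.

    fetch_page(start) is a hypothetical function returning up to page_size
    URLs beginning at index start. max_pages is a safety cap on requests.
    """
    results = []
    for page in range(max_pages):
        batch = fetch_page(page * page_size)  # start = 0, 10, 20, ...
        results.extend(batch)
        if len(batch) < page_size:
            # A short (or empty) page means there is nothing further to fetch.
            break
    return results
```

Each extra page costs one more request, so any per-day limit on API requests bounds how large a directory can be gathered this way.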
>> One weakness: too easy to spam.
>
> We could possibly overcome that by using the cryptographic signing process similar to what I implemented for apt-secure in Debian. I've

By spamming, I mean putting fake .cabal files with garbage inside (but containing correct tags), which will clutter up Google search results and make more requests necessary.

PS Yahoo recently announced its own search API, which seems to be more liberal on the number of search requests per day, and is done via a plain HTTP GET request, not WSDL. Unfortunately, Yahoo does not provide search by file suffix.

Dimitry Golubovsky
Middletown, CT