
as was pointed out on the programming reddit [1], crawling of the haskell wiki is forbidden, since http://www.haskell.org/robots.txt contains:

    User-agent: *
    Disallow: /haskellwiki/
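as a sketch of what that rule actually blocks, Python's standard urllib.robotparser can evaluate the quoted rule locally (no network access needed; the example URLs are just illustrative paths on haskell.org):

```python
from urllib.robotparser import RobotFileParser

# the robots.txt rule quoted above
rules = """\
User-agent: *
Disallow: /haskellwiki/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# any well-behaved crawler must skip everything under /haskellwiki/ ...
print(rp.can_fetch("*", "http://www.haskell.org/haskellwiki/IRC_channel"))  # False

# ... but the pipermail mailing list archives are not blocked by this rule
print(rp.can_fetch("*", "http://www.haskell.org/pipermail/haskell-cafe/"))  # True
```

so the rule as written only hides the wiki, not the list archives — which matches the discussion below.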
i agree that having the wiki searchable would be preferable, but was told that there were performance issues. even giving Googlebot wider access than other spiders won't help if, as the irc page suggests, some of those faulty bots pretend to be Googlebot.
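a spoofed User-Agent string is cheap, but a crawler's IP address is harder to fake. the usual way to unmask pretend-Googlebots is a reverse-DNS round trip (a sketch only; the helper names below are mine, and the cutoff domains follow what google documents for its crawlers):

```python
import socket

def ptr_is_google(hostname: str) -> bool:
    """Pure check: does a reverse-DNS name fall under Google's crawler domains?"""
    return hostname.endswith((".googlebot.com", ".google.com"))

def verify_googlebot(ip: str) -> bool:
    """Reverse-confirmed DNS check for a client claiming to be Googlebot.

    1. PTR lookup on the client IP;
    2. the resulting name must be under googlebot.com / google.com;
    3. forward-resolve that name and require it to map back to the same IP,
       so a bot cannot simply publish a fake PTR record.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)          # reverse lookup
    except (socket.herror, socket.gaierror):
        return False
    if not ptr_is_google(hostname):
        return False
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]  # forward-confirm
    except socket.gaierror:
        return False
```

running such a check against suspicious entries in the server logs would separate the real Googlebot from the rogue bots mentioned on the irc page.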
This also applies to the Haskell mailing lists, as I mentioned recently: http://www.haskell.org/pipermail/haskell-cafe/2007-April/025006.html
ah, yes, sorry. there was an ongoing offlist discussion at the time, following an earlier thread on ghc-users. Simon M has since changed robots.txt to the above, which *does* permit indexing of the pipermail archives, as long as google can find them.

that still doesn't mean they'll show up first in google's ranking. for instance, if you google for 'ghc manuals online' (the subject of that earlier thread i mentioned), you'll get mail-archive and nabble first, but the haskell.org archives are there as well now, as you can see by googling for 'ghc manuals online inurl:pipermail'. also, the standard test of googling for 'site:haskell.org' looks a lot healthier these days, and googling for 'inurl:ghc/docs/latest LANGUAGE pragma' gives me two relevant answers (though not the most specific sub-page).

so the situation for mailing lists and online docs seems to have improved, but there is still the wiki indexing/rogue bot issue, and lots of fine tuning to do (together with watching the logs to spot any issues arising from relaxing those restrictions). perhaps someone on this list would be willing to volunteer to look into those robots/indexing issues on haskell.org?-)

claus