
Jason Dagit wrote:
On 5/22/07, Robin Green wrote:
On Tue, 22 May 2007 15:05:48 +0100, Duncan Coutts wrote:
On Tue, 2007-05-22 at 14:40 +0100, Claus Reinke wrote:
So the situation for mailing lists and online docs seems to have improved, but there is still the wiki indexing/rogue-bot issue, and lots of fine tuning (together with watching the logs to spot any issues arising from relaxing those restrictions). Perhaps someone on this list would be willing to volunteer to look into those robots/indexing issues on haskell.org?-)
The main problem, and the reason for the original (temporary!) measure, was bots indexing all possible diffs between old versions of wiki pages, via URLs like:
http://haskell.org/haskellwiki/?title=Quicksort&diff=9608&oldid=9607
For pages with long histories this O(n^2) number of requests gets quite large: a page with a few hundred revisions exposes tens of thousands of distinct diff URLs. The wiki engine does not seem well optimised for generating arbitrary diffs, so we ended up with bots holding open many HTTP server connections. They were not actually causing much server CPU load or generating much traffic, but once the number of nearly-hung connections reached the HTTP child process limit we were effectively in a DoS situation.
So if we can ban bots from the page histories, or disable those pages for bot user agents, or something along those lines, then we might have a cure. Perhaps we just need to upgrade our MediaWiki software, or find out how other sites running it deal with the same issue of bots reading page histories.
http://en.wikipedia.org/robots.txt
Wikipedia uses URLs starting with /w/ for "dynamic" pages (well, all pages are dynamic in a sense, but you know what I mean, I hope), and then disallows /w/ in robots.txt.
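
Something along these lines might work for haskell.org. This is only a rough sketch: it assumes the diff/history/edit requests all go through the query-string form shown in the example URL above, and that the crawlers we care about match robots.txt rules against the path plus query string (the major search engine bots do):

  User-agent: *
  # Hypothetical rules: block the query-string entry point that carries
  # diff/history/edit parameters, while leaving plain article URLs
  # (e.g. /haskellwiki/Quicksort) crawlable.
  Disallow: /haskellwiki/?
  # If the wiki script is also reachable directly, block that too:
  Disallow: /haskellwiki/index.php

Whether index.php is actually exposed depends on how the rewrite rules are set up on haskell.org, so someone with access to the server config would need to confirm the right prefixes.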
Does anyone know the status of applying a workaround such as this? I really miss being able to find things on the Haskell wiki via Google search. I don't like the MediaWiki search at all.
The status is that nobody has stepped up and volunteered to look after haskell.org's robots.txt file. It needs someone with the time and experience to look into what needs doing, make the changes, fix problems as they arise, and update it as necessary in the future. Anyone?

Cheers,
Simon