
On 5/22/07, Robin Green wrote:
On Tue, 22 May 2007 15:05:48 +0100 Duncan Coutts wrote:
On Tue, 2007-05-22 at 14:40 +0100, Claus Reinke wrote:
so the situation for mailing lists and online docs seems to have improved, but there is still the wiki indexing/rogue bot issue, and lots of fine tuning (together with watching the logs to spot any issues arising out of relaxing those restrictions). perhaps someone on this list would be willing to volunteer to look into those robots/indexing issues on haskell.org?-)
The main problem, and the reason for the original (temporary!) measure, was bots indexing all possible diffs between old versions of wiki pages. URLs like:
http://haskell.org/haskellwiki/?title=Quicksort&diff=9608&oldid=9607
For pages with long histories this O(n^2) number of requests gets quite large, and the wiki engine does not seem well optimised for serving arbitrary diffs. So we ended up with bots holding open many HTTP server connections. They were not actually causing much server CPU load or generating much traffic, but once the number of nearly hung connections reached the HTTP child process limit we were effectively in a DoS situation.
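To put rough numbers on the O(n^2) growth: assuming every unordered pair of revisions of a page yields one distinct diff URL (an assumption for illustration; the function name below is hypothetical and this is not a measurement from the haskell.org logs), the count can be sketched in Python as:

```python
from math import comb


def diff_url_count(revisions: int) -> int:
    """Distinct diff URLs for one page, assuming one URL per
    unordered (oldid, diff) pair of revisions."""
    return comb(revisions, 2)  # n * (n - 1) / 2


if __name__ == "__main__":
    # Illustrative history lengths only, not real haskell.org data.
    for n in (10, 100, 1000):
        print(f"{n} revisions -> {diff_url_count(n)} possible diffs")
```

Under that assumption a page with 1000 revisions already exposes 499500 diff URLs to a crawler that follows every history link.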
So if we can ban bots from the page histories, or turn the histories off for bot user agents, then we might have a cure. Perhaps we just need to upgrade our MediaWiki software, or find out how other sites using this software deal with the same issue of bots reading page histories.
http://en.wikipedia.org/robots.txt
Wikipedia uses URLs starting with /w/ for "dynamic" pages (well, all pages are dynamic in a sense, but you know what I mean I hope.) And then puts /w/ in robots.txt.
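A minimal sketch of how such a rule behaves, using Python's standard urllib.robotparser. The two-line robots.txt, the example.org host, and the /wiki/ vs /w/ URL layout are assumptions for illustration, not haskell.org's actual configuration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the style described above:
# everything under /w/ (the "dynamic" pages) is off limits to all bots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def bot_may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """True if a well-behaved crawler would be allowed to request url."""
    return parser.can_fetch(agent, url)


# Hypothetical URLs: a plain article page and a history diff under /w/.
ARTICLE = "http://example.org/wiki/Quicksort"
DIFF = "http://example.org/w/index.php?title=Quicksort&diff=9608&oldid=9607"

if __name__ == "__main__":
    print(bot_may_fetch(ARTICLE))
    print(bot_may_fetch(DIFF))
```

This only works if article pages and diff/history pages live under different path prefixes, and only compliant bots honour it; a rogue bot that ignores robots.txt would have to be dealt with some other way.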
Does anyone know the status of applying a workaround such as this? I really miss being able to find things on the Haskell wiki via Google search; I don't like the MediaWiki search at all. I did a Google search earlier tonight but didn't get any wiki pages, so I assume nothing has been done yet. Please make the wiki indexed again as soon as possible (if at all possible). Otherwise, I feel like it's a waste of time to keep contributing to wiki pages.

Thanks,
Jason