
On Thu, May 6, 2010 at 2:15 AM, Malcolm Wallace < malcolm.wallace@cs.york.ac.uk> wrote:
http://{code,community,projects}.haskell.org/ seem to be inaccessible.
Could someone please look into it?
For me, it seems to be down every day around 5-6pm (0700-0800 UTC), which is prime hacking time for me.
Anyone know what's going on with the machine at that time?
Well, it's hosted in the USA which is somewhere around UTC-8; as such your prime hacking time is prime sleeping time for those poor old servers! Let the poor dears rest! ;-)
Unfortunately, I come from China. :-( code.haskell.org is always down in my time.
We think that the apache web server is using up the machine resources through some kind of memory leak. Our temporary solution until recently has been to automatically kill and restart apache once a day. We have now moved to restarting it every 6 hours, hoping that this will increase its availability. Please keep us informed whether this is an improvement, or whether you still see long down periods.
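For anyone who wants to report availability as requested above, a small probe can separate "could not connect at all" from "connected but never got a reply". The following is only a sketch: the function name `probe_http`, the result labels, and the 5-second default timeout are assumptions made for illustration, not part of any tooling used on the server.

```python
import socket


def probe_http(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Classify a web server as 'unreachable', 'silent', 'closed' or 'responding'.

    The labels are hypothetical names chosen for this sketch.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, host unreachable, or the connect itself timed out.
        return "unreachable"
    with sock:
        sock.settimeout(timeout)
        request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n"
        try:
            sock.sendall(request.encode("ascii"))
            data = sock.recv(1024)
        except socket.timeout:
            # The TCP connection was accepted, but no reply arrived in time.
            return "silent"
        except OSError:
            # The connection was reset or otherwise failed mid-request.
            return "closed"
    # An empty read means the peer closed the connection without replying.
    return "responding" if data else "closed"


if __name__ == "__main__":
    for name in ("code.haskell.org", "community.haskell.org", "projects.haskell.org"):
        print(name, probe_http(name))
```

Run from cron or by hand at different times of day, the "silent" result is the one worth reporting with a timestamp, since it indicates the port is open but requests are not being served.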
The last time I noticed it was down I made the following observations:

* I could ssh into the machine
* top didn't show any process as using ridiculous amounts of memory
* CPU time was very low across all processes, essentially zero
* load avg was less than 1
* I could telnet to port 80 and when I manually typed an HTTP GET request there was no response
* I tried the above request to darcs.haskell.org and it immediately served a response
* netstat showed lots of sockets
* many of the sockets were from webcrawlers
* nearly all sockets were either in SYN_RECV or CLOSE_WAIT

So, at least the other day apache was accepting connections on port 80 but not properly servicing them. Because the load avg was so low I doubt it was waiting on disk IO. The interesting thing about the HTTP request I made is that it should have given an error code (meaning, no data needed to be served from a web directory other than possibly Apache's config and checking for content).

I hope you find this info useful.

Jason
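The netstat observation above (nearly all sockets in SYN_RECV or CLOSE_WAIT) can be made repeatable by tallying the state column. This is a minimal sketch, assuming Linux-style `netstat -tan` output with six whitespace-separated columns; the function name `tally_tcp_states` is hypothetical, and the sample rows are made up for illustration (documentation-range addresses), not captured from the real machine.

```python
from collections import Counter


def tally_tcp_states(netstat_output: str) -> Counter:
    """Count TCP socket states in `netstat -tan`-style output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: proto, recv-q, send-q, local address, foreign address, state.
        # The header row starts with "Proto" and is skipped by the prefix check.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


# Made-up sample rows for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:80           198.51.100.7:51234      SYN_RECV
tcp        0      0 192.0.2.10:80           198.51.100.8:40022      SYN_RECV
tcp        1      0 192.0.2.10:80           203.0.113.5:39911       CLOSE_WAIT
tcp        0      0 192.0.2.10:22           203.0.113.9:50514       ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in tally_tcp_states(SAMPLE).most_common():
        print(f"{state:12} {n}")
```

On the server itself, the output of `netstat -tan` could be piped into such a function. A large and growing CLOSE_WAIT count would suggest that the server process is not closing connections the clients have already finished with, which is consistent with, though not proof of, the behaviour described above.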