nun.haskell.org http services down?

http://{code,community,projects}.haskell.org/ seem to be inaccessible. Could someone please look into it?
Thanks, Jens

Why does this happen so often? Broken hardware, software crash,
bandwidth overuse, etc.? I have 200GB of bandwidth/month on the
tryhaskell.org server. It's not much but hopefully I can make a
Hackage mirror out of it one weekend for when the main server goes
down.
On 5 May 2010 09:05, Jens Petersen wrote:
http://{code,community,projects}.haskell.org/ seem to be inaccessible.
Could someone please look into it?
Thanks, Jens

I think it would be nice in general to be able to mirror at least hackage.haskell.org. Something like rsync would be close to ideal for this purpose.
Reasons I would like to mirror hackage:
1 - Provide alternative when the main hackage is down
2 - Access to the sources of all uploaded packages for analysis purposes

On Wed, 5 May 2010, Roel van Dijk wrote:
I think it would be nice in general to be able to mirror at least hackage.haskell.org. Something like rsync would be close to ideal for this purpose.
Reasons I would like to mirror hackage:
1 - Provide alternative when the main hackage is down
2 - Access to the sources of all uploaded packages for analysis purposes
It would also be interesting to have alternative upload servers. That's certainly more complicated, but I know that AmiNet manages this problem, e.g.: http://de.aminet.net/ The secondary upload servers queue the uploads and forward them to the main server when it is available.

Jens Petersen wrote:
http://{code,community,projects}.haskell.org/ seem to be inaccessible.
Could someone please look into it?
For me, it seems to be down every day around 5-6pm (0700-0800 UTC), which is prime hacking time for me. Anyone know what's going on with the machine at that time?
Erik
--
Erik de Castro Lopo
http://www.mega-nerd.com/

Erik de Castro Lopo writes:
For me, it seems to be down every day around 5-6pm (0700-0800 UTC), which is prime hacking time for me.
Anyone know what's going on with the machine at that time?
Well, it's hosted in the USA which is somewhere around UTC-8; as such your prime hacking time is prime sleeping time for those poor old servers! Let the poor dears rest! ;-)
--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

Ivan Lazar Miljenovic writes:
Well, it's hosted in the USA which is somewhere around UTC-8; as such your prime hacking time is prime sleeping time for those poor old servers! Let the poor dears rest! ;-)
Unfortunately, I come from China. :-( code.haskell.org is always down in my time.
-- Andy

We think that the apache web server is using up the machine's resources through some kind of memory leak. Our temporary solution until recently has been to automatically kill and restart apache once a day. We have now moved to restarting it every 6 hours, hoping that this will increase its availability. Please let us know whether this is an improvement, or whether you still see long down periods.
Regards, Malcolm

Malcolm Wallace writes:
We think that the apache web server [snip]
Well, _there's_ your problem! You're relying on some random project written using that completely unsafe C language rather than one written using a pure garbage-collected language with strong static typing for extra safety! :p
--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

Ivan Lazar Miljenovic wrote:
Malcolm Wallace writes:
We think that the apache web server [snip]
Well, _there's_ your problem! You're relying on some random project written using that completely unsafe C language rather than one written using a pure garbage-collected language with strong static typing for extra safety! :p
I'm running a web server that is entirely written in Haskell. I also have to restart it from time to time. I do it with a cron job that checks occasionally whether the web server is still running. I do not even know whether there is a memory leak or whether I'm expecting too much of that machine. Thus I'd be interested to know whether the code.haskell.org server is knocked out by Apache's core C code, by Python code, or by Haskell code.
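A cron-driven check of the kind described above could look roughly like the following. This is only a minimal sketch in Python, not the script actually in use: the names `server_responds` and `watchdog`, the timeout, and whatever restart command is passed in are all hypothetical. It treats any HTTP reply, even an error status, as proof of life, and treats a refused connection or a timeout as "down".

```python
import http.client
import subprocess


def server_responds(host, port=80, path="/", timeout=10.0):
    """Return True if the server sends back any HTTP response in time.

    An error status such as 404 still counts: it shows the server is
    accepting connections and actually answering them.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        conn.getresponse().read()
        return True
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout, reset, or a malformed reply.
        return False
    finally:
        conn.close()


def watchdog(host, port, restart_cmd, timeout=10.0, run=subprocess.run):
    """Restart the server if it does not respond.

    restart_cmd is a hypothetical argument list for whatever restarts
    the web server on the machine in question.
    Returns True if a restart was triggered, False otherwise.
    """
    if server_responds(host, port, timeout=timeout):
        return False
    run(restart_cmd, check=False)
    return True
```

Run from cron every few minutes, a script calling `watchdog` would only restart the server when the probe fails, rather than on a fixed schedule. Because the probe uses a timeout, it would also catch a server that accepts connections but never answers.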

On Thu, May 6, 2010 at 2:15 AM, Malcolm Wallace <malcolm.wallace@cs.york.ac.uk> wrote:
We think that the apache web server is using up the machine resources through some kind of memory leak. Our temporary solution until recently has been to automatically kill and restart apache once a day. We have now moved to restarting it every 6 hours, hoping that this will increase its availability. Please keep us informed whether this is an improvement, or whether you still see long down periods.
The last time I noticed it was down I made the following observations:
* I could ssh into the machine
* top didn't show any process as using ridiculous amounts of memory
* CPU time was very low across all processes, essentially zero
* load avg was less than 1
* I could telnet to port 80 and when I manually typed an HTTP GET request there was no response
* I tried the above request to darcs.haskell.org and it immediately served a response
* netstat showed lots of sockets
* many of the sockets were from webcrawlers
* nearly all sockets were either in SYN_RECV or CLOSE_WAIT
So, at least the other day apache was accepting connections on port 80 but not properly servicing them. Because the load avg was so low I doubt it was waiting on disk IO. The interesting thing about the HTTP request I made is that it should have given an error code (meaning, no data needed to be served from a web directory other than possibly Apache's config and checking for content).
I hope you find this info useful.
Jason
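The socket-state observation above can be made repeatable with a small tally. What follows is a rough sketch in Python, not the commands actually run on the server. It assumes the text comes from `netstat -tan` on Linux, with the usual column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name `tally_tcp_states` is hypothetical.

```python
from collections import Counter


def tally_tcp_states(netstat_output, local_port=80):
    """Count TCP socket states for connections on one local port.

    Assumes `netstat -tan`-style lines, where field 4 is the local
    address (ending in ":port") and field 6 is the TCP state.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a tcp/tcp6 entry.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        # The port is whatever follows the last colon (works for IPv6 too).
        if local_addr.rsplit(":", 1)[-1] == str(local_port):
            counts[state] += 1
    return counts
```

A tally dominated by SYN_RECV and CLOSE_WAIT on port 80, together with a low load average, would match the picture described above: connections are being accepted but not serviced or closed.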

Jason Dagit wrote:
The last time I noticed it was down I made the following observations... I hope you find this info useful.
That is indeed very useful. I had noticed a few times that the apache processes had not exhausted memory when the problem occurred, though usually they do exhaust memory. Your detective work is strong evidence that the memory problem is probably just another side effect of the real problem.
Right now, we are in the process of moving c.h.o to a server with a more recent OS and more hardware capacity. That will take a little while, so in the meantime we are trying to use the available time most effectively by focusing mostly on the server upgrade. We are hoping to keep the old server limping along with band-aids like periodic restarts of apache. However, your work shows that this problem may not just go away by itself with the upgrade.
Thanks for your help in further tracking down this problem. Let us know (possibly off list) if you have any more ideas.
Regards, Yitz
participants (11)
- Andy Stewart
- Christopher Done
- Erik de Castro Lopo
- Henning Thielemann
- Henning Thielemann
- Ivan Lazar Miljenovic
- Jason Dagit
- Jens Petersen
- Malcolm Wallace
- Roel van Dijk
- Yitzchak Gale