A bug in the multicore IO manager

Hi,

As I said before, I started running an HTTP server using Mio in the real world. Unfortunately, the daemon is not stable. After a day or so, the server stops accepting HTTP requests. There are no error messages from the server, and the process is still alive. To terminate the server (running in a "screen" terminal), a single Ctrl-C is not enough; typing Ctrl-C again terminates it.

After several tests, I'm becoming convinced that this occurs only when +RTS -N<x> is specified (where <x> >= 2). The server runs well if +RTS -N<x> is not specified.

My question: if a program compiled with GHC needs a double Ctrl-C to terminate, what state is the program in?

P.S. It seems to me that the server is also leaking space. The server is gradually getting fatter.

--Kazu

Hi Kazu,
On Tue, Sep 3, 2013 at 2:52 PM, Kazu Yamamoto wrote:
Hi,
As I said before, I started running an HTTP server using Mio in the real world. Unfortunately, the daemon is not stable.
After a day or so, the server stops accepting HTTP requests. There are no error messages from the server.
The server is alive. To terminate the server (running in a "screen" terminal), a single Ctrl-C is not enough. Typing Ctrl-C again terminates the server.
Could you run an strace on the process in this state so we can get an idea what it's doing?
After several tests, I'm becoming convinced that this occurs only when +RTS -N<x> is specified (where <x> >= 2). The server runs well if +RTS -N<x> is not specified.
That indicates that the problem is with the threaded RTS and perhaps with the IO manager.
My question: if a program compiled with GHC needs a double Ctrl-C to terminate, what state is the program in?
If Ctrl+C generates an exception (does it?), there could be an overzealous exception catcher somewhere that catches all exceptions, including your Ctrl+C.
P.S.
It seems to me that the server is also leaking space. The server is gradually getting fatter.
Could you use the profiler to see what type of objects are leaking?
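Because the hang only shows up with +RTS -N2 or higher, it may be worth having the server log at startup whether it is running on the threaded RTS and how many capabilities it was given, so each incident can be tied to its configuration. This is a minimal sketch, not part of Mighty or Mio; rtsSummary is a hypothetical helper built on getNumCapabilities and rtsSupportsBoundThreads from Control.Concurrent.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)
import System.IO (hPutStrLn, stderr)

-- Hypothetical helper: summarise the RTS this binary is running on.
-- rtsSupportsBoundThreads is True only when the program was linked with -threaded;
-- getNumCapabilities reflects +RTS -N<x> (or a later setNumCapabilities call).
rtsSummary :: IO String
rtsSummary = do
  n <- getNumCapabilities
  return ("threaded RTS: " ++ show rtsSupportsBoundThreads
          ++ ", capabilities: " ++ show n)

-- Log the summary to stderr once at startup.
main :: IO ()
main = rtsSummary >>= hPutStrLn stderr
```

Built with ghc -threaded and run with +RTS -N2, this should report a threaded RTS with 2 capabilities; a non-threaded build rejects the -N flag outright.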

Kazu, thanks for noticing this! I will try to recreate it on my server as
well.
-Andi
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

Hi Kazu,
What sort of workload was the Mighty server under during those one or two days
while you waited for it to become unresponsive? I.e., was this a production
web server? Or were you generating requests at some frequency, or leaving it
mostly idle?
-Andi

Hi Andi,
What sort of workload was the Mighty server under during those one or two days while you waited for it to become unresponsive? I.e., was this a production web server? Or were you generating requests at some frequency, or leaving it mostly idle?
I ran Mighty on http://mew.org. This is my private domain, which hosts my free programs and articles. It's not very busy, but not idle either. I did not generate requests with measurement tools. --Kazu

On 03/09/13 22:57, Johan Tibell wrote:
My question: if a program compiled with GHC needs a double Ctrl-C to terminate, what state is the program in?
If Ctrl+C generates an exception (does it?), there could be an overzealous exception catcher somewhere that catches all exceptions, including your Ctrl+C.
The first Ctrl-C is delivered to the main thread as an asynchronous UserInterrupt exception. The second Ctrl-C is handled as an ordinary SIGINT, which tends to kill the process. If you need two Ctrl-Cs to kill the program, it probably means that it deadlocked somewhere, maybe in the RTS.

Kazu: if you can attach to the deadlocked process with gdb and get stack traces of all the threads, that might help.

Cheers,
Simon
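On the question of whether Ctrl-C produces an exception: in a GHC program the first Ctrl-C is thrown to the main thread as UserInterrupt, so a catch-all handler can silently swallow it. The sketch below contrasts such a handler with one that re-throws asynchronous exceptions. swallowAll and handleSync are hypothetical names, the demo simulates Ctrl-C with throwIO rather than a real signal, and it assumes base 4.7 or later (GHC 7.8+), where SomeAsyncException exists.

```haskell
import Control.Exception

-- Overzealous: catches every exception, including the asynchronous
-- UserInterrupt that GHC throws to the main thread on the first Ctrl-C.
swallowAll :: IO () -> IO ()
swallowAll act =
  act `catch` \e -> putStrLn ("ignored: " ++ show (e :: SomeException))

-- Safer: re-throw anything wrapped in SomeAsyncException (UserInterrupt,
-- ThreadKilled, ...) and handle only the remaining, synchronous exceptions.
handleSync :: IO () -> IO ()
handleSync act = act `catch` \e ->
  case fromException e :: Maybe SomeAsyncException of
    Just _  -> throwIO e
    Nothing -> putStrLn ("handled: " ++ show e)

main :: IO ()
main = do
  -- Simulate the first Ctrl-C by raising UserInterrupt directly.
  swallowAll (throwIO UserInterrupt)   -- swallowed: main carries on
  r <- try (handleSync (throwIO UserInterrupt))
  case r of
    Left UserInterrupt -> putStrLn "UserInterrupt propagated to the caller"
    Left other         -> putStrLn ("other async exception: " ++ show other)
    Right ()           -> putStrLn "exception was swallowed"
```

This does not explain the deadlock by itself; it only shows how a first Ctrl-C could appear to be ignored if some layer catches SomeException.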

Hi,
If you need two Ctrl-Cs to kill the program, it probably means that it deadlocked somewhere, maybe in the RTS. Kazu: if you can attach to the deadlocked process with gdb and get stack traces of all the threads, that might help.
To debug with GDB, I compiled Mighty with "-debug". This changes the behavior, and I got the following error:

mighty-20130905: internal error: ASSERTION FAILED: file rts/sm/MarkWeak.c, line 371
    (GHC version 7.7.20130901 for i386_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

Simon, can you tell what's going on?

--Kazu

Hi,
On Thu, Sep 5, 2013 at 9:10 PM, Kazu Yamamoto wrote:
To debug with GDB, I compiled Mighty with "-debug". This changes the behavior, and I got the following error:
mighty-20130905: internal error: ASSERTION FAILED: file rts/sm/MarkWeak.c, line 371
(GHC version 7.7.20130901 for i386_unknown_linux) Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
I wonder if this issue could have been introduced by this commit:

https://github.com/ghc/ghc/commit/6770663f764db76dbb7138ccb3aea0527d194151

It looks like, after the commit, addCFinalizerToWeak# can call into the GC with the closure lock held. This means the info pointer points to stg_WHITEHOLE_info, breaking the asserted invariant. I haven't done any testing to confirm this, however.

-- Takano Akio
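If this guess is right, a program that creates many ForeignPtrs with C finalizers from several threads while forcing GCs might trip the same assertion in a -debug build. The sketch below is such a stress test, offered only as a starting point: it assumes that newForeignPtr with finalizerFree registers its finalizer through addCFinalizerToWeak# in the GHC version under test (an assumption about GHC.ForeignPtr internals that should be checked), and it has not been confirmed to reproduce the failure. worker and stress are hypothetical names.

```haskell
import Control.Concurrent (forkFinally, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (throwIO)
import Control.Monad (forM, forM_, when)
import Data.Word (Word8)
import Foreign.ForeignPtr (newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, mallocBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (poke)
import System.Mem (performGC)

-- One worker: wrap fresh malloc'd buffers in ForeignPtrs carrying a C
-- finalizer (finalizerFree), write to each one, and force a GC every 100
-- iterations so finalizers run while other threads keep allocating.
-- Assumption: newForeignPtr attaches the C finalizer via addCFinalizerToWeak#.
worker :: Int -> IO Int
worker iters = do
  forM_ [1 .. iters] $ \i -> do
    p <- mallocBytes 64 :: IO (Ptr Word8)
    fp <- newForeignPtr finalizerFree p
    withForeignPtr fp $ \q -> poke q (fromIntegral i)
    when (i `mod` 100 == 0) performGC
  return iters

-- Run `threads` workers concurrently and return the total iteration count.
-- forkFinally hands back either the result or the exception, which is re-thrown.
stress :: Int -> Int -> IO Int
stress threads iters = do
  vars <- forM [1 .. threads] $ \_ -> do
    v <- newEmptyMVar
    _ <- forkFinally (worker iters) (putMVar v)
    return v
  results <- mapM takeMVar vars
  counts <- mapM (either throwIO return) results
  return (sum counts)

main :: IO ()
main = stress 4 100000 >>= print
```

To exercise the multicore case, build it with ghc -threaded -debug and run it with +RTS -N2 or higher.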

Hi Takano-san,
It looks like after the commit, addCFinalizerToWeak# can call into the GC with the closure lock held. This means the info pointer points to stg_WHITEHOLE_info, breaking the asserted invariant. I haven't done any testing to confirm this, however.
I can try. Should I revert this patch? --Kazu

I'm going to try to make a small test case today (probably after 08:00
UTC), but feel free to try it. If my guess is correct, reverting the patch
should fix the problem.
participants (5)
- Akio Takano
- Andreas Voellmy
- Johan Tibell
- Kazu Yamamoto
- Simon Marlow