Possible bug related to stm and exceptions

Hi all,
I have a program that uses STM heavily and also performs lots of foreign calls. I've noticed that sometimes the program uses 100% CPU indefinitely and uses lots of memory - I see it go up to about 5GB before I kill it. I've grabbed some preliminary samples of stack traces and see lots of STM-related stuff (e.g. lots of stg_atomically_frame_info and stmCommitTransaction entries). I can pretty reliably get the behavior to happen now by closing a socket that my Haskell program is trying to recv from. When this causes an exception to be raised (something like "recv: resource vanished (Connection reset by peer)"), this behavior gets triggered. I haven't pinned down the bug yet, but I suspect it is STM-related - somehow the exception causes some STM transaction to go wrong.
Are there any known bugs that sound similar to this?
BTW, this is with GHC 7.6.3 from a recent Haskell Platform release on OS X.
-Andi

I remember Luite mentioning that when he and Ryan Yates were working out
STM for GHCJS, they found some bugs in STM in GHC (presumably now
fixed in HEAD?). Have you been able to reproduce the problem in HEAD?
There are also some issues with STM relating to fairness. Could you be hitting
a fairness issue?
On Wed, Oct 16, 2013 at 9:01 PM, Andreas Voellmy
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

Hi Carter, I just tested with HEAD and the problem still happens.
On Wed, Oct 16, 2013 at 10:29 PM, Carter Schonwald <carter.schonwald@gmail.com> wrote:

On 17/10/2013 03:01, Andreas Voellmy wrote:
> Are there any known bugs that sound similar to this?
Please create a ticket and dump all the information you have in it. There might be something we can tell from the stack trace, but if not we'll need a way to reproduce it.
Cheers,
Simon

The bug that Luite and I uncovered is
http://ghc.haskell.org/trac/ghc/ticket/7930. It would not be related.
There was a bug relating to `catchSTM` that was fixed recently:
http://ghc.haskell.org/trac/ghc/ticket/8035. And another related to
profiling: http://ghc.haskell.org/trac/ghc/ticket/8298. I doubt either of
these is related. I'm happy to help narrow things down.
Ryan
On Thu, Oct 17, 2013 at 4:39 AM, Simon Marlow

Thanks! I'll try to reduce my test case and then I'll post an issue.
I'm currently suspecting that it has something to do with signal handling
and STM. It seems that the program goes wrong after getting a SIGPIPE from
trying to send to a closed socket.
On Thu, Oct 17, 2013 at 9:35 AM, Ryan Yates

Hi all,
Thanks so much for everyone's responses! I finally found the problem, so I
thought I'd follow up and share what happened...
It turned out that the problem was not in the STM implementation, but
rather in bad programming on my part. For some reason, I had a thread
(thread #1) performing a transaction that blocked until any one of several
TQueues became non-empty. On success, the thread sent a value onto another
TQueue monitored by another thread (thread #2). Thread #2 would then
process all the items in the queues monitored by the first thread.
This led to the following problem: when one of the TQueues became
non-empty, the first thread would just go through its loop repeatedly,
filling the other queue with values, and thread #2 wouldn't get a chance to
run for a long time. This quickly led to huge amounts of memory being used,
and the program would get totally bogged down. I finally found the problem
when I noticed that I could make it less severe with -C0 and more
severe with large values for -C. Large values let the first thread repeat
the loop for a longer time before the second thread is scheduled and
removes values from the queues.
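The program itself is not shown in this thread, so the following is only a minimal sketch of the pattern described above. All names (`notifier`, `drainQueue`, `drainAny`) are hypothetical. `notifier` mirrors thread #1: it never removes anything from the input queues, so once one of them is non-empty its transaction succeeds on every iteration. `drainAny` is one possible way to avoid the intermediate queue, not necessarily the fix that was actually applied: the consumer itself blocks until some queue has items and then takes them all in a single transaction.

```haskell
import Control.Concurrent.STM
import Control.Monad (forever, when)

-- Thread #1 as described (hypothetical reconstruction): wait until any
-- input queue is non-empty, then push a wake-up token for thread #2.
-- Nothing is removed from the input queues here, so once one of them is
-- non-empty this loop keeps succeeding and `wake` keeps growing until
-- the consumer finally gets to run.
notifier :: [TQueue a] -> TQueue () -> IO ()
notifier inputs wake = forever $ atomically $ do
  allEmpty <- and <$> mapM isEmptyTQueue inputs
  when allEmpty retry
  writeTQueue wake ()

-- Remove and return everything currently in one queue, in FIFO order.
-- Never blocks: an empty queue yields an empty list.
drainQueue :: TQueue a -> STM [a]
drainQueue q = do
  next <- tryReadTQueue q
  case next of
    Nothing -> return []
    Just x  -> (x :) <$> drainQueue q

-- Alternative sketch: block (via retry) until at least one queue has
-- items, then take all of them atomically. Because the items are
-- consumed, the transaction blocks again until new work arrives.
drainAny :: [TQueue a] -> STM [a]
drainAny inputs = do
  items <- concat <$> mapM drainQueue inputs
  when (null items) retry
  return items
```

With this shape, thread #2 could simply loop on `atomically (drainAny inputs)` and process each batch, and thread #1 would no longer be needed.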
-Andi
On Thu, Oct 17, 2013 at 10:53 AM, Andreas Voellmy wrote:
participants (4)
- Andreas Voellmy
- Carter Schonwald
- Ryan Yates
- Simon Marlow