Hi all, 

Thanks so much for everyone's responses! I finally found the problem, so I thought I'd follow up and share what happened...

It turned out that the problem was not in the STM implementation, but rather in bad programming on my part. For some reason, I had a thread (thread #1) performing a transaction that blocked until any one of several TQueues became non-empty. On success, the thread sent a value onto another TQueue monitored by another thread (thread #2). Thread #2 would then process all the items in the queues monitored by thread #1.

This led to the following problem: when one of the TQueues became non-empty, thread #1 never removed anything from it, so its transaction succeeded immediately on every iteration. It would just go through its loop repeatedly, filling the other queue with values, and thread #2 wouldn't get a chance to run for a long time. This quickly led to huge amounts of memory being used, and the program would get totally bogged down. I finally found the problem when I noticed that I could make it less severe with the RTS option -C0 and more severe with large values for -C (the context-switch interval). Large values let thread #1 repeat the loop for a longer time before thread #2 is scheduled and removes values from the queues.
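
For anyone who runs into something similar, here is a minimal sketch of the pattern described above and one possible way to avoid it. It is a reconstruction for illustration, not the code from the actual program: the names busyNotifier, takeAny and forwarder are hypothetical, and it assumes the stm package (2.4 or later, for TQueue).

```haskell
import Control.Concurrent.STM
import Control.Monad (forever)

-- The problematic shape (hypothetical reconstruction): block until any
-- input queue is non-empty, then signal another thread. Nothing is
-- removed from the inputs, so once one of them is non-empty the
-- transaction succeeds on every iteration and 'wakeups' grows without
-- bound until the consumer gets scheduled.
busyNotifier :: [TQueue a] -> TQueue () -> IO ()
busyNotifier inputs wakeups = forever $ atomically $ do
  allEmpty <- and <$> mapM isEmptyTQueue inputs
  if allEmpty then retry else writeTQueue wakeups ()

-- One possible alternative: wait and consume in the same transaction.
-- Each readTQueue retries when its queue is empty, and orElse falls
-- through to the next queue, so the whole action blocks only when
-- every queue is empty.
takeAny :: [TQueue a] -> STM a
takeAny = foldr (orElse . readTQueue) retry

-- Forward exactly one item per iteration, so the loop can never run
-- ahead of the data that is actually available.
forwarder :: [TQueue a] -> TQueue a -> IO ()
forwarder inputs out =
  forever $ atomically (takeAny inputs >>= writeTQueue out)

main :: IO ()
main = do
  q1 <- newTQueueIO
  q2 <- newTQueueIO
  atomically $ writeTQueue q2 "b" >> writeTQueue q1 "a"
  x <- atomically (takeAny [q1, q2])
  y <- atomically (takeAny [q1, q2])
  print [x, y]  -- q1 is tried first, so this prints ["a","b"]
```

Another option is to drop the notifier thread entirely and have the consumer run takeAny itself. Either way, the point is that the transaction that waits for data is also the one that removes it.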

-Andi



On Thu, Oct 17, 2013 at 10:53 AM, Andreas Voellmy <andreas.voellmy@gmail.com> wrote:
Thanks! I'll try to reduce my test case and then I'll post an issue. 

I'm currently suspecting that it has something to do with signal handling and STM. It seems that the program goes wrong after getting a SIGPIPE from trying to send to a closed socket.


On Thu, Oct 17, 2013 at 9:35 AM, Ryan Yates <fryguybob@gmail.com> wrote:
The bug that Luite and I uncovered is http://ghc.haskell.org/trac/ghc/ticket/7930.  It would not be related.  There was a bug relating to `catchSTM` that was fixed recently: http://ghc.haskell.org/trac/ghc/ticket/8035.  And another related to profiling: http://ghc.haskell.org/trac/ghc/ticket/8298.  I doubt either of these is related.  I'm happy to help narrow things down.

Ryan


On Thu, Oct 17, 2013 at 4:39 AM, Simon Marlow <marlowsd@gmail.com> wrote:
On 17/10/2013 03:01, Andreas Voellmy wrote:
Hi all,

I have a program that uses STM heavily and also performs lots of foreign
calls. I've noticed that sometimes the program uses 100% CPU
indefinitely and uses lots of memory - I see it go up to about 5GB
before I kill it. I've grabbed some preliminary samples of stack traces
and see lots of STM-related stuff (e.g. lots of stg_atomically_frame_info
and stmCommitTransaction entries).  I can pretty reliably get the
behavior to happen now by closing a socket that my Haskell program is
trying to recv from. When this causes an exception to be raised
(something like "recv: resource vanished (Connection reset by peer)") ,
then this behavior gets triggered.  I haven't pinned down the bug yet,
but I'm suspecting it is STM related - somehow the exception causes some
STM transaction to go wrong.

Are there any known bugs that sound similar to this?

BTW, this is with GHC 7.6.3 from a recent HP release on OS X.

Please create a ticket and dump all the information you have in it. There might be something we can tell from the stack trace, but if not we'll need a way to reproduce it.

Cheers,
Simon

