On Thu, May 13, 2010 at 5:53 AM, Aran Donohue <aran.donohue@gmail.com> wrote:
Thanks folks! Forward progress is made...

Unfortunately, programs don't seem to write out their threadscope event logs until they terminate, and mine hangs until I kill it, so I can't get at the event log.

Tracing has taught me that before whatever causes the hang, my program splits its time between pthread_cond_wait in two different threads and select in a third. After the hang, it no longer calls select, nor one of those pthread_cond_waits. The version built without -threaded, which doesn't hang, never calls pthread_cond_wait and never misses the select.

Now to go figure out what impossible condition it's waiting on, I guess.

The select sounds like the IO manager thread (a thread in the RTS, not your code). Is it possible that one of your threads does work but never allocates memory? I've heard that in some cases that can lead to starvation. I think the explanation was that GHC only switches threads at allocation points, so a loop that never allocates never gives the other threads a chance to run?
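
To make the allocation question concrete, here is a rough sketch of the kind of loop I have in mind. It is not taken from your program: busyLoop and ticker are names I made up, and the iteration count is arbitrary. The idea is that, when built with -O, the strict loop will probably compile to code that allocates nothing, so a second thread sharing the same capability may get no time until the loop finishes.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Control.Concurrent (forkIO, threadDelay)
import Control.Exception (evaluate)
import Control.Monad (forever)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for "a thread that does work but never allocates".
-- With -O, GHC typically turns this into a loop over unboxed Ints.
busyLoop :: Int -> Int
busyLoop n = go 0 0
  where
    go !acc !i
      | i >= n    = acc
      | otherwise = go (acc + i) (i + 1)

-- Hypothetical background thread: bumps a counter roughly once per millisecond.
ticker :: IORef Int -> IO ()
ticker ref = forever $ do
  threadDelay 1000
  atomicModifyIORef' ref (\t -> (t + 1, ()))

main :: IO ()
main = do
  ticks <- newIORef 0
  _ <- forkIO (ticker ticks)
  -- Force the loop to run here, on the main thread.
  total <- evaluate (busyLoop 200000000)
  seen <- readIORef ticks
  putStrLn ("loop result: " ++ show total)
  putStrLn ("ticks observed while looping: " ++ show seen)
```

If the ticker is being starved, the tick count should come out at or near zero even though the loop ran for a noticeable amount of time. If it got scheduled normally, the count should be roughly the loop's run time in milliseconds. What you see will depend on the optimisation level and on whether you use -threaded with more than one capability (+RTS -N). If I remember right, newer GHCs also have a -fno-omit-yields flag that keeps yield points in loops like this, which might be worth trying if this turns out to be the cause.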

Jason