Thanks folks! Forward progress is made...

Unfortunately, programs don't seem to write out their ThreadScope event logs until they terminate, and mine hangs until I kill it, so I can't get at the event log.
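Since the event log only appears at exit, one stopgap is a watchdog that reports from inside the still-running program when a labelled action takes too long. This was not tried in this thread; it is a minimal sketch, and `watchdogWith` and `warnIfSlow` are hypothetical helper names, not part of any library.

```haskell
import Control.Concurrent
import Control.Exception (bracket)
import System.IO (hPutStrLn, stderr)

-- Hypothetical helper: run `act`; if it is still running after `micros`
-- microseconds, pass a warning naming `label` to `report`.
-- The action itself is never interrupted; the watchdog thread is simply
-- killed once `act` finishes (or throws).
watchdogWith :: (String -> IO ()) -> String -> Int -> IO a -> IO a
watchdogWith report label micros act =
  bracket (forkIO watchdog) killThread (const act)
  where
    watchdog = do
      threadDelay micros
      report ("still waiting after " ++ show micros ++ "us: " ++ label)

-- Convenience wrapper that writes the warning to stderr.
warnIfSlow :: String -> Int -> IO a -> IO a
warnIfSlow = watchdogWith (hPutStrLn stderr)
```

For example, wrapping each suspect step as `warnIfSlow "read config" 5000000 step` would name the step that stalls. Because the watchdog only reports and never cancels, it behaves the same whether the wrapped action is an STM transaction, an MVar wait, or a blocking foreign call; `System.Timeout.timeout` would instead try to cancel the action, which a thread inside a safe foreign call only notices once the call returns.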

Tracing has taught me that, before whatever triggers the hang, my program splits its time between pthread_cond_wait in two different threads and select in a third. After the hang, it no longer calls select, and one of those pthread_cond_waits in the other. In the version built without -threaded, which doesn't hang, it never calls pthread_cond_wait and never misses the select.

Now to go figure out what impossible condition it's waiting on, I guess.
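To see from inside the program which kind of wait a thread is stuck in, recent GHC versions expose `threadStatus` in `GHC.Conc`, which reports a thread as running, finished, died, or blocked with a reason such as `BlockedOnSTM`, `BlockedOnMVar` or `BlockedOnForeignCall`. A minimal sketch, assuming the threads of interest are forked through a labelled helper so their `ThreadId`s are kept; `Registry`, `labelledFork`, `snapshot`, `describe`, `reportThreads` and `demo` are hypothetical names. Note that ordinary Handle I/O is normally routed through GHC's I/O manager, so a thread stuck reading a socket may show up as `BlockedOnMVar` or `BlockedOnOther` rather than `BlockedOnForeignCall`.

```haskell
import Control.Concurrent (ThreadId, forkIO, threadDelay)
import Control.Concurrent.STM (atomically, check, newTVarIO, readTVar)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import GHC.Conc (BlockReason (..), ThreadStatus (..), threadStatus)

-- Hypothetical registry of the threads we want to inspect, keyed by a label.
type Registry = IORef [(String, ThreadId)]

newRegistry :: IO Registry
newRegistry = newIORef []

-- Fork a thread and record its ThreadId under a human-readable label.
labelledFork :: Registry -> String -> IO () -> IO ThreadId
labelledFork reg label act = do
  tid <- forkIO act
  atomicModifyIORef' reg (\xs -> ((label, tid) : xs, ()))
  return tid

-- Ask the RTS what every registered thread is doing right now,
-- in the order the threads were forked.
snapshot :: Registry -> IO [(String, ThreadStatus)]
snapshot reg = do
  entries <- readIORef reg
  mapM (\(label, tid) -> fmap (\s -> (label, s)) (threadStatus tid)) (reverse entries)

-- Rough wording for the cases that matter when hunting a hang.
describe :: ThreadStatus -> String
describe (ThreadBlocked BlockedOnSTM) = "blocked in an STM retry"
describe (ThreadBlocked BlockedOnMVar) = "blocked on an MVar"
describe (ThreadBlocked BlockedOnForeignCall) = "inside a foreign call"
describe other = show other

-- Print one line per registered thread, e.g. from a watchdog thread
-- once the program appears hung.
reportThreads :: Registry -> IO ()
reportThreads reg = do
  statuses <- snapshot reg
  mapM_ (\(label, s) -> putStrLn (label ++ ": " ++ describe s)) statuses

-- Usage sketch: one thread waits on a TVar that is never set, one finishes.
demo :: IO [(String, ThreadStatus)]
demo = do
  reg <- newRegistry
  flag <- newTVarIO False
  _ <- labelledFork reg "waits-on-flag" (atomically (readTVar flag >>= check))
  _ <- labelledFork reg "finishes" (return ())
  threadDelay 200000 -- give both threads time to run (200 ms)
  reportThreads reg
  snapshot reg
```

Keeping the `ThreadId`s in the registry also keeps the blocked threads reachable, so the runtime's deadlock detection will not throw `BlockedIndefinitelyOnSTM` to them while you are still inspecting them.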

Aran

On Thu, May 13, 2010 at 2:13 AM, Ketil Malde <ketil@malde.org> wrote:
Aran Donohue <aran.donohue@gmail.com> writes:

> I have a program that I can reliably cause to hang. It's concurrent using
> STM, so I think it could be a deadlock or related issue. I also do some IO,
> so I think it could be blocking in a system call.

If it's the latter, 'strace' might help you.  Use 'strace -p PID' to
attach to a running process.  Similarly, 'ltrace' can trace library
calls (but it's probably less useful in this context?)

(This is on Linux, but other OSes are likely to have similar tools.)

-k
--
If I haven't seen further, it is by standing in the footprints of giants