
I'm working on trimming down the test code and filing a real bug. I'm going to list out what I know right now; if anything jumps out, please let me know. Thanks!

I'm running a web server built using salvia [1] and GHC 6.10 [2]. I've trimmed the code down enough that there is no obvious source of a deadlock in either salvia or the rest of the web server. I also don't have any specific conditions that reproduce the issue. After some time, anywhere from a few minutes to a few hours, the server deadlocks. No particular request or number of requests seems to trigger the deadlock.

1) Salvia accepts connections on the main thread, then forkIOs a new thread to actually handle the request. The new thread uses Handle-based IO.

2) As I understand it, there are known issues with mixing forkProcess and Handle-based IO. Although this is a web server, I'm avoiding any "daemonize" code that relies on forkProcess; no forkProcess is occurring that I know of.

3) The thread state summary printed by calling printAllThreads() from GDB is:

    all threads:
    threads on capability 0:
    other threads:
    thread 2 @ 0xb7d66000 is blocked on an MVar @ 0xb7d670b4
    thread 3 @ 0xb7d74214 is blocked on an MVar @ 0xb7da88f0

4) The thread states according to a "thread apply all bt" from GDB are:
GDB backtrace:

Thread 4 (Thread 0xb7cffb90 (LWP 30891)):
#0  0xb8080416 in __kernel_vsyscall ()
#1  0xb7fd0075 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x083f4320 in waitCondition (pCond=0x9a7cc1c, pMut=0x9a7cc4c) at posix/OSThreads.c:65
#3  0x0840de64 in yieldCapability (pCap=0xb7cff36c, task=0x9a7cc00) at Capability.c:506
#4  0x083eb292 in schedule (initialCapability=0x8565aa0, task=0x9a7cc00) at Schedule.c:293
#5  0x083ed5ff in workerStart (task=0x9a7cc00) at Schedule.c:1923
#6  0xb7fcc50f in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#7  0xb7f49a0e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 3 (Thread 0xb74feb90 (LWP 30892)):
#0  0xb8080416 in __kernel_vsyscall ()
#1  0xb7fd0075 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x083f4320 in waitCondition (pCond=0x9a7ef3c, pMut=0x9a7ef6c) at posix/OSThreads.c:65
#3  0x0840de64 in yieldCapability (pCap=0xb74fe36c, task=0x9a7ef20) at Capability.c:506
#4  0x083eb292 in schedule (initialCapability=0x8565aa0, task=0x9a7ef20) at Schedule.c:293
#5  0x083ed5ff in workerStart (task=0x9a7ef20) at Schedule.c:1923
#6  0xb7fcc50f in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#7  0xb7f49a0e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 2 (Thread 0xb6cfdb90 (LWP 30916)):
#0  0xb8080416 in __kernel_vsyscall ()
#1  0xb7fd0075 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x083f4320 in waitCondition (pCond=0x9a7e12c, pMut=0x9a7e15c) at posix/OSThreads.c:65
#3  0x0840de64 in yieldCapability (pCap=0xb6cfd36c, task=0x9a7e110) at Capability.c:506
#4  0x083eb292 in schedule (initialCapability=0x8565aa0, task=0x9a7e110) at Schedule.c:293
#5  0x083ed5ff in workerStart (task=0x9a7e110) at Schedule.c:1923
#6  0xb7fcc50f in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#7  0xb7f49a0e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 1 (Thread 0xb7e666b0 (LWP 30890)):
#0  0xb8080416 in __kernel_vsyscall ()
#1  0xb7fd0075 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x083f4320 in waitCondition (pCond=0x9a7cb3c, pMut=0x9a7cb6c) at posix/OSThreads.c:65
#3  0x0840de64 in yieldCapability (pCap=0xbfa822ac, task=0x9a7cb20) at Capability.c:506
#4  0x083eb292 in schedule (initialCapability=0x8565aa0, task=0x9a7cb20) at Schedule.c:293
#5  0x083ed463 in scheduleWaitThread (tso=0xb7d80800, ret=0x0, cap=0x8565aa0) at Schedule.c:1895
#6  0x083e851a in rts_evalLazyIO (cap=0x8565aa0, p=0x8489478, ret=0x0) at RtsAPI.c:517
#7  0x083e79d5 in real_main () at Main.c:111

Anybody think of anything so far?

Cheers,
Corey O'Connor
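P.S. One thing I checked while poking at this: GHC's RTS can detect some MVar deadlocks on its own, but as I understand it only when the blocked-on MVar has become unreachable; if anything still holds a reference to the MVar (the accept loop, a handler table, etc.), the blocked thread just sits there silently, which could be what printAllThreads() is showing. Here's a standalone sketch of the detectable case, nothing to do with salvia itself. (Under the base that ships with 6.10 the exception is named BlockedOnDeadMVar rather than BlockedIndefinitelyOnMVar, if I remember right.)

```haskell
import Control.Concurrent (newEmptyMVar, takeMVar)
import Control.Exception (BlockedIndefinitelyOnMVar, try)

main :: IO ()
main = do
  -- Take from a fresh MVar that nothing else references: the RTS's
  -- deadlock detector (which runs as part of GC) notices the MVar is
  -- unreachable and throws BlockedIndefinitelyOnMVar at the thread,
  -- instead of letting it hang forever.
  r <- try (newEmptyMVar >>= takeMVar)
         :: IO (Either BlockedIndefinitelyOnMVar ())
  case r of
    Left e  -> putStrLn ("caught: " ++ show e)
    Right _ -> putStrLn "unexpectedly returned"
```

Compiled with a recent GHC this prints the "thread blocked indefinitely in an MVar operation" message rather than actually hanging; in the stuck server no such exception seems to fire, so the MVars my threads block on are presumably still reachable from somewhere.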