thread killed

tsuraan

4 Apr 2012

4:37 a.m.

What sorts of things can cause a thread to get an asynchronous "thread killed" exception? I've been seeing rare, inexplicable "thread killed" messages in my Snap handlers for a long time, but they aren't from Snap's timeout code. I recently upgraded to ghc 7.4.1, and that caused the kills to happen a lot more often, but also gave me some traceback capabilities. I tracked the most common kills down to cryptohash's Crypto.Hash.Tiger.update function, but that's about as pure as an FFI function can be, so I don't know how that would be causing anything weird to happen. I also sometimes get the kills in the Tiger.finalize function, and I get other ones in functions that I haven't been able to track down yet. Given that the thread kills aren't from Snap's timeout code (they happen in under a second, and I have Snap's timeouts turned to an insanely high value), what sort of other things cause ThreadKilled exceptions? Thanks for any help; this is really driving me mad :-/


Gregory Collins

4 Apr

6:07 a.m.

On Wed, Apr 4, 2012 at 6:37 AM, tsuraan wrote:

...

What sorts of things can cause a thread to get an asynchronous "thread killed" exception? I've been seeing rare, inexplicable "thread killed" messages in my Snap handlers for a long time, but they aren't from Snap's timeout code. I recently upgraded to ghc 7.4.1, and that caused the kills to happen a lot more often, but also gave me some traceback capabilities. I tracked the most common kills down to cryptohash's Crypto.Hash.Tiger.update function, but that's about as

That's probably not where the threadKill is being sent *from*, it's where your thread received it.
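To illustrate the point about sender versus receiver, here is a minimal sketch (not code from Snap or from the original application; the name `killedWorker` is made up for this example). A worker thread sleeps inside `try`, another thread calls `killThread` on it, and the worker observes `ThreadKilled` at whatever point it happened to be executing.

```haskell
import Control.Concurrent (forkIO, killThread, newEmptyMVar, putMVar, takeMVar, threadDelay)
import Control.Exception (AsyncException (ThreadKilled), try)

-- | Fork a worker that sleeps, kill it from the calling thread, and
-- return what the worker observed. (Hypothetical helper for illustration.)
killedWorker :: IO (Either AsyncException ())
killedWorker = do
  started <- newEmptyMVar
  result  <- newEmptyMVar
  tid <- forkIO $ do
    -- The worker signals that it is running, then sleeps for 10 seconds.
    -- Both steps happen inside 'try', so the kill is caught either way.
    r <- try (putMVar started () >> threadDelay 10000000)
    putMVar result r
  takeMVar started   -- wait until the worker is inside 'try'
  killThread tid     -- the exception is raised in the worker, not here
  takeMVar result
```

Calling `killedWorker` should yield `Left ThreadKilled` well before the ten-second sleep finishes. Any backtrace taken in the worker would point at the sleep, which says nothing about which thread called `killThread`.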

...

pure as an FFI function can be, so I don't know how that would be causing anything weird to happen. I also sometimes get the kills in the Tiger.finalize function, and I get other ones in functions that I haven't been able to track down yet. Given that the thread kills aren't from Snap's timeout code (they happen in under a second, and I have snap's timeouts turned to an insanely high value), what sort of other things cause ThreadKilled exceptions?

It's hard to rule Snap timeouts out; try building snap-core with the "-fdebug" flag and running your app with "DEBUG=1". You'll get a spew of debugging output from Snap on stderr.

G
-- Gregory Collins

Donn Cave

6:57 a.m.

On Wed, Apr 4, 2012 at 6:37 AM, tsuraan wrote:

...

What sorts of things can cause a thread to get an asynchronous "thread killed" exception? I've been seeing rare, inexplicable "thread killed" messages in my Snap handlers for a long time, but they aren't from Snap's timeout code.

This is a long shot, but it's easy to test - turn off GHC's RTS timer, +RTS -V0 -RTS. That removes a source of SIGALRM interrupts.

Donn

tsuraan

2:53 p.m.

...

This is a long shot, but it's easy to test - turn off GHC's RTS timer, +RTS -V0 -RTS. That removes a source of SIGALRM interrupts.

Awesome, I'll give that a try. It's worth a shot, anyhow :)

tsuraan

8:10 p.m.

...

This is a long shot, but it's easy to test - turn off GHC's RTS timer, +RTS -V0 -RTS. That removes a source of SIGALRM interrupts.

I was really hoping this one would reveal something interesting, but it seems to have no effect. Thanks for the hint though.

tsuraan

2:52 p.m.

...

That's probably not where the threadKill is being sent *from*, it's where your thread received it.

Yeah, it's definitely where my thread received it. It's just sort of crazy, because when I get a ThreadKilled, it's almost always in Tiger.update. My handler does much slower things, such as connecting to a database and doing operations that can take tens of milliseconds, but somehow the ThreadKilled nearly always emanates from my Tiger.update. I even went so far as to wrap my Tiger.update in an IO operation that catches the ThreadKilled and tries the update again, and that "fixed" upwards of 90% of my thread deaths. Crazy...
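The retry wrapper described above was not posted to the list. The following is only a sketch of what such a wrapper might look like, with `retryOnThreadKilled` as a hypothetical name and a bounded retry count added as an assumption.

```haskell
import Control.Exception (AsyncException (ThreadKilled), throwIO, try)

-- | Run an action; if it is interrupted by ThreadKilled, run it again,
-- allowing at most the given number of extra attempts. Any other
-- exception propagates unchanged. (Hypothetical helper for illustration.)
retryOnThreadKilled :: Int -> IO a -> IO a
retryOnThreadKilled retries action = do
  r <- try action
  case r of
    Right x -> return x
    Left ThreadKilled
      | retries > 0 -> retryOnThreadKilled (retries - 1) action
    Left e -> throwIO e   -- out of retries, or a different AsyncException
```

A wrapper like this hides the symptom rather than the cause. It also swallows legitimate kills, such as a real timeout, so it is better suited to diagnosis than to production code.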

...

It's hard to rule Snap timeouts out; try building snap-core with the "-fdebug" flag and running your app with "DEBUG=1", you'll get a spew of debugging output from Snap on stderr.

I'll give that a try, but whenever I've added any sort of printing to my handler to try to track things down, the issue goes away entirely. My toolkit for debugging race conditions is pretty weak; I usually have been able to think real hard and then fix them intuitively, but my intuition about Haskell is still weak enough that my normal approach isn't working :)

tsuraan

8:09 p.m.

...

It's hard to rule Snap timeouts out; try building snap-core with the "-fdebug" flag and running your app with "DEBUG=1", you'll get a spew of debugging output from Snap on stderr.

Heh, that was quite a spew. I normally get the exceptions tens of MB into files that are hundreds of MB, and I sometimes don't get them at all, so printing out the entire request body was a bit slow :) After commenting out some of the more talkative debug statements, I got the exception to happen, and it looks generally like this:

[ 16] killIfTooSlow: continue
[ 16] rqBody iterator: continue
[ 16] httpSession iteratee: continue
[ 16] SimpleBackend.enumerate(13): got continue
[ 16] SimpleBackend.enumerate(13): reading from socket
[ 16] SimpleBackend.enumerate(13): got 8192 bytes from read end
[ 16] SimpleBackend.enumerate(13): sending 8192 bytes to continuation
[ 16] killIfTooSlow: continue
[ 16] rqBody iterator: continue
[ 16] httpSession iteratee: continue
[ 16] SimpleBackend.enumerate(13): got continue
[ 16] SimpleBackend.enumerate(13): reading from socket
[ 16] SimpleBackend.enumerate(13): got 8192 bytes from read end
[ 16] SimpleBackend.enumerate(13): sending 8192 bytes to continuation
[ 16] killIfTooSlow: continue
[ 16] rqBody iterator: continue
[ 16] httpSession iteratee: continue
[ 16] SimpleBackend.enumerate(13): got continue
[ 16] SimpleBackend.enumerate(13): reading from socket
[ 16] SimpleBackend.enumerate(13): got 1878 bytes from read end
[ 16] SimpleBackend.enumerate(13): sending 1878 bytes to continuation
[ 16] killIfTooSlow: continue
[ 16] rqBody iterator: continue
[ 16] rateLimit: caught thread killed
[ 16] Snap.Http.Server.Config errorHandler:
[ 16] During processing of request from 127.0.0.1:38088
< a bunch of headers snipped >
[ 16] Server.httpSession: finished running user handler
[ 16] Server.httpSession: handled, skipping request body
[ 16] httpSession/skipToEof: BEGIN
[ 16] httpSession/skipToEof: continue
[ 16] Server.httpSession: request body skipped, sending response
[ 16] sendResponse: whenEnum: enumerating bytes
[ 16] countBytes writeEnd: BEGIN
[ 16] writeEnd: BEGIN
[ 16] writeEnd: continue
[ 16] countBytes writeEnd: continue
[ 16] SimpleBackend.writeOut(13): got chunk with 233 bytes
[ 16] SimpleBackend.writeOut(13): wrote 233 bytes, last 10="ead killed"

So, I'm not sure what that means. rateLimit caught the thread kill, but I don't see anything snap-related that caused it. That rateLimit message is the rateLimit seeing an error, and not rateLimit causing one, right?

tsuraan

9:59 p.m.

My Snap handlers communicate with various resource pools, often through MVars. Is it possible that MVar deadlock would be causing the runtime system to kill off a contending thread, giving it a ThreadKilled exception? It looks like ghc does do deadlock detection, but I can't find any docs on how exactly it deals with deadlocks.

Clark Gaebel

10:43 p.m.

Whenever I've deadlocked, it terminated the program with "thread blocked indefinitely in an MVar operation".

On Wed, Apr 4, 2012 at 5:59 PM, tsuraan wrote:

...

My Snap handlers communicate with various resource pools, often through MVars. Is it possible that MVar deadlock would be causing the runtime system to kill off a contending thread, giving it a ThreadKilled exception? It looks like ghc does do deadlock detection, but I can't find any docs on how exactly it deals with deadlocks.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

tsuraan

5 Apr

2:14 a.m.

...

Whenever I've deadlocked, it terminated the program with "thread blocked indefinitely in an MVar operation".

Well, I guess that's probably not what I'm seeing. I'm currently trying to simplify the heck out of the code near where the thread killed exceptions are emanating; maybe once that's done, the thread killing will either magically go away, or at least I'll have a smaller surface area to try to debug.
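The difference between a detected MVar deadlock and a thread kill can be seen in a small experiment. This is a sketch with a made-up function name, `blockForever`, and it assumes it is called from a program with no other thread holding the MVar. GHC's runtime raises `BlockedIndefinitelyOnMVar` in the blocked thread, which is a different exception type from `ThreadKilled`.

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar)
import Control.Exception (BlockedIndefinitelyOnMVar, try)

-- | Block on an MVar that nothing else can ever fill and return the
-- exception the runtime delivers. (Hypothetical helper for illustration.)
blockForever :: IO (Either BlockedIndefinitelyOnMVar ())
blockForever = do
  mv <- newEmptyMVar :: IO (MVar ())
  -- No other thread holds a reference to 'mv', so once the garbage
  -- collector notices this, the blocked thread receives the exception.
  try (takeMVar mv)
```

Showing the caught exception gives the text "thread blocked indefinitely in an MVar operation", matching the message quoted above. Detection depends on garbage collection, so in a larger program where the MVar is still reachable from another thread, no exception is raised and the thread simply stays blocked.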
