
Hello all, I am currently investigating techniques to nicely handle SIGINTs when FFI code is running. Yes, I know it sounds kind of crazy. Ignoring the problems of cleaning up the unceremoniously terminated C computation, I'm having difficulty getting the FFI to /stop/ running when I get the signal. I currently have some code like this: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=28422#a28422 which doesn't work, and which you guys can probably tell me doesn't work. :-) Throwing exceptions from the signal handler also doesn't work. Any suggestions for GHC source code to look at / techniques to try? Cheers, Edward

On Fri, Jul 30, 2010 at 8:19 PM, Edward Z. Yang
Hello all,
Hi!
Ignoring the problems of cleaning up the unceremoniously terminated C computation, I'm having difficulty getting the FFI to /stop/ running when I get the signal. I currently have some code like this:
In your test cases that fail are your C computations foreign unsafe imports? -Corey O'Connor coreyoconnor@gmail.com http://www.coreyoconnor.com

Excerpts from Corey O'Connor's message of Fri Aug 06 16:15:21 -0400 2010:
In your test cases that fail are your C computations foreign unsafe imports?
First thing I checked. :-) They were safe imports, and the Haskell code did get called--just the C code kept marching on. Cheers, Edward

On 06/08/2010 21:16, Edward Z. Yang wrote:
Excerpts from Corey O'Connor's message of Fri Aug 06 16:15:21 -0400 2010:
In your test cases that fail are your C computations foreign unsafe imports?
First thing I checked. :-) They were safe imports, and the Haskell code did get called--just the C code kept marching on.
Right, the RTS won't try to interrupt a foreign call even when there's a pending throwTo for the thread making the call. The reason is that, well, there's no way to interrupt C calls. You could try pthread_cancel I suppose, but only if the thread making the call is not a bound thread (because pthread_cancel kills the thread, it's not an exception mechanism). That might be quite interesting to try, actually. You'll need to modify the RTS: the place where we decide what to do when a throwTo is received for a thread involved in a foreign call is around line 396 of rts/RaiseAsync.c (in the HEAD): case BlockedOnCCall: case BlockedOnCCall_NoUnblockExc: blockedThrowTo(cap,target,msg); return THROWTO_BLOCKED; this is where you would call pthread_cancel (after checking for a bound thread). You should look into pthread_setcancelstate and pthread_setcanceltype, and call these appropriately for worker threads. Cheers, Simon

Excerpts from Simon Marlow's message of Mon Aug 09 11:23:42 -0400 2010:
That might be quite interesting to try, actually. You'll need to modify the RTS: the place where we decide what to do when a throwTo is received for a thread involved in a foreign call is around line 396 of rts/RaiseAsync.c (in the HEAD):
case BlockedOnCCall: case BlockedOnCCall_NoUnblockExc: blockedThrowTo(cap,target,msg); return THROWTO_BLOCKED;
this is where you would call pthread_cancel (after checking for a bound thread). You should look into pthread_setcancelstate and pthread_setcanceltype, and call these appropriately for worker threads.
I spent some time looking at the code, and I've been having a difficult time finding the thread ID of the worker thread that is performing the safe FFI call. The target TSO is the suspended Haskell thread, which afaict is distinct from the worker thread that is actually doing the FFI call, so the obvious Tasks from bound/cap seem to be the wrong ones. Do I have to walk all_tasks to find the one that's running the call I care about? Cheers, Edward

Excerpts from Edward Z. Yang's message of Thu Aug 26 01:22:22 -0400 2010:
I spent some time looking at the code, and I've been having a difficult time finding the thread ID of the worker thread that is performing the safe FFI call. The target TSO is the suspended Haskell thread, which afaict is distinct from the worker thread that is actually doing the FFI call, so the obvious Tasks from bound/cap seem to be the wrong ones. Do I have to walk all_tasks to find the one that's running the call I care about?
Of course, immediately after I send this message, my debug build finishes and I find target->bound->task is the one I care about. :-) Cheers, Edward

On 26/08/2010 06:57, Edward Z. Yang wrote:
Excerpts from Edward Z. Yang's message of Thu Aug 26 01:22:22 -0400 2010:
I spent some time looking at the code, and I've been having a difficult time finding the thread ID of the worker thread that is performing the safe FFI call. The target TSO is the suspended Haskell thread, which afaict is distinct from the worker thread that is actually doing the FFI call, so the obvious Tasks from bound/cap seem to be the wrong ones. Do I have to walk all_tasks to find the one that's running the call I care about?
Of course, immediately after I send this message, my debug build finishes and I find target->bound->task is the one I care about. :-)
target->bound->task is only present for a bound thread, for an ordinary unbound thread I think there is currently no (easy) way to get from the TSO to the Task. The InCall, which points to both the TSO and the Task, is stored on the cap->suspended_ccalls list, and you could find the right one by walking that list. Cheers, Simon

Here is a possible implementation: Task *task = NULL; blockedThrowTo(cap,target,msg); if (target->bound) { // maybe not supposed to kill bound threads, but it // seems to work ok (as long as they don't want to try // to recover!) task = target->bound->task; } else { // walk all_tasks to find the correct worker thread for (task = all_tasks; task != NULL; task = task->all_link) { if (task->incall->suspended_tso == target) { break; } } } if (task != NULL) { pthread_cancel(task->id); // cargo cult cargo cult... task->cap = NULL; task->stopped = rtsTrue; } This is quite good at causing the C computation to terminate, but not so good at letting the Task that requested the FFI call that it can wake up now. In particular, consider the following code (using the interruptible function defined earlier): foreign import ccall "foo.h" foo :: CInt -> IO () fooHs n = do putStrLn $ "Arf " ++ show n threadDelay 1000000 fooHs n main = main' 2 main' 0 = putStrLn "Quitting" main' n = do tid <- newEmptyMVar interruptible () $ do putMVar tid =<< myThreadId (r :: Either E.AsyncException ()) <- E.try $ foo n putStrLn "Thread was able to catch exception" print =<< readMVar tid print =<< threadStatus =<< readMVar tid putStrLn "----" main' (pred n) with foo.h/foo.c something like: void foo(int d) { while (1) { printf("Arf %d\n", d); sleep(1); } } Without the RTS patch, the first foo(2) loop continues even after interrupting (and resuming the primary execution of the program. With the RTS patch, the first foo(2) loop terminates upon the signal, but the thread 'tid' continues to be 'BlockedOnOther', and "Thread was able to catch exception" is never printed. If we use fooHs instead of foo, we see the expected behavior where the loop is terminated, the exception caught, and the message printed (eventually). Tomorrow, I plan on looking more closely at how we might resume the thread corresponding to 'tid'; however, it does seem like something of a dangerous proposition given that the worker thread was unceremoniously terminated, so none of the thunks actually got evaluated. Cheers, Edward P.S. I can post real diffs if other people are interested in replicating.

On 26/08/2010 08:10, Edward Z. Yang wrote:
Here is a possible implementation:
Task *task = NULL; blockedThrowTo(cap,target,msg); if (target->bound) { // maybe not supposed to kill bound threads, but it // seems to work ok (as long as they don't want to try // to recover!) task = target->bound->task; } else { // walk all_tasks to find the correct worker thread for (task = all_tasks; task != NULL; task = task->all_link) { if (task->incall->suspended_tso == target) { break; } } } if (task != NULL) { pthread_cancel(task->id); // cargo cult cargo cult... task->cap = NULL; task->stopped = rtsTrue; }
You don't want to do this for a bound thread (when target->bound != NULL), because the OS thread will have interesting things on its C stack and pthread_cancel discards the entire stack. A worker thread on the other hand has an uninteresting stack and we can easily make another one.
This is quite good at causing the C computation to terminate, but not so good at letting the Task that requested the FFI call that it can wake up now. In particular, consider the following code (using the interruptible function defined earlier):
foreign import ccall "foo.h" foo :: CInt -> IO ()
fooHs n = do putStrLn $ "Arf " ++ show n threadDelay 1000000 fooHs n
main = main' 2
main' 0 = putStrLn "Quitting" main' n = do tid<- newEmptyMVar interruptible () $ do putMVar tid =<< myThreadId (r :: Either E.AsyncException ())<- E.try $ foo n putStrLn "Thread was able to catch exception" print =<< readMVar tid print =<< threadStatus =<< readMVar tid putStrLn "----" main' (pred n)
with foo.h/foo.c something like:
void foo(int d) { while (1) { printf("Arf %d\n", d); sleep(1); } }
Without the RTS patch, the first foo(2) loop continues even after interrupting (and resuming the primary execution of the program. With the RTS patch, the first foo(2) loop terminates upon the signal, but the thread 'tid' continues to be 'BlockedOnOther', and "Thread was able to catch exception" is never printed. If we use fooHs instead of foo, we see the expected behavior where the loop is terminated, the exception caught, and the message printed (eventually).
Tomorrow, I plan on looking more closely at how we might resume the thread corresponding to 'tid'; however, it does seem like something of a dangerous proposition given that the worker thread was unceremoniously terminated, so none of the thunks actually got evaluated.
So you don't want to do blockedThrowTo, instead call raiseAsync to raise the exception, and that should put the TSO back on the the run queue. Cheers, Simon

Excerpts from Simon Marlow's message of Thu Aug 26 04:08:06 -0400 2010:
You don't want to do this for a bound thread (when target->bound != NULL), because the OS thread will have interesting things on its C stack and pthread_cancel discards the entire stack. A worker thread on the other hand has an uninteresting stack and we can easily make another one.
It seems possible that under certain (limited) circumstances, this would be desirable behavior: for example, if we truly wanted to destroy the bound thread-local state and start over from scratch.
So you don't want to do blockedThrowTo, instead call raiseAsync to raise the exception, and that should put the TSO back on the the run queue.
With: raiseAsync(cap, target, msg->exception, rtsFalse, NULL) // .... return THROWTO_SUCCESS; the thread is successfully able to catch the exception! case BlockedOnCCall: case BlockedOnCCall_NoUnblockExc: { #ifdef THREADED_RTS Task *task = NULL; raiseAsync(cap, target, msg->exception, rtsFalse, NULL); if (!target->bound) { // walk all_tasks to find the correct worker thread for (task = all_tasks; task != NULL; task = task->all_link) { if (task->incall->suspended_tso == target) { break; } } } if (task != NULL) { pthread_cancel(task->id); task->cap = NULL; task->stopped = rtsTrue; } return THROWTO_SUCCESS; #else blockedThrowTo(cap,target,msg); return THROWTO_BLOCKED; #endif } Here is a new (working) implementation interruptible: interruptible :: a -> IO a -> IO a interruptible defaultVal m = do mresult <- newEmptyMVar -- transfer exception to caller mtid <- newEmptyMVar let install = do installIntHandler (Catch ctrlc) cleanup oldHandler = do _ <- installIntHandler oldHandler return () ctrlc = do hPutStrLn stderr "Caught signal" tid <- readMVar mtid throwTo tid E.UserInterrupt bracket = reportBracket . E.bracket install cleanup . const reportBracket action = do putMVar mresult =<< E.catches (liftM Right action) [ E.Handler (\(e :: E.AsyncException) -> return $ case e of E.UserInterrupt -> Right defaultVal _ -> Left (E.toException e) ) , E.Handler (\(e :: E.SomeException) -> return (Left e)) ] putMVar mtid =<< forkIO (bracket m) either E.throw return =<< readMVar mresult -- one write only Do you have any suggestions for stress-testing this code? Cheers, Edward

Ahem, the logic in that last iteration was not quite correct. Here is the more correct version: case BlockedOnCCall: case BlockedOnCCall_NoUnblockExc: { #ifdef THREADED_RTS Task *task = NULL; if (!target->bound) { // walk all_tasks to find the correct worker thread for (task = all_tasks; task != NULL; task = task->all_link) { if (task->incall->suspended_tso == target) { break; } } if (task != NULL) { raiseAsync(cap, target, msg->exception, rtsFalse, NULL); pthread_cancel(task->id); task->cap = NULL; task->stopped = rtsTrue; return THROWTO_SUCCESS; } } #endif blockedThrowTo(cap,target,msg); return THROWTO_BLOCKED; } Is a lock necessary to walk all_tasks? Cheers, Edward

On 26/08/2010 18:20, Edward Z. Yang wrote:
Ahem, the logic in that last iteration was not quite correct. Here is the more correct version:
case BlockedOnCCall: case BlockedOnCCall_NoUnblockExc: { #ifdef THREADED_RTS Task *task = NULL; if (!target->bound) { // walk all_tasks to find the correct worker thread for (task = all_tasks; task != NULL; task = task->all_link) { if (task->incall->suspended_tso == target) { break; } } if (task != NULL) { raiseAsync(cap, target, msg->exception, rtsFalse, NULL); pthread_cancel(task->id); task->cap = NULL; task->stopped = rtsTrue; return THROWTO_SUCCESS; } } #endif blockedThrowTo(cap,target,msg); return THROWTO_BLOCKED; }
Is a lock necessary to walk all_tasks?
You should walk cap->suspended_ccalls instead, no lock is required for that. For stress testing, you want to construct an example that has lots of threads making foreign cals and other threads calling throwTo to interrupt them. So this is a proof of concept, and it seems to work - great! If we're going to do this for real, then there's a few more things we need: - we should probably annotate foreign calls with "interruptible" if they can be interrupted. That entails some changes to GHC, and to the way foreign calls get compiled: we'll need to pass an extra flag to suspendThread(). - the Task that has been cancelled needs to clean itself up - can we do this on Windows at all? It woud be even more useful on Windows where blocking I/O is done by foreign calls, and is currently non-interruptible. - bound threads: we can't cancel a bound thread, because then there's no way to return to the caller (a bound thread results from a call to a Haskell function from C). This makes the programming model slightly unpleasant, because a foreign call will only be interruptble when called in certan contexts, but I don't know what to do about that. Cheers, Simon

Excerpts from Simon Marlow's message of Fri Aug 27 04:05:46 -0400 2010:
You should walk cap->suspended_ccalls instead, no lock is required for that.
For stress testing, you want to construct an example that has lots of threads making foreign cals and other threads calling throwTo to interrupt them.
Will do.
So this is a proof of concept, and it seems to work - great!
This approach of killing threads unceremoniously also seems to have garnered a lot of bad juju in other contexts (Java, for example, lets you terminate threads, but the function that does so is deprecated, since guaranteeing that a thread cleaned up properly in a stateful environment is really, really hard.) Maybe we should just use pthread_kill() to send a signal to the thread.
If we're going to do this for real, then there's a few more things we need:
- we should probably annotate foreign calls with "interruptible" if they can be interrupted. That entails some changes to GHC, and to the way foreign calls get compiled: we'll need to pass an extra flag to suspendThread().
- the Task that has been cancelled needs to clean itself up
Sure.
- can we do this on Windows at all? It woud be even more useful on Windows where blocking I/O is done by foreign calls, and is currently non-interruptible.
We can do this on Windows, although the current state-of-the art in pthreads emulation is basically nicely asking the thead to terminate with a signal, and then forcibly suspending the thread and scribbling over %eip to point to some code that exits the thread. I'm not sure how this interacts with long-running syscalls...
- bound threads: we can't cancel a bound thread, because then there's no way to return to the caller (a bound thread results from a call to a Haskell function from C). This makes the programming model slightly unpleasant, because a foreign call will only be interruptble when called in certan contexts, but I don't know what to do about that.
It seems to me that the obvious thing to do is only allow bound FFI calls to run on bound threads. What goes wrong with this approach? Is the waste of threads too severe? Cheers, Edward

On 28/08/2010 07:45, Edward Z. Yang wrote:
Excerpts from Simon Marlow's message of Fri Aug 27 04:05:46 -0400 2010:
You should walk cap->suspended_ccalls instead, no lock is required for that.
For stress testing, you want to construct an example that has lots of threads making foreign cals and other threads calling throwTo to interrupt them.
Will do.
So this is a proof of concept, and it seems to work - great!
This approach of killing threads unceremoniously also seems to have garnered a lot of bad juju in other contexts (Java, for example, lets you terminate threads, but the function that does so is deprecated, since guaranteeing that a thread cleaned up properly in a stateful environment is really, really hard.) Maybe we should just use pthread_kill() to send a signal to the thread.
I think the idea of annotating interruptible calls should be good enough. Simple blocking system calls like "read" can all be annotated as interruptible without any problems. Also, pthread_cancel() provides ways to control when cancellation can occur - a thread can say whether it allows cancels at any time or only at cancel points, so that will allow critical sections to be protected, and allow more complicated foreign calls to be made interruptible too.
- bound threads: we can't cancel a bound thread, because then there's no way to return to the caller (a bound thread results from a call to a Haskell function from C). This makes the programming model slightly unpleasant, because a foreign call will only be interruptble when called in certan contexts, but I don't know what to do about that.
It seems to me that the obvious thing to do is only allow bound FFI calls to run on bound threads. What goes wrong with this approach? Is the waste of threads too severe?
Not sure what you mean here: what's a bound FFI call? Cheers, Simon

Excerpts from Simon Marlow's message of Tue Aug 31 05:02:13 -0400 2010:
I think the idea of annotating interruptible calls should be good enough. Simple blocking system calls like "read" can all be annotated as interruptible without any problems. Also, pthread_cancel() provides ways to control when cancellation can occur - a thread can say whether it allows cancels at any time or only at cancel points, so that will allow critical sections to be protected, and allow more complicated foreign calls to be made interruptible too.
Gotcha.
It seems to me that the obvious thing to do is only allow bound FFI calls to run on bound threads. What goes wrong with this approach? Is the waste of threads too severe?
Not sure what you mean here: what's a bound FFI call?
Good point: we don’t distinguish between FFI calls that require thread local state and which ones don’t: this might be a good thing to allow annotating. If we did know, then we could simply arrange for calls that use thread-local state to run on those threads, and we would still be able to farm out other FFI calls as necessary. A technical question about cleaning up task: when I run freeTask on the task, I get the following error: Foo: internal error: invalid closure, info=0xb76fb418 (GHC version 6.13.20100823 for i386_unknown_linux) freeTask is only used from freeTaskManager, so I suppose it’s not quite the right thing to do, however, as far as I can tell GHC doesn’t have a current story for freeing tasks. How should I proceed in figuring out the cause of this error? Cheers, Edward

On 01/09/2010 04:22, Edward Z. Yang wrote:
Not sure what you mean here: what's a bound FFI call?
Good point: we don’t distinguish between FFI calls that require thread local state and which ones don’t: this might be a good thing to allow annotating. If we did know, then we could simply arrange for calls that use thread-local state to run on those threads, and we would still be able to farm out other FFI calls as necessary.
Alternatively, "interruptible" could mean "does not use thread-local state", which makes sense because in order to interrupt a call we have to run it with a disposable thread. I'm not sure about the mechanism for making a call in another OS thread, though. It might be tricky to implement, because you have to arrange to communicate the result somehow.
A technical question about cleaning up task: when I run freeTask on the task, I get the following error:
Foo: internal error: invalid closure, info=0xb76fb418 (GHC version 6.13.20100823 for i386_unknown_linux)
freeTask is only used from freeTaskManager, so I suppose it’s not quite the right thing to do, however, as far as I can tell GHC doesn’t have a current story for freeing tasks. How should I proceed in figuring out the cause of this error?
Right, we don't currently free the Task structure until the end, because it caches some timing stats. This might be something we want to clean up in the future. For now, it would be polite to call workerTaskStop() at least for the cancelled Task. Cheers, Simon

I cooked up a Darcs patch implementing the new language keyword 'interruptible' sans tests, Windows support and avoiding executing interruptible calls on bound worker threads. However, being a Darcs newbie I ended up sending the patch to cvs-ghc, not this list. Let me know if you'd like me to explicitly repost it here. Cheers, Edward
participants (3)
-
Corey O'Connor
-
Edward Z. Yang
-
Simon Marlow