SIGALRM, SIGVTALRM, and third party libraries

I spent some time this morning trying to use HDBC-mysql to talk to a database. It uses the C mysql bindings, which talk over a blocking socket to the database server. Not surprisingly, it fails reliably when the thread it's running in is hit by an RTS-initiated alarm signal.

I've managed to make the code *appear* to work successfully with some hacking:

- Run in a bound thread
- Use pthread_sigmask to temporarily block SIGALRM and SIGVTALRM in *only* this thread while issuing the sensitive native calls
- Profit?

What I am wondering is if there's a practical downside to doing this. Am I going to accidentally kill something? This is a very important gap in the usability of GHC with native libraries, and if this approach actually turns out to be safe in practice, that would be wonderful.
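A minimal sketch of the pthread_sigmask step from the list above, in plain C with POSIX threads. The helper names are hypothetical, and the callback stands in for whatever blocking mysql client call is being protected. It blocks the two timer signals in the calling thread only, runs the call, and then restores the thread's previous mask rather than blindly unblocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Hypothetical helper: run fn(arg) with SIGALRM and SIGVTALRM blocked in the
 * calling thread only, then restore that thread's previous signal mask.
 * Returns fn's result, or -1 if the mask could not be changed. */
static int call_with_timer_signals_blocked(int (*fn)(void *), void *arg)
{
    sigset_t timers, saved;
    sigemptyset(&timers);
    sigaddset(&timers, SIGALRM);
    sigaddset(&timers, SIGVTALRM);

    /* pthread_sigmask changes the mask of the calling thread only. */
    if (pthread_sigmask(SIG_BLOCK, &timers, &saved) != 0)
        return -1;

    int result = fn(arg);  /* stand-in for the blocking native call */

    /* Restore the old mask. A thread-directed timer signal raised meanwhile
     * stays pending and is delivered now; a process-directed one may
     * already have been taken by another thread that leaves it unblocked. */
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    return result;
}

/* Example callback: returns 1 if both timer signals are blocked in the
 * calling thread right now, 0 otherwise. */
static int timer_signals_blocked_now(void *unused)
{
    (void)unused;
    sigset_t current;
    pthread_sigmask(SIG_BLOCK, NULL, &current);  /* NULL set: query only */
    return sigismember(&current, SIGALRM) == 1 &&
           sigismember(&current, SIGVTALRM) == 1;
}
```

From Haskell, such a wrapper would only make sense when called via the FFI from a bound thread (forkOS or runInBoundThread), so that the mask change stays attached to the same OS thread for the duration of the call.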

Excerpts from Bryan O'Sullivan's message of Fri Sep 03 17:00:03 -0400 2010:
What I am wondering is if there's a practical downside to doing this. Am I going to accidentally kill something? This is a very important gap in the usability of GHC with native libraries, and if this approach actually turns out to be safe in practice, that would be wonderful.
I think the primary downside is that it's not portable (yet) to Windows. Simon Marlow and I have been working on "interruptible FFI calls", and one of the things that needs to be addressed along the way is that the RTS should publish "portable" equivalents of the pthread functions which are blessed for foreign libraries to use for this purpose. Maybe we should emulate threading functionality at the pthreads layer?

Cheers,
Edward

P.S. I assume that the mysql C bindings are poorly written so as not to work with alarm signals?

On Fri, Sep 3, 2010 at 2:13 PM, Edward Z. Yang
I think the primary downside is that it's not portable (yet) to Windows.
That's fine, assuming that blocking those signals doesn't cause some more catastrophic failure. My current narrow need is for code that works on the platforms I actually use :-)
Simon Marlow and I have been working on "interruptible FFI calls", and one of the things that needs to be addressed along the way is that the RTS should publish "portable" equivalents of the pthread functions which are blessed for foreign libraries to use for this purpose. Maybe we should emulate threading functionality at the pthreads layer?
I don't know. I imagine you've Googled and found this? http://locklessinc.com/articles/pthreads_on_windows/
P.S. I assume that the mysql C bindings are poorly written so as not to work with alarm signals?
It's a very rare library that handles an errno of EINTR after a system call properly. The mysql client is no exception.
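For reference, the usual idiom for surviving EINTR is to retry the interrupted call. This is a generic POSIX sketch, not code from the mysql client; the helper names are hypothetical. Note that read() only fails with EINTR if nothing was transferred yet; an interrupted transfer can instead show up as a short count, which write_all also handles.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* read() that retries when a signal interrupts it before any data was
 * transferred (errno == EINTR). Other errors go back to the caller. */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;
}

/* Write the whole buffer, retrying on EINTR and continuing after short
 * writes. Returns 0 on success, -1 on any other error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;      /* interrupted before writing anything */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Calls that take a timeout (select, poll, and similar) need slightly more care: a naive retry restarts the full timeout, so the remaining time should be recomputed before calling again.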

On 03/09/2010 22:13, Edward Z. Yang wrote:
Excerpts from Bryan O'Sullivan's message of Fri Sep 03 17:00:03 -0400 2010:
What I am wondering is if there's a practical downside to doing this. Am I going to accidentally kill something? This is a very important gap in the usability of GHC with native libraries, and if this approach actually turns out to be safe in practice, that would be wonderful.
I think the primary downside is that it's not portable (yet) to Windows. Simon Marlow and I have been working on "interruptible FFI calls", and one of the things that needs to be addressed along the way is that the RTS should publish "portable" equivalents of the pthread functions which are blessed for foreign libraries to use for this purpose. Maybe we should emulate threading functionality at the pthreads layer?
I don't think there's a problem here. Windows doesn't have EINTR, so foreign libraries aren't affected by our timer signals.

What did you have in mind with respect to "portable equivalents of pthread functions"? I'm not sure we need to do anything along these lines at all, and I'd much rather we didn't enforce any threading abstraction on foreign clients.

Cheers,
Simon

Excerpts from Simon Marlow's message of Mon Sep 06 05:57:59 -0400 2010:
What did you have in mind with respect to "portable equivalents of pthread functions"? I'm not sure we need to do anything along these lines at all, and I'd much rather we didn't enforce any threading abstraction on foreign clients.
My thought here is that we want interruptible FFI code to be able to say when it’s entering critical sections in a platform-independent way, and if it uses pthread functions to this effect, it is then tied to POSIX. Something more portable would be for the program to tie itself to our OS threading library, OSThreads.c.

Cheers,
Edward

On 06/09/10 19:16, Edward Z. Yang wrote:
Excerpts from Simon Marlow's message of Mon Sep 06 05:57:59 -0400 2010:
What did you have in mind with respect to "portable equivalents of pthread functions"? I'm not sure we need to do anything along these lines at all, and I'd much rather we didn't enforce any threading abstraction on foreign clients.
My thought here is that we want interruptible FFI code to be able to say when it’s entering critical sections in a platform independent way, and if it uses pthread functions to this effect, it is then tied to POSIX. Something more portable would be for the program to tie itself to our OS threading library OSThreads.c
Maybe. As a first step I think we could just document what happens when a call is interrupted (pthread_cancel() on POSIX, ??? on Windows) and let the user handle it. Is there even a good lowest-common-denominator that we can build an API on top of?

Cheers,
Simon
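If interrupted calls were pthread_cancel()ed on POSIX and the foreign code were left to handle it, the standard POSIX tools for that are cancellation-state control around critical sections and cleanup handlers. A sketch under that assumption (the thread does not settle that the RTS would use pthread_cancel; the worker and helper names are hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static int cleanup_ran = 0;  /* written by the worker, read after join */

/* Cleanup handler: runs if the thread is cancelled inside the push/pop
 * region, and on the normal path via pthread_cleanup_pop(1). */
static void release_buffer(void *p)
{
    free(p);
    cleanup_ran = 1;
}

/* Hypothetical foreign call that blocks in read() on a pipe nobody writes. */
static void *blocking_worker(void *arg)
{
    int fd = *(int *)arg;
    char *buf = malloc(64);
    int prev, ignored;

    /* Critical section: cancellation requests stay pending while disabled. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &prev);
    /* ... update state that must never be left half-done ... */
    pthread_setcancelstate(prev, &ignored);

    pthread_cleanup_push(release_buffer, buf);
    ssize_t n = read(fd, buf, 64);  /* cancellation point: acted on here */
    (void)n;
    pthread_cleanup_pop(1);
    return NULL;
}

/* Start the worker, cancel it, and return 1 if it exited via cancellation
 * with its cleanup handler having run. */
static int cancel_blocked_worker(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pthread_t t;
    if (pthread_create(&t, NULL, blocking_worker, &fds[0]) != 0)
        return -1;

    /* Default deferred cancellation: the request takes effect at the next
     * cancellation point reached with cancellation enabled, i.e. read(). */
    pthread_cancel(t);

    void *status;
    pthread_join(t, &status);
    close(fds[0]);
    close(fds[1]);
    return status == PTHREAD_CANCELED && cleanup_ran;
}
```

No sleep is needed before pthread_cancel: none of malloc, pthread_setcancelstate, or pthread_cleanup_push is a cancellation point, so the request is acted on inside read(), after the handler is registered. This bookkeeping is part of why pthread_cancel's semantics are hard to carry over to other platforms.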

Excerpts from Simon Marlow's message of Wed Sep 08 03:40:42 -0400 2010:
Maybe. As a first step I think we could just document what happens when a call is interrupted (pthread_cancel() on POSIX, ??? on Windows) and let the user handle it. Is there even a good lowest-common-denominator that we can build an API on top of?
I've been thinking carefully about this, and I kind of suspect one size fits all won't work here. I've done a writeup here; one of the problems with moving pthread_cancel to Windows is that its semantics are so complicated. http://blog.ezyang.com/2010/09/pthread-cancel-on-window/

Cheers,
Edward

On 08/09/2010 15:57, Edward Z. Yang wrote:
Excerpts from Simon Marlow's message of Wed Sep 08 03:40:42 -0400 2010:
Maybe. As a first step I think we could just document what happens when a call is interrupted (pthread_cancel() on POSIX, ??? on Windows) and let the user handle it. Is there even a good lowest-common-denominator that we can build an API on top of?
I've been thinking carefully about this, and I kind of suspect one-size fits all won't work here. I've done a writeup here; one of the problems with moving pthread_cancel to Windows is that its semantics are so complicated.
I don't think porting pthreads to Windows is the right way to handle this anyway; Windows programmers want to use Windows APIs. I suggest that we use CancelSynchronousIO if it is available, and otherwise do nothing (this means a dynamic binding, which is a bit fiddly, but I think we already do this elsewhere). TerminateThread is out of the question, because it provides no way to block it or clean up. CancelSynchronousIO will let us interrupt threads blocked on I/O on Windows, which we can't currently do, and it works for both bound and unbound threads.

On POSIX we can pthread_kill() bound threads. This will let us handle at least one important case I can think of: waitForProcess. We just have to find an appropriate signal to use; we can't use SIGVTALRM, because it is already set to SA_RESTART.

Cheers,
Simon
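A sketch of the POSIX half of this proposal: interrupt a thread stuck in a blocking system call with pthread_kill(), using a signal whose handler is installed without SA_RESTART so the call fails with EINTR. SIGUSR2 is an arbitrary assumption here, since the thread leaves the choice of signal open, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Assumption: SIGUSR2 is free for this purpose. */
#define INTERRUPT_SIGNAL SIGUSR2

/* The handler does nothing. It only has to exist, so that the signal
 * interrupts the system call instead of terminating the process. */
static void on_interrupt(int sig) { (void)sig; }

static atomic_int read_errno = 0;
static atomic_int finished = 0;

/* Stand-in for a bound thread stuck in a blocking foreign call. */
static void *blocked_reader(void *arg)
{
    int fd = *(int *)arg;
    char c;
    ssize_t n = read(fd, &c, 1);  /* nothing is ever written: blocks */
    atomic_store(&read_errno, n == -1 ? errno : 0);
    atomic_store(&finished, 1);
    return NULL;
}

/* Interrupt the reader with pthread_kill and return the errno its read()
 * failed with (EINTR is expected). */
static int interrupt_blocked_read(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_interrupt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  /* deliberately no SA_RESTART */
    if (sigaction(INTERRUPT_SIGNAL, &sa, NULL) != 0)
        return -1;

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pthread_t t;
    if (pthread_create(&t, NULL, blocked_reader, &fds[0]) != 0)
        return -1;

    /* The first signal may arrive before the thread has entered read();
     * the handler then runs harmlessly, so keep re-sending until done. */
    struct timespec ten_ms = { 0, 10L * 1000 * 1000 };
    while (!atomic_load(&finished)) {
        pthread_kill(t, INTERRUPT_SIGNAL);
        nanosleep(&ten_ms, NULL);
    }

    pthread_join(t, NULL);
    close(fds[0]);
    close(fds[1]);
    return atomic_load(&read_errno);
}
```

This also illustrates why SIGVTALRM is unsuitable: with `sa.sa_flags = SA_RESTART`, the kernel would transparently restart the read() on the pipe after each signal, and the loop above would never finish.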

On 03/09/2010 22:00, Bryan O'Sullivan wrote:
I spent some time this morning trying to use HDBC-mysql to talk to a database. It uses the C mysql bindings, which talks over a blocking socket to the database server. Not surprisingly, it fails reliably when the thread it's running in is hit by an RTS-initiated alarm signal.
I've managed to make the code *appear* to work successfully with some hacking:
* Run in a bound thread
* Use pthread_sigmask to temporarily block SIGALRM and SIGVTALRM to *only* this thread while issuing the sensitive native calls
* Profit?
What I am wondering is if there's a practical downside to doing this. Am I going to accidentally kill something? This is a very important gap in the usability of GHC with native libraries, and if this approach actually turns out to be safe in practice, that would be wonderful.
I think that should be fine; there should always be a worker thread in the system available to handle the signal. You could probably block those signals permanently for that thread.

But why does the failure occur in the first place? Is the library not handling EINTR?

Cheers,
Simon
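A sketch of the "block them permanently" variant in plain C with POSIX threads. It does not model GHC's RTS; it assumes the timer signal is process-directed (simulated here with kill() on the whole process), and all names are hypothetical. The "database" thread runs with the timer signals blocked for its whole lifetime, and the signal is handled by another thread that leaves it unblocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_seen = 0;
static ssize_t db_read_result = -2;  /* set by the worker, read after join */

static void on_alarm(int sig) { (void)sig; alarm_seen = 1; }

/* Hypothetical "database" thread: it starts with the timer signals already
 * blocked (inherited from its creator) and never unblocks them. */
static void *db_thread(void *arg)
{
    int fd = *(int *)arg;
    char c;
    db_read_result = read(fd, &c, 1);  /* stand-in for a long native call */
    return NULL;
}

/* Returns 1 if the alarm was handled by another thread while the database
 * thread's blocking read() completed normally. */
static int timers_handled_elsewhere(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    sigset_t timers, saved;
    sigemptyset(&timers);
    sigaddset(&timers, SIGALRM);
    sigaddset(&timers, SIGVTALRM);

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* New threads inherit the creator's mask, so block first, create the
     * thread, then unblock here. The worker never has a window in which
     * it could receive a timer signal. */
    pthread_sigmask(SIG_BLOCK, &timers, &saved);
    pthread_t t;
    int rc = pthread_create(&t, NULL, db_thread, &fds[0]);
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    if (rc != 0)
        return -1;

    /* Process-directed signal: this thread is the only one with SIGALRM
     * unblocked, so it runs the handler; the worker's read() is untouched. */
    kill(getpid(), SIGALRM);

    ssize_t w = write(fds[1], "x", 1);  /* let the "native call" finish */
    pthread_join(t, NULL);
    close(fds[0]);
    close(fds[1]);
    return alarm_seen && w == 1 && db_read_result == 1;
}
```

Blocking in the creator before pthread_create, rather than as the first statement of the new thread, closes the small window in which a freshly started thread could still be picked to receive the signal.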

On Mon, Sep 6, 2010 at 2:53 AM, Simon Marlow
I think that should be fine - there should always be a worker thread in the system available to handle the signal. You could probably block those signals permanently for that thread.
Good to know, thanks.
But why does the failure occur in the first place? Is the library not handling EINTR?
That's right. There's unfortunately a ton of library code out there that was written by people who don't know when EINTR can bite, and the mysql client library happens to be the most prominent one that affects the Haskell world.

Hi!
On Tue, Sep 7, 2010 at 6:59 PM, Bryan O'Sullivan
That's right. There's unfortunately a ton of library code out there that was written by people who don't know when EINTR can bite, and the mysql client library happens to be the most prominent one that affects the Haskell world.
Hm, write a patch for mysql libs then and send it upstream?

Mitar

Quoth "Bryan O'Sullivan"
That's right. There's unfortunately a ton of library code out there that was written by people who don't know when EINTR can bite, and the mysql client library happens to be the most prominent one that affects the Haskell world.
I wouldn't bet my life that every failure that comes with the runtime timer signals is due to code that erroneously neglects to handle EINTR. We had someone here with a weird problem on some version of Solaris, where cabal aborted in hGetContents, on a pipe, with error return EINTR (as seen in a system trace). No timer signals (GHCRTS=-V0), problem solved. I don't know what the code looks like there, but I'm pretty sure cabal has worked for others, and as far as I know no one has a clue what happened there.

I haven't been able to track down my own problem with timer signals, in code that runs native GUI libraries on the Haiku operating system. The obvious thing (read message port) does retry on interrupt, but who knows what else ... waiting on a semaphore, something like that? Normal applications on Haiku, needless to say, are not bombarded with signals in this manner, so ... well, GHCRTS=-V0, problem solved.

Maybe over time, Haskell programmers will clean up all those problems in the foreign code they want to run.

Donn Cave, donn@avvanta.com