Odd behavior of ncurses with -threaded

Hello,

the following program should wait up to 3 seconds for user input. If no user input occurs within that time, it just prints -1.

    {-# LANGUAGE ForeignFunctionInterface #-}
    module Main where

    import Foreign
    import Foreign.C.String
    import Foreign.C.Types
    import Foreign.C.Error

    foreign import ccall unsafe initscr :: IO (Ptr ())
    foreign import ccall unsafe endwin  :: IO CInt
    foreign import ccall unsafe getch   :: IO CInt
    foreign import ccall unsafe timeout :: CInt -> IO ()

    main = do
      initscr
      timeout 3000   -- make getch wait at most 3000 ms
      c <- getch     -- returns -1 (ERR) if no key arrived in time
      endwin
      print c

This works just fine if I do not use the threaded RTS, say:

    ghc --make -lcurses Main.hs

However, with

    ghc --make -threaded -lcurses Main.hs

it prints -1 immediately without awaiting the 3 seconds.

Is that considered a bug? Should I open a ticket?

ghc: 6.12.1
linux: 2.6.32
ncurses: 5.7

Cheers,
Simon

Quoth Simon Hengel
This works just fine if I do not use the threaded RTS, say:
ghc --make -lcurses Main.hs
However, with
ghc --make -threaded -lcurses Main.hs
I bet, if you switch off the barrage of thread scheduling SIGALRMs with +RTS -V0 -RTS, it will work like it's supposed to. A very casual scrutiny of an ncurses source I have at hand shows no EINTR handling on select() in lib_getch.c. That could be the problem, or something like it: select() is interrupted by GHC's SIGALRM, aborting the timeout. Or they could both be using SIGALRM, and GHC's signal is mistaken for the application library's, but I don't see any sign of that and select() is a more obvious way to do the timeout anyway.
it prints -1 immediately without awaiting the 3 seconds.
Is that considered a bug? Should I open a ticket?
ghc: 6.12.1 linux: 2.6.32 ncurses: 5.7
Someone probably should. It's tempting to conclude that the bug is in ncurses, and perhaps it is, and maybe there's nothing to be done about it anyway, but a language runtime that's spewing SIGALRMs is going to crash into this kind of thing a lot. I could comment a couple of other cases to the ticket that I've seen here or on Haskell-cafe.
Donn Cave, donn@avvanta.com
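If the early -1 really comes from an interrupted select(), one workaround on the Haskell side is to keep track of the deadline yourself and call getch again while time remains. The following is only a sketch under that assumption. The helper name getchWithin is hypothetical (it is not part of ncurses or GHC), it takes the timeout and getch imports from the original program as arguments, and it uses GHC.Clock, which needs a much newer base than the one shipped with GHC 6.12. The main below drives it with stubs instead of a real terminal.

```haskell
module Main where

import Data.IORef (atomicModifyIORef', newIORef)
import Foreign.C.Types (CInt)
import GHC.Clock (getMonotonicTime)

-- | Hypothetical helper: keep calling a timed read until it yields a
-- key or the full deadline (in milliseconds) has passed.
-- 'setTimeout' and 'readKey' stand in for the FFI imports 'timeout'
-- and 'getch' of the original program.
getchWithin :: (CInt -> IO ()) -> IO CInt -> Int -> IO CInt
getchWithin setTimeout readKey totalMs = do
  start <- getMonotonicTime
  let loop = do
        now <- getMonotonicTime
        let elapsedMs = round ((now - start) * 1000) :: Int
            remaining = totalMs - elapsedMs
        if remaining <= 0
          then return (-1)                 -- deadline reached: report ERR
          else do
            setTimeout (fromIntegral remaining)
            c <- readKey
            if c /= (-1)
              then return c                -- got a key
              else loop                    -- early or real timeout: re-check
  loop

-- Demo with stubs: the fake read fails twice (as if interrupted),
-- then delivers the key code 113.
main :: IO ()
main = do
  calls <- newIORef (0 :: Int)
  let fakeTimeout _ = return ()
      fakeGetch = do
        n <- atomicModifyIORef' calls (\k -> (k + 1, k))
        return (if n < 2 then (-1) else 113)
  c <- getchWithin fakeTimeout fakeGetch 3000
  print c
```

In the original program the call would be `getchWithin timeout getch 3000`. The loop does not look at errno, so it does not depend on whether ncurses preserves EINTR, but if the read is interrupted on every timer tick it will re-enter getch many times before the deadline.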

Quoth Simon Hengel
ghc --make -threaded -lcurses Main.hs
I bet, if you switch off the barrage of thread scheduling SIGALRMs with +RTS -V0 -RTS, it will work like it's supposed to.
That helped, thanks!
Great, but beware: I am not fully informed on what it does, beyond the SIGALRM difference. I run my own application this way without apparent serious harm, but it could be failing to reclaim memory for example, due to missed garbage collections.
Donn Cave, donn@avvanta.com

Excerpts from Donn Cave's message of Thu Nov 11 17:07:20 -0500 2010:
ghc: 6.12.1 linux: 2.6.32 ncurses: 5.7
Someone probably should. It's tempting to conclude that the bug is in ncurses, and perhaps it is, and maybe there's nothing to be done about it anyway, but a language runtime that's spewing SIGALRMs is going to crash into this kind of thing a lot. I could comment a couple of other cases to the ticket that I've seen here or on Haskell-cafe.
I was under the impression we fixed this:
http://hackage.haskell.org/trac/ghc/ticket/850
That is, we should be using SIGVTALRM, not SIGALRM, these days, except under certain conditions when your operating system doesn't support the proper timer (which Linux most assuredly does).
Edward

On Thu, Nov 11, 2010 at 2:32 PM, Edward Z. Yang
I was under the impression we fixed this:
http://hackage.haskell.org/trac/ghc/ticket/850
That is, we should be using SIGVTALRM, not SIGALRM, these days, except under certain conditions when your operating system doesn't support the proper timer (which Linux most assuredly does).
It's still a pretty common problem with all kinds of libraries. http://www.serpentine.com/blog/2010/09/04/dealing-with-fragile-c-libraries-e...

On 11/11/2010 22:41, Bryan O'Sullivan wrote:
On Thu, Nov 11, 2010 at 2:32 PM, Edward Z. Yang wrote:
I was under the impression we fixed this:
http://hackage.haskell.org/trac/ghc/ticket/850
That is, we should be using SIGVTALRM, not SIGALRM, these days, except under certain conditions when your operating system doesn't support the proper timer (which Linux most assuredly does).
It's still a pretty common problem with all kinds of libraries.
http://www.serpentine.com/blog/2010/09/04/dealing-with-fragile-c-libraries-e...
Is there anything that we could do in GHC to improve the situation? I suppose we could have a dedicated OS thread whose job it was to sit around and run the signal handler every Nth of a second.
Cheers, Simon

On Fri, Nov 12, 2010 at 8:07 AM, Simon Marlow
Is there anything that we could do in GHC to improve the situation? I suppose we could have a dedicated OS thread whose job it was to sit around and run the signal handler every Nth of a second.
Maybe that would work, or masking out RTS signals before calling potentially blocking foreign code (which is all my hack does). Another option would be to include my hack in the Foreign or GHC.* hierarchy somewhere in base. It's clear that a large number of C programmers know nothing about restarting system calls, and authors writing FFI code typically can't do anything about that except make sure those system calls don't get interrupted in the first place.
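For readers who want to see what masking the RTS timer signals around a blocking foreign call could look like, here is a minimal sketch. It is not the actual code of the hack mentioned above. The name withTimerSignalsBlocked is made up for illustration, and the sketch assumes that blockSignals from the unix package changes the mask of the calling OS thread only (as sigprocmask does on Linux), which is why the action is run in a bound thread.

```haskell
module Main where

import Control.Concurrent (rtsSupportsBoundThreads, runInBoundThread)
import Control.Exception (bracket)
import System.Posix.Signals
  ( addSignal, blockSignals, emptySignalSet
  , getSignalMask, setSignalMask, sigALRM, sigVTALRM )

-- | Hypothetical wrapper: run an action with SIGALRM and SIGVTALRM
-- blocked in the current OS thread, restoring the old mask afterwards.
-- A bound thread keeps the action on the OS thread whose mask we set.
withTimerSignalsBlocked :: IO a -> IO a
withTimerSignalsBlocked act
  | rtsSupportsBoundThreads = runInBoundThread masked
  | otherwise               = act   -- non-threaded RTS: run unchanged
  where
    timers = addSignal sigALRM (addSignal sigVTALRM emptySignalSet)
    masked = bracket
      (do old <- getSignalMask      -- remember the current mask
          blockSignals timers       -- add the timer signals to it
          return old)
      setSignalMask                 -- put the old mask back
      (\_ -> act)

main :: IO ()
main = do
  -- Stand-in for a blocking foreign call such as getch.
  r <- withTimerSignalsBlocked (return (6 * 7 :: Int))
  print r
```

With the program from the start of the thread, the call site would become `c <- withTimerSignalsBlocked getch`. Each wrapped call costs a few extra system calls to change and restore the mask.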

Quoth Simon Marlow
Is there anything that we could do in GHC to improve the situation? I suppose we could have a dedicated OS thread whose job it was to sit around and run the signal handler every Nth of a second.
Since the -threaded RTS automatically spawns a couple of extra
OS threads anyway, what's one more?
Quoth Bryan O'Sullivan
or masking out RTS signals before calling potentially blocking foreign code (which is all my hack does)
And unmasking on re-entry via callback, I suppose.
It's clear that a large number of C programmers know nothing about restarting system calls, and authors writing FFI code typically can't do anything about that except make sure those system calls don't get interrupted in the first place.
Note that one of the reported victims was "cabal", where I think it was a getContents that aborted with EINTR, on OpenSolaris. I have no idea what was going on there.
Nor do I have any idea why my platform libraries are vulnerable to this signal - maybe it's just the usual system call, but I'm not on UNIX. In the present case, ncurses is probably as old as some of the parties to this discussion, and you'd have to wonder if after all this time it doesn't restart its select() on EINTR, if it isn't because that's actually how they want it to work! I'm not going to defend that proposition, but you're right, very low odds that the external world will be fixed to support timer signals.
The SIGVTALRM fix solves the problem of an application that uses SIGALRM in its own timer.
Donn Cave, donn@avvanta.com

On 12/11/2010 17:25, Donn Cave wrote:
Quoth Simon Marlow
Is there anything that we could do in GHC to improve the situation? I suppose we could have a dedicated OS thread whose job it was to sit around and run the signal handler every Nth of a second.
Since the -threaded RTS automatically spawns a couple of extra OS threads anyway, what's one more?
I don't know - maybe it wouldn't be a problem. But we'd have to measure things to make sure the extra thread wasn't impacting performance somehow (e.g. by confusing the OS scheduler).
Quoth Bryan O'Sullivan
or masking out RTS signals before calling potentially blocking foreign code (which is all my hack does)
And unmasking on re-entry via callback, I suppose.
Right - making a system call for every safe foreign call and return/callback probably would make a difference.
It's clear that a large number of C programmers know nothing about restarting system calls, and authors writing FFI code typically can't do anything about that except make sure those system calls don't get interrupted in the first place.
Note that one of the reported victims was "cabal", where I think it was a getContents that aborted with EINTR, on OpenSolaris. I have no idea what was going on there.
Yes, we still don't know what the problem is there.
Cheers, Simon
Nor do I have any idea why my platform libraries are vulnerable to this signal - maybe it's just the usual system call, but I'm not on UNIX. In the present case, ncurses is probably as old as some of the parties to this discussion, and you'd have to wonder if after all this time it doesn't restart its select() on EINTR, if it isn't because that's actually how they want it to work! I'm not going to defend that proposition, but you're right, very low odds that the external world will be fixed to support timer signals.
The SIGVTALRM fix solves the problem of an application that uses SIGALRM in its own timer.
Donn Cave, donn@avvanta.com
participants (5)
- Bryan O'Sullivan
- Donn Cave
- Edward Z. Yang
- Simon Hengel
- Simon Marlow