forkProcess, forkIO, and multithreaded runtime

Hi all, I think I have a misunderstanding of how forkProcess should be working. Ultimately this relates to some bugs in the development version of keter, but I've found some behavior in a simple test program which I wouldn't have expected either, which may or may not be related. With the program at the end of this email, I would expect that, once per second, I would get a message printed from each forkIO'd green thread, the forked process, and the master process. And if I spawn 8 or less child threads that's precisely what happens. However, as soon as I up that number to 9, the child process is never run. The process is, however, created, as can be confirmed by looking at the process table. This only occurs when using the multithreaded runtime. In other words, compiling with "ghc --make test.hs" seems to always produce the expected output, whereas "ghc --make -threaded test.hs" causes the behavior described above. Having looked through the code for the process package a bit, my initial guess is that this is being caused by a signal being sent to the child process, but I'm not familiar enough with the inner workings to confirm or disprove this guess. If anyone has any ideas on this, I'd appreciate it. Michael import System.Posix.Process (forkProcess, getProcessID) import Control.Concurrent (forkIO, threadDelay) import System.IO (hFlush, stdout) import System.Posix.Signals (signalProcess, sigKILL) import Control.Exception (finally) main :: IO () main = do mapM_ spawnChild [1..9] child <- forkProcess $ do putStrLn "starting child" hFlush stdout loop "child" 0 print ("child pid", child) hFlush stdout -- I've commented out the "finally" so that the zombie process stays alive, -- to prove that it was actually created. loop "parent" 0 -- `finally` signalProcess sigKILL child spawnChild :: Int -> IO () spawnChild i = do _ <- forkIO $ loop ("spawnChild " ++ show i) 0 return () loop :: String -> Int -> IO () loop msg i = do pid <- getProcessID print (pid, msg, i) hFlush stdout threadDelay 1000000 loop msg (i + 1)

Mon, Oct 15, 2012 at 09:47:35AM +0200, Michael Snoyman wrote
Hi all,
I think I have a misunderstanding of how forkProcess should be working. Ultimately this relates to some bugs in the development version of keter, but I've found some behavior in a simple test program which I wouldn't have expected either, which may or may not be related.
With the program at the end of this email, I would expect that, once per second, I would get a message printed from each forkIO'd green thread, the forked process, and the master process. And if I spawn 8 or less child threads that's precisely what happens. However, as soon as I up that number to 9, the child process is never run. The process is, however, created, as can be confirmed by looking at the process table.
This only occurs when using the multithreaded runtime. In other words, compiling with "ghc --make test.hs" seems to always produce the expected output, whereas "ghc --make -threaded test.hs" causes the behavior described above. Having looked through the code for the process package a bit, my initial guess is that this is being caused by a signal being sent to the child process, but I'm not familiar enough with the inner workings to confirm or disprove this guess.
If anyone has any ideas on this, I'd appreciate it.
Michael
import System.Posix.Process (forkProcess, getProcessID) import Control.Concurrent (forkIO, threadDelay) import System.IO (hFlush, stdout) import System.Posix.Signals (signalProcess, sigKILL) import Control.Exception (finally)
main :: IO () main = do mapM_ spawnChild [1..9] child <- forkProcess $ do putStrLn "starting child" hFlush stdout loop "child" 0 print ("child pid", child) hFlush stdout
-- I've commented out the "finally" so that the zombie process stays alive, -- to prove that it was actually created. loop "parent" 0 -- `finally` signalProcess sigKILL child
spawnChild :: Int -> IO () spawnChild i = do _ <- forkIO $ loop ("spawnChild " ++ show i) 0 return ()
loop :: String -> Int -> IO () loop msg i = do pid <- getProcessID print (pid, msg, i) hFlush stdout threadDelay 1000000 loop msg (i + 1)
Sounds like a bug and is not reproducing on ghc-7.6 -- Alexander Vershilov

On 15/10/2012 09:47, Michael Snoyman wrote:
With the program at the end of this email, I would expect that, once per second, I would get a message printed from each forkIO'd green thread, the forked process, and the master process. And if I spawn 8 or less child threads that's precisely what happens. However, as soon as I up that number to 9, the child process is never run. The process is, however, created, as can be confirmed by looking at the process table.
FWIW I can reproduce this problem. The strace in both cases is very
different. In the non threaded runtime, everything works as you expect:
7519 select(2, [], [1], NULL, {0, 0}) = 1 (out [1], left {0, 0})
7519 write(1, "starting child\n", 15) = 15
7519 select(2, [], [1], NULL, {0, 0}) = 1 (out [1], left {0, 0})
7519 write(1, "(7519,\"child\",0)\n", 17) = 17
7519 timer_settime(0x3, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
7519 select(0, [], [], NULL, {0, 999994}

Michael,
Having looked through the code for the process package a bit, my initial guess is that this is being caused by a signal being sent to the child process, but I'm not familiar enough with the inner workings to confirm or disprove this guess.
To remove that comment for finally, you need the following: _ <- installHandler sigCHLD Ignore Nothing Put this line into the beginning of the main. And import necessary things from System.Posix.Signals. --Kazu

Michael Snoyman wrote:
I think I have a misunderstanding of how forkProcess should be working. Ultimately this relates to some bugs in the development version of keter, but I've found some behavior in a simple test program which I wouldn't have expected either, which may or may not be related.
With the program at the end of this email, I would expect that, once per second, I would get a message printed from each forkIO'd green thread, the forked process, and the master process. And if I spawn 8 or less child threads that's precisely what happens. However, as soon as I up that number to 9, the child process is never run. The process is, however, created, as can be confirmed by looking at the process table.
This only occurs when using the multithreaded runtime. In other words, compiling with "ghc --make test.hs" seems to always produce the expected output, whereas "ghc --make -threaded test.hs" causes the behavior described above. Having looked through the code for the process package a bit, my initial guess is that this is being caused by a signal being sent to the child process, but I'm not familiar enough with the inner workings to confirm or disprove this guess.
If anyone has any ideas on this, I'd appreciate it.
While I'm not reproducing that behavior here with your test case and 7.4.1, I recently converted a large program to use -threaded (because I needed to use yesod in it, actually :), and had large quantities of pain involving forkProcess. It seemed to come down to this easily overlooked note in the docs: forkProcess comes with a giant warning: since any other running threads are not copied into the child process, it's easy to go wrong: e.g. by accessing some shared resource that was held by another thread in the parent. In my experience, forkProcess often behaves incomprehensibly (to me) with -threaded, typically resulting in a forked process hanging, and quite often only some small percentage of the time, which makes it really hard to track down and try to diagnose what thunk might not be getting forced until after the fork, or whatever. I did some analysis and produced a test case for problems caused by use of forkProcess in parts of MissingH, here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=681621 My understanding is that System.Process avoids these problems by doing all the setup around forking a command in C code. I've banished forkProcess from my code base entirely, except for a double fork I need to daemonize, and I don't even trust that call. :/ -- see shy jo

On Mon, Oct 15, 2012 at 12:30 PM, Joey Hess
forkProcess comes with a giant warning: since any other running threads are not copied into the child process, it's easy to go wrong: e.g. by accessing some shared resource that was held by another thread in the parent.
In my experience, forkProcess often behaves incomprehensibly (to me) with -threaded, typically resulting in a forked process hanging, and quite often only some small percentage of the time, which makes it really hard to track down and try to diagnose what thunk might not be getting forced until after the fork, or whatever.
The forkProcess MissingH discussion recently left me feeling like this is going to be triggered in all cases, because it sounds like one of the problematic threads is the I/O manager's thread? -- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix/linux, openafs, kerberos, infrastructure http://sinenomine.net

My understanding is that System.Process avoids these problems by doing all the setup around forking a command in C code. I've banished forkProcess from my code base entirely, except for a double fork I need to daemonize, and I don't even trust that call. :/
I think you are right. forkProcess is necessary to deamonize a process but I don't know other usage. daemonize :: IO () -> IO () daemonize program = ensureDetachTerminalCanWork $ do detachTerminal ensureNeverAttachTerminal $ do changeWorkingDirectory "/" void $ setFileCreationMask 0 mapM_ closeFd [stdInput, stdOutput, stdError] program where ensureDetachTerminalCanWork p = do void $ forkProcess p exitSuccess ensureNeverAttachTerminal p = do void $ forkProcess p exitSuccess detachTerminal = void createSession --Kazu

On Mon, Oct 15, 2012 at 6:30 PM, Joey Hess
I think I have a misunderstanding of how forkProcess should be working. Ultimately this relates to some bugs in the development version of keter, but I've found some behavior in a simple test program which I wouldn't have expected either, which may or may not be related.
With the program at the end of this email, I would expect that, once per second, I would get a message printed from each forkIO'd green thread,
forked process, and the master process. And if I spawn 8 or less child
that's precisely what happens. However, as soon as I up that number to 9, the child process is never run. The process is, however, created, as can be confirmed by looking at the process table.
This only occurs when using the multithreaded runtime. In other words, compiling with "ghc --make test.hs" seems to always produce the expected output, whereas "ghc --make -threaded test.hs" causes the behavior described above. Having looked through the code for the process package a bit, my initial guess is that this is being caused by a signal being sent to the child
Michael Snoyman wrote: the threads process,
but I'm not familiar enough with the inner workings to confirm or disprove this guess.
If anyone has any ideas on this, I'd appreciate it.
While I'm not reproducing that behavior here with your test case and 7.4.1, I recently converted a large program to use -threaded (because I needed to use yesod in it, actually :), and had large quantities of pain involving forkProcess. It seemed to come down to this easily overlooked note in the docs:
forkProcess comes with a giant warning: since any other running threads are not copied into the child process, it's easy to go wrong: e.g. by accessing some shared resource that was held by another thread in the parent.
In my experience, forkProcess often behaves incomprehensibly (to me) with -threaded, typically resulting in a forked process hanging, and quite often only some small percentage of the time, which makes it really hard to track down and try to diagnose what thunk might not be getting forced until after the fork, or whatever.
I did some analysis and produced a test case for problems caused by use of forkProcess in parts of MissingH, here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=681621
My understanding is that System.Process avoids these problems by doing all the setup around forking a command in C code. I've banished forkProcess from my code base entirely, except for a double fork I need to daemonize, and I don't even trust that call. :/
Well, I tried switching my code to forking/execing from C in a very similar manner to the process package, and it seems to work. Thanks for the input everyone! Michael

On 15 October 2012 09:47, Michael Snoyman
Hi all,
I think I have a misunderstanding of how forkProcess should be working. Ultimately this relates to some bugs in the development version of keter, but I've found some behavior in a simple test program which I wouldn't have expected either, which may or may not be related.
With the program at the end of this email, I would expect that, once per second, I would get a message printed from each forkIO'd green thread, the forked process, and the master process. And if I spawn 8 or less child threads that's precisely what happens. However, as soon as I up that number to 9, the child process is never run. The process is, however, created, as can be confirmed by looking at the process table.
This only occurs when using the multithreaded runtime. In other words, compiling with "ghc --make test.hs" seems to always produce the expected output, whereas "ghc --make -threaded test.hs" causes the behavior described above. Having looked through the code for the process package a bit, my initial guess is that this is being caused by a signal being sent to the child process, but I'm not familiar enough with the inner workings to confirm or disprove this guess.
If anyone has any ideas on this, I'd appreciate it.
Not being familiar with the implementation, I'll just note that combining fork() with threads, as in the multithreaded RTS is going to have subtle bugs, no matter what. The only winning strategy is to not play that game. There are variants of this, but the meta-problem is that at the point in time when you call forkProcess, you must control all threads, ensuring that *all invariants hold*. Thus no locks held, no thread is in the C library, no foreign calls active etc. As an example, if one thread is in the C library doing some stdio, then the invariants in that library will not hold, and you cannot expect stdio to work in the child. This means that the only thing you can really do in the child process is call exec. These issues do not exist in the non-threaded world. Alexander
Michael
import System.Posix.Process (forkProcess, getProcessID) import Control.Concurrent (forkIO, threadDelay) import System.IO (hFlush, stdout) import System.Posix.Signals (signalProcess, sigKILL) import Control.Exception (finally)
main :: IO () main = do mapM_ spawnChild [1..9] child <- forkProcess $ do putStrLn "starting child" hFlush stdout loop "child" 0 print ("child pid", child) hFlush stdout
-- I've commented out the "finally" so that the zombie process stays alive, -- to prove that it was actually created. loop "parent" 0 -- `finally` signalProcess sigKILL child
spawnChild :: Int -> IO () spawnChild i = do _ <- forkIO $ loop ("spawnChild " ++ show i) 0 return ()
loop :: String -> Int -> IO () loop msg i = do pid <- getProcessID print (pid, msg, i) hFlush stdout threadDelay 1000000 loop msg (i + 1)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Since we're talking about forkIO here - not forkOS - is it possible to control the use of OS threads to avoid this problem? As I understand it, the problem is with real OS threads. A program running entirely in multiple `green' threads will fork to the same set of threads in the same state, creating no problem with internal state. I'm guessing this is why he sees the problem at 9 threads - maybe the runtime picks up another OS thread at that point. Of course one could link the unthreaded RTS, and that will cause everything, including runtime gc etc., to run in the parent thread, true? And there are some runtime options to control number of threads scheduled. Donn

The problems with forkProcess really are not Haskell's fault. You will find warnings in the documentation for C's fork(): There are limits to what you can do in the child process. To be totally safe you should restrict yourself to only executing async-signal safe operations until such time as one of the exec functions is called. All APIs, including global data symbols, in any framework or library should be assumed to be unsafe after a fork() unless explicitly documented to be safe or async-signal safe. That's actually pretty scary. I'd always assumed that this was one of the reasons why the posix_spawn() function and its support crew were devised. Which reminds me that I expected to find posix_spawn() in System.Posix.Process but didn't. http://www.haskell.org/ghc/docs/7.4-latest/html/libraries/unix-2.5.1.1/Syste...

On Tue, 16 Oct 2012 21:55:44 +0200
Alexander Kjeldaas
There are variants of this, but the meta-problem is that at the point in time when you call forkProcess, you must control all threads, ensuring that *all invariants hold*. Thus no locks held, no thread is in the C library, no foreign calls active etc. As an example, if one thread is in the C library doing some stdio, then the invariants in that library will not hold, and you cannot expect stdio to work in the child. This means that the only thing you can really do in the child process is call exec.
Further, you can only call exec if you make sure that the exec
correctly reverts everything back to a state where those invariants
hold. Mostly, this is automatic as resources get freed on exec and "do
the right thing." Locks on file descriptors that aren't closed on exec
will leave dangling locks, and locks on files that are closed on exec
will unexpectedly close them in the parent.

On 17 October 2012 00:17, Mike Meyer
On Tue, 16 Oct 2012 21:55:44 +0200 Alexander Kjeldaas
wrote: There are variants of this, but the meta-problem is that at the point in time when you call forkProcess, you must control all threads, ensuring that *all invariants hold*. Thus no locks held, no thread is in the C library, no foreign calls active etc. As an example, if one thread is in the C library doing some stdio, then the invariants in that library will not hold, and you cannot expect stdio to work in the child. This means that the only thing you can really do in the child process is call exec.
Further, you can only call exec if you make sure that the exec correctly reverts everything back to a state where those invariants hold. Mostly, this is automatic as resources get freed on exec and "do the right thing." Locks on file descriptors that aren't closed on exec will leave dangling locks, and locks on files that are closed on exec will unexpectedly close them in the parent.
Right. It should be renamed mostlyUnsafeForkProcess, assuming the multi-threaded RTS is "mostly" the default one. Alexander
http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Sorry to be reviving this thread so long after.... but I seem to be running into similar issues as Michael S. did at the start. In short, I'm using forkProcess with the threaded RTS, and see occasional hangs: - I see these only on Linux. On Mac OS X, I never do. - I'm using GHC 7.4.2 - I noticed the warning in the doc for forkProcess, but assumed I was safe, as I wasn't holding any shared resources at the time of the fork, and no shared resources in the program are used in the child. - WIth gdb, I've traced the hang to here in the run-time: forkProcess > discardTasksExcept > freeTask > closeMutex(&task->lock)
pthread_mutex_destroy
The discussion in this thread leaves me with these questions: - Is there reason to think the situation has gotten better in 7.6 and later? - Isn't the only reason *System.Process* is safer because it does an immediate exec in the child? Alas, I really want to just fork()sometimes. - Is it really true that even if my program has no shared resources with the child, that the IO subsystem and FFI system do anyway? Surely the RTS would take care of doing the right thing with those, no? - There should be no concern with exec w.r.t. library invariants since exec is wholesale replacement - all the libraries will reinitialize. Is there a problem here I'm missing? Alas, I've stopped using the threaded RTS until I understand this better. - Mark

I just looked at this code and since I don't know the code I can't give you
good solutions, but for others watching this thread the links might prove
interesting.
My main theory is that you do have some other thread in FFI-land while you
are fork()ing. The task->cond, task->lock seems to be related to this (see
quoted comments below).
Also, pthread_mutex_destroy is undefined if the lock is locked, so I am
guessing that the task->lock is somehow locked when it shouldn't be.
It isn't clear from your description whether this is consistently happening
on Linux, or whether this only sometimes happens.
The forkProcess() code seems to hold all capabilities during fork, but that
does not include FFI-land threads AFAIU.
Assuming that this happens only rarely, I am trying to understand what
happens if the thread that is in FFI-land returns to the RTS (in the
parent) after fork(), but before the freeTask() in the child. Based on the
descriptions I read, it seems likely that this thread will try to inspect
task->cap, which requires holding task->lock.
That would in turn make the pthread_mutex_destroy in the child invalid.
https://github.com/ghc/ghc/blob/master/rts/Task.h#L57
"""
...
When a task is migrated from sleeping on one Capability to another,
its task->cap field must be modified. When the task wakes up, it
will read the new value of task->cap to find out which Capability
it belongs to. Hence some synchronisation is required on
task->cap, and this is why we have task->lock.
If the Task is not currently owned by task->id, then the thread is
either
(a) waiting on the condition task->cond. The Task is either
(1) a bound Task, the TSO will be on a queue somewhere
(2) a worker task, on the spare_workers queue of task->cap.
...
"""
freeTask:
https://github.com/ghc/ghc/blob/master/rts/Task.c#L142
the comment in freeTask refers to this test:
https://github.com/ghc/testsuite/blob/master/tests/concurrent/should_run/con...
That test calls the RTC from C which then forkIOs off actions that are
outstanding when the RTS exits.
in forkProcess, child code
https://github.com/ghc/ghc/blob/master/rts/Schedule.c#L1837
It look like all this code supports the notion that some other thread can
be in foreign code during the fork call.
discardTasksExcept
https://github.com/ghc/ghc/blob/master/rts/Task.c#L305
Alexander
On Mon, Jan 21, 2013 at 12:15 AM, Mark Lentczner
Sorry to be reviving this thread so long after.... but I seem to be running into similar issues as Michael S. did at the start.
In short, I'm using forkProcess with the threaded RTS, and see occasional hangs:
- I see these only on Linux. On Mac OS X, I never do. - I'm using GHC 7.4.2 - I noticed the warning in the doc for forkProcess, but assumed I was safe, as I wasn't holding any shared resources at the time of the fork, and no shared resources in the program are used in the child. - WIth gdb, I've traced the hang to here in the run-time: forkProcess
discardTasksExcept > freeTask > closeMutex(&task->lock) pthread_mutex_destroy
The discussion in this thread leaves me with these questions:
- Is there reason to think the situation has gotten better in 7.6 and later? - Isn't the only reason *System.Process* is safer because it does an immediate exec in the child? Alas, I really want to just fork()sometimes. - Is it really true that even if my program has no shared resources with the child, that the IO subsystem and FFI system do anyway? Surely the RTS would take care of doing the right thing with those, no? - There should be no concern with exec w.r.t. library invariants since exec is wholesale replacement - all the libraries will reinitialize. Is there a problem here I'm missing?
Alas, I've stopped using the threaded RTS until I understand this better.
- Mark

I think you can test this theory with this patch. If a thread is waiting on the task->cond condition variable which is matched up with task->lock, then pthread_cond_destroy will return EBUSY, which must always be a bug in the RTS. Alexander diff --git a/rts/posix/OSThreads.c b/rts/posix/OSThreads.c index ae31966..0f12830 100644 --- a/rts/posix/OSThreads.c +++ b/rts/posix/OSThreads.c @@ -91,7 +91,8 @@ initCondition( Condition* pCond ) void closeCondition( Condition* pCond ) { - pthread_cond_destroy(pCond); + int ret = pthread_cond_destroy(pCond); + CHECKM(ret == 0, "RTS BUG! Someone is waiting on condvar %d.", ret); return; } On Mon, Jan 21, 2013 at 8:18 AM, Alexander Kjeldaas < alexander.kjeldaas@gmail.com> wrote:
I just looked at this code and since I don't know the code I can't give you good solutions, but for others watching this thread the links might prove interesting.
My main theory is that you do have some other thread in FFI-land while you are fork()ing. The task->cond, task->lock seems to be related to this (see quoted comments below).
Also, pthread_mutex_destroy is undefined if the lock is locked, so I am guessing that the task->lock is somehow locked when it shouldn't be.
It isn't clear from your description whether this is consistently happening on Linux, or whether this only sometimes happens.
The forkProcess() code seems to hold all capabilities during fork, but that does not include FFI-land threads AFAIU.
Assuming that this happens only rarely, I am trying to understand what happens if the thread that is in FFI-land returns to the RTS (in the parent) after fork(), but before the freeTask() in the child. Based on the descriptions I read, it seems likely that this thread will try to inspect task->cap, which requires holding task->lock.
That would in turn make the pthread_mutex_destroy in the child invalid.
https://github.com/ghc/ghc/blob/master/rts/Task.h#L57
""" ... When a task is migrated from sleeping on one Capability to another, its task->cap field must be modified. When the task wakes up, it will read the new value of task->cap to find out which Capability it belongs to. Hence some synchronisation is required on task->cap, and this is why we have task->lock.
If the Task is not currently owned by task->id, then the thread is either
(a) waiting on the condition task->cond. The Task is either (1) a bound Task, the TSO will be on a queue somewhere (2) a worker task, on the spare_workers queue of task->cap. ... """
freeTask: https://github.com/ghc/ghc/blob/master/rts/Task.c#L142
the comment in freeTask refers to this test:
https://github.com/ghc/testsuite/blob/master/tests/concurrent/should_run/con...
That test calls the RTC from C which then forkIOs off actions that are outstanding when the RTS exits.
in forkProcess, child code https://github.com/ghc/ghc/blob/master/rts/Schedule.c#L1837
It look like all this code supports the notion that some other thread can be in foreign code during the fork call.
discardTasksExcept https://github.com/ghc/ghc/blob/master/rts/Task.c#L305
Alexander
On Mon, Jan 21, 2013 at 12:15 AM, Mark Lentczner
wrote:
Sorry to be reviving this thread so long after.... but I seem to be running into similar issues as Michael S. did at the start.
In short, I'm using forkProcess with the threaded RTS, and see occasional hangs:
- I see these only on Linux. On Mac OS X, I never do. - I'm using GHC 7.4.2 - I noticed the warning in the doc for forkProcess, but assumed I was safe, as I wasn't holding any shared resources at the time of the fork, and no shared resources in the program are used in the child. - WIth gdb, I've traced the hang to here in the run-time: forkProcess
discardTasksExcept > freeTask > closeMutex(&task->lock) pthread_mutex_destroy
The discussion in this thread leaves me with these questions:
- Is there reason to think the situation has gotten better in 7.6 and later? - Isn't the only reason *System.Process* is safer because it does an immediate exec in the child? Alas, I really want to just fork()sometimes. - Is it really true that even if my program has no shared resources with the child, that the IO subsystem and FFI system do anyway? Surely the RTS would take care of doing the right thing with those, no? - There should be no concern with exec w.r.t. library invariants since exec is wholesale replacement - all the libraries will reinitialize. Is there a problem here I'm missing?
Alas, I've stopped using the threaded RTS until I understand this better.
- Mark

Or this. It seems that you must compile with DEBUG for the mutex check. This enables error-checking mutexes on posix. Alexander diff --git a/rts/posix/OSThreads.c b/rts/posix/OSThreads.c index ae31966..e07221d 100644 --- a/rts/posix/OSThreads.c +++ b/rts/posix/OSThreads.c @@ -91,7 +91,8 @@ initCondition( Condition* pCond ) void closeCondition( Condition* pCond ) { - pthread_cond_destroy(pCond); + int ret = pthread_cond_destroy(pCond); + CHECKM(ret == 0, "RTS Bug! Someone is waiting on condvar ret=%d.", ret); return; } @@ -165,7 +166,8 @@ initMutex(Mutex* pMut) void closeMutex(Mutex* pMut) { - pthread_mutex_destroy(pMut); + int ret = pthread_mutex_destroy(pMut); + CHECKM(ret == 0, "RTS Bug! Destroying held mutex ret=%d", ret); } void On Mon, Jan 21, 2013 at 10:14 AM, Alexander Kjeldaas < alexander.kjeldaas@gmail.com> wrote:
I think you can test this theory with this patch. If a thread is waiting on the task->cond condition variable which is matched up with task->lock, then pthread_cond_destroy will return EBUSY, which must always be a bug in the RTS.
Alexander
diff --git a/rts/posix/OSThreads.c b/rts/posix/OSThreads.c index ae31966..0f12830 100644 --- a/rts/posix/OSThreads.c +++ b/rts/posix/OSThreads.c @@ -91,7 +91,8 @@ initCondition( Condition* pCond ) void closeCondition( Condition* pCond ) { - pthread_cond_destroy(pCond); + int ret = pthread_cond_destroy(pCond); + CHECKM(ret == 0, "RTS BUG! Someone is waiting on condvar %d.", ret); return; }
On Mon, Jan 21, 2013 at 8:18 AM, Alexander Kjeldaas < alexander.kjeldaas@gmail.com> wrote:
I just looked at this code and since I don't know the code I can't give you good solutions, but for others watching this thread the links might prove interesting.
My main theory is that you do have some other thread in FFI-land while you are fork()ing. The task->cond, task->lock seems to be related to this (see quoted comments below).
Also, pthread_mutex_destroy is undefined if the lock is locked, so I am guessing that the task->lock is somehow locked when it shouldn't be.
It isn't clear from your description whether this is consistently happening on Linux, or whether this only sometimes happens.
The forkProcess() code seems to hold all capabilities during fork, but that does not include FFI-land threads AFAIU.
Assuming that this happens only rarely, I am trying to understand what happens if the thread that is in FFI-land returns to the RTS (in the parent) after fork(), but before the freeTask() in the child. Based on the descriptions I read, it seems likely that this thread will try to inspect task->cap, which requires holding task->lock.
That would in turn make the pthread_mutex_destroy in the child invalid.
https://github.com/ghc/ghc/blob/master/rts/Task.h#L57
""" ... When a task is migrated from sleeping on one Capability to another, its task->cap field must be modified. When the task wakes up, it will read the new value of task->cap to find out which Capability it belongs to. Hence some synchronisation is required on task->cap, and this is why we have task->lock.
If the Task is not currently owned by task->id, then the thread is either
(a) waiting on the condition task->cond. The Task is either (1) a bound Task, the TSO will be on a queue somewhere (2) a worker task, on the spare_workers queue of task->cap. ... """
freeTask: https://github.com/ghc/ghc/blob/master/rts/Task.c#L142
the comment in freeTask refers to this test:
https://github.com/ghc/testsuite/blob/master/tests/concurrent/should_run/con...
That test calls the RTC from C which then forkIOs off actions that are outstanding when the RTS exits.
in forkProcess, child code https://github.com/ghc/ghc/blob/master/rts/Schedule.c#L1837
It look like all this code supports the notion that some other thread can be in foreign code during the fork call.
discardTasksExcept https://github.com/ghc/ghc/blob/master/rts/Task.c#L305
Alexander
On Mon, Jan 21, 2013 at 12:15 AM, Mark Lentczner < mark.lentczner@gmail.com> wrote:
Sorry to be reviving this thread so long after.... but I seem to be running into similar issues as Michael S. did at the start.
In short, I'm using forkProcess with the threaded RTS, and see occasional hangs:
- I see these only on Linux. On Mac OS X, I never do. - I'm using GHC 7.4.2 - I noticed the warning in the doc for forkProcess, but assumed I was safe, as I wasn't holding any shared resources at the time of the fork, and no shared resources in the program are used in the child. - WIth gdb, I've traced the hang to here in the run-time: forkProcess
discardTasksExcept > freeTask > closeMutex(&task->lock) pthread_mutex_destroy
The discussion in this thread leaves me with these questions:
- Is there reason to think the situation has gotten better in 7.6 and later? - Isn't the only reason *System.Process* is safer because it does an immediate exec in the child? Alas, I really want to just fork()sometimes. - Is it really true that even if my program has no shared resources with the child, that the IO subsystem and FFI system do anyway? Surely the RTS would take care of doing the right thing with those, no? - There should be no concern with exec w.r.t. library invariants since exec is wholesale replacement - all the libraries will reinitialize. Is there a problem here I'm missing?
Alas, I've stopped using the threaded RTS until I understand this better.
- Mark

On Mon, Jan 21, 2013 at 12:15 AM, Mark Lentczner
Sorry to be reviving this thread so long after.... but I seem to be running into similar issues as Michael S. did at the start.
In short, I'm using forkProcess with the threaded RTS, and see occasional hangs:
- I see these only on Linux. On Mac OS X, I never do.
Previous versions of the linux pthreads library didn't hold any shared resources in locks, so pthread_mutex_destroy could not hang. Now Linux is much improved, and thus it hangs (see pthread_mutex_destroy man page) ;-).
- I'm using GHC 7.4.2 - I noticed the warning in the doc for forkProcess, but assumed I was safe, as I wasn't holding any shared resources at the time of the fork, and no shared resources in the program are used in the child. - WIth gdb, I've traced the hang to here in the run-time: forkProcess
discardTasksExcept > freeTask > closeMutex(&task->lock) pthread_mutex_destroy
The discussion in this thread leaves me with these questions:
- Is there reason to think the situation has gotten better in 7.6 and later? - Isn't the only reason *System.Process* is safer because it does an immediate exec in the child? Alas, I really want to just fork()sometimes.
If you immediately do exec() in the child, you can use vfork() which blocks the parent. This serializes the actions and makes this whole mess smooth and consistent.
- Is it really true that even if my program has no shared resources with the child, that the IO subsystem and FFI system do anyway? Surely the RTS would take care of doing the right thing with those, no?
It looks like the RTS is trying to do a lot of things to control all the
Haskell threads etc. But I don't think it waits for FFI-land threads before commencing a fork(), so that's why I'm guessing that some interaction between threads using FFI and fork() could be the issue.
- There should be no concern with exec w.r.t. library invariants since exec is wholesale replacement - all the libraries will reinitialize. Is there a problem here I'm missing?
I think that's right. vfork() + exec() can be serialized and deterministic
thus is a lot easier to reason about. Alexander

I am also sorry to be late on this but I have run into the same problem
trying to demonise a programme on 7.4.2. My solution was to get a shell
wrapper to run the daemon in debug mode (I.e., sans fork) and get the shell
script to do the demonising.
Other than this I have found the threaded RTS to be sound and I am quite
reliant on it. So, where things that run threaded are concerned, no
forkProcess calls for me until I can better understand this better.
If anybody does think they understand what is going on here it would be
great if they could write it up. IMHO, either the current notes on
forkProcess don't go far enough, or there is a bug needing a workaround
until the platform gets fixed.
Chris
From: Mark Lentczner

While trying to dig around this morning I started adding clang-style thread
locking annotations to the source code. These can be very handy and I
found at least one place where the documented locking policy doesn't seem
to match what is happening.
Here is an example with annotations, and what might or might not be a bug.
With these annotations, clang will be able to prove whether the program
obeys the locking regime or not.
But this is of course only one part of the RTS, but the locking can be
pretty "interesting" in itself.
Does anyone else feel that this sort of annotations would help?
Alexander
diff --git a/rts/Task.c b/rts/Task.c
index e6781a1..1e499dc 100644
--- a/rts/Task.c
+++ b/rts/Task.c
@@ -25,12 +25,12 @@
// Task lists and global counters.
// Locks required: all_tasks_mutex.
-Task *all_tasks = NULL;
+Task *all_tasks GUARDED_BY(all_tasks_mutex) = NULL;
-nat taskCount;
-nat workerCount;
-nat currentWorkerCount;
-nat peakWorkerCount;
+nat taskCount GUARDED_BY(all_tasks_mutex);
+nat workerCount GUARDED_BY(all_tasks_mutex);
+nat currentWorkerCount GUARDED_BY(all_tasks_mutex);
+nat peakWorkerCount GUARDED_BY(all_tasks_mutex);
static int tasksInitialized = 0;
@@ -339,9 +339,11 @@ void updateCapabilityRefs (void)
ACQUIRE_LOCK(&all_tasks_mutex);
for (task = all_tasks; task != NULL; task=task->all_next) {
+ ACQUIRE_LOCK(task->lock);
if (task->cap != NULL) {
task->cap = &capabilities[task->cap->no];
}
+ RELEASE_LOCK(task->lock);
for (incall = task->incall; incall != NULL; incall =
incall->prev_stack) {
if (incall->suspended_cap != NULL) {
On Mon, Jan 21, 2013 at 2:14 PM, Chris Dornan
I am also sorry to be late on this but I have run into the same problem trying to demonise a programme on 7.4.2. My solution was to get a shell wrapper to run the daemon in debug mode (I.e., sans fork) and get the shell script to do the demonising.
Other than this I have found the threaded RTS to be sound and I am quite reliant on it. So, where things that run —threaded are concerned, no forkProcess calls for me until I can better understand this better.
If anybody does think they understand what is going on here it would be great if they could write it up. IMHO, either the current notes on forkProcess don't go far enough, or there is a bug needing a workaround until the platform gets fixed.
Chris
From: Mark Lentczner
Date: Sunday, 20 January 2013 23:15 To: haskell Cc: Mike Meyer Subject: Re: [Haskell-cafe] forkProcess, forkIO, and multithreaded runtime Sorry to be reviving this thread so long after.... but I seem to be running into similar issues as Michael S. did at the start.
In short, I'm using forkProcess with the threaded RTS, and see occasional hangs:
- I see these only on Linux. On Mac OS X, I never do. - I'm using GHC 7.4.2 - I noticed the warning in the doc for forkProcess, but assumed I was safe, as I wasn't holding any shared resources at the time of the fork, and no shared resources in the program are used in the child. - WIth gdb, I've traced the hang to here in the run-time: forkProcess
discardTasksExcept > freeTask > closeMutex(&task->lock) pthread_mutex_destroy
The discussion in this thread leaves me with these questions:
- Is there reason to think the situation has gotten better in 7.6 and later? - Isn't the only reason *System.Process* is safer because it does an immediate exec in the child? Alas, I really want to just fork()sometimes. - Is it really true that even if my program has no shared resources with the child, that the IO subsystem and FFI system do anyway? Surely the RTS would take care of doing the right thing with those, no? - There should be no concern with exec w.r.t. library invariants since exec is wholesale replacement - all the libraries will reinitialize. Is there a problem here I'm missing?
Alas, I've stopped using the threaded RTS until I understand this better.
- Mark _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (12)
-
Alexander Kjeldaas
-
Alexander V Vershilov
-
Brandon Allbery
-
Chris Dornan
-
Donn Cave
-
Joey Hess
-
Kazu Yamamoto
-
Mark Lentczner
-
Michael Snoyman
-
Mike Meyer
-
Richard O'Keefe
-
Simon Marechal