Discussion: The CLOEXEC problem

Hello Cafe,

I would like to point out a problem common to all programming languages, and one that Haskell hasn't addressed yet while other languages have.

It is about what happens to file descriptors when the `exec()` syscall is used (whenever you `readProcess`, `createProcess`, `system`, use any form of `popen()`, Shake's `cmd`, etc.).

(A Markdown-rendered version containing most of this email can be found at https://github.com/ndmitchell/shake/issues/253.)

Take the following function:

    f :: IO ()
    f = do
      inSomeTemporaryDirectory $ do
        BS.writeFile "mybinary" binaryContents
        setPermissions "mybinary" (setOwnerExecutable True emptyPermissions)
        _ <- readProcess "./mybinary" [] ""
        return ()

If this is happening in parallel, e.g. using

    forkIO f >> forkIO f >> forkIO f >> threadDelay 5000000

then on Linux the `readProcess` might often fail with the error message

    mybinary: Text file busy

This error means "Cannot execute the program 'mybinary' because it is open for writing by some process".

How can this happen, given that we're writing all `mybinary` files in completely separate temporary directories, and given that `BS.writeFile` guarantees to close the file handle / file descriptor (`Fd`) before it returns?

The answer is that by default, child processes on Unix (`fork()+exec()`) inherit all open file descriptors of the parent process. An ordering that leads to the problematic case could be:

* Thread 1 writes its file completely (opens and closes Fd 1)
* Thread 2 starts writing its file (Fd 2 open for writing)
* Thread 1 executes "mybinary" (which calls `fork()` and `exec()`). Fd 2 is inherited by the child process
* Thread 2 finishes writing (closes its Fd 2)
* Thread 2 executes "mybinary", which fails with `Text file busy` because an Fd to it is still open in the child spawned by Thread 1

The scope of this problem is quite general, unfortunately: it will happen for any program that uses parallel threads and that runs two or more external processes at some time.
It cannot be fixed by the part that starts the external process (e.g. you can't write a reliable `readProcess` function that doesn't have this problem), since the problem is rooted in the Fds, and there is no version of `exec()` that doesn't inherit parent Fds.

This is a general problem in C on Unix, and was discovered quite late.

Naive solutions use `fcntl`, e.g. `fcntl(fd, F_SETFD, FD_CLOEXEC)`:

http://stackoverflow.com/questions/6125068/what-does-the-fd-cloexec-fcntl-fl...

(the equivalent of Haskell's `setFdOption`) to set the `CLOEXEC` flag on all Fds before `exec()`ing. Fds with this flag are not inherited by `exec()`ed child processes. However, these solutions are racy in multi-threaded programs (such as typical Haskell programs), where an `exec()` made by some thread can fall just in between the `int fd = open(...); exec(...)` of some other thread.

For this reason, the `O_CLOEXEC` flag was added to the `open()` syscall in Linux 2.6.23 (see e.g. `man 2 open`, http://man7.org/linux/man-pages/man2/open.2.html) to atomically open a file and set the Fd to CLOEXEC in a single step.

This flag is not the default in Haskell - but maybe it should be. Other languages set it by default, for example Python. See

PEP-433: https://www.python.org/dev/peps/pep-0433/
and the newer PEP-446: https://www.python.org/dev/peps/pep-0446/

for a very good description of the situation.

Python >= 3.2 closes open Fds before the `exec()` (in the forked child) when spawning with its `subprocess` module. Python 3.4 uses O_CLOEXEC by default on all Fds opened by Python.
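In Haskell terms, the racy fcntl-style fix is a sketch like the following (assuming an `Fd` obtained elsewhere, e.g. via `handleToFd`; uses the `unix` package):

```haskell
import System.Posix.IO (FdOption (CloseOnExec), queryFdOption, setFdOption)
import System.Posix.Types (Fd)

-- Equivalent of fcntl(fd, F_SETFD, FD_CLOEXEC): mark an already-open
-- Fd as close-on-exec. Note the race described above: another thread
-- may fork()+exec() between the open() that produced this Fd and this
-- call, and the child then inherits the Fd anyway.
markCloseOnExec :: Fd -> IO Bool
markCloseOnExec fd = do
  setFdOption fd CloseOnExec True
  queryFdOption fd CloseOnExec  -- read back the flag to confirm
```

This only narrows the window; it cannot close it, which is exactly why the atomic `O_CLOEXEC` open flag was introduced.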
It is also noted that "The programming languages Go, Perl and Ruby make newly created file descriptors non-inheritable by default: since Go 1.0 (2009), Perl 1.0 (1987) and Ruby 2.0 (2013)":

https://www.python.org/dev/peps/pep-0446/#related-work

A work-around for Haskell is to use `O_CLOEXEC` explicitly, as in this example module `System/Posix/IO/ExecSafe.hsc`:

https://gist.github.com/nh2/4932ecf5ca919659ae51

Then we can implement a safe version of `BS.writeFile`:

https://gist.github.com/nh2/4932ecf5ca919659ae51

Using this form of `writeFileExecSafe` helps when your program is very small, when you can change all the code, and when you don't use any libraries that open files. However, this is a very rare case, and not a real solution. All multi-threaded Haskell programs that write and execute files will inherently trigger the `Text file busy` problem.

We need to discuss what to do about this. Let us run this discussion on haskell-cafe and move to the libraries@ mailing list once we've got some ideas and opinions.

My personal stance is that we should follow Python's example, and all functions in our standard libraries should open files with the O_CLOEXEC flag set.

Niklas
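The gists aren't reproduced here, but as a hedged sketch, a `writeFileExecSafe` along these lines is possible with a modern `unix` package (>= 2.8, whose `OpenFileFlags` record gained `creat` and `cloexec` fields; older versions have to pass the raw `O_CLOEXEC` bit via FFI, as the gist's `.hsc` module does):

```haskell
import qualified Data.ByteString as BS
import System.IO (hClose)
import System.Posix.IO
  (OpenFileFlags (..), OpenMode (WriteOnly), defaultFileFlags, fdToHandle, openFd)

-- A CLOEXEC-safe writeFile: the file is opened with O_CLOEXEC
-- atomically, so a concurrent fork()+exec() in another thread can
-- never inherit this Fd.
writeFileExecSafe :: FilePath -> BS.ByteString -> IO ()
writeFileExecSafe path content = do
  fd <- openFd path WriteOnly
          defaultFileFlags { trunc = True, creat = Just 0o644, cloexec = True }
  h <- fdToHandle fd
  BS.hPut h content
  hClose h
```

The key design point is that the flag is part of the `open()` call itself, not a separate `fcntl()` afterwards.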

I've run into this, but with sockets instead of files. For example, if
you run a kind of launcher that spawns processes with a double fork,
and it listens on its own socket, restarting it will fail to rebind
the socket, since the spawned processes inherited it. We set
FD_CLOEXEC on the socket now, but, at least on Linux, you could pass
SOCK_CLOEXEC to 'socket' in a similar way as with 'open'. Mac support
is trickier: it does seem to support the flag on 'open', but not on
'socket', as far as I can tell. I have no idea if this discussion
applies to Windows at all.
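For the socket case, current versions of the `network` package expose a helper for this; a hedged sketch (the listener setup around it is illustrative):

```haskell
import Network.Socket

-- Create a listening socket and mark it close-on-exec immediately.
-- setCloseOnExecIfNeeded works via fcntl(2)/FD_CLOEXEC, so this is
-- still racy against a concurrent fork()+exec(); atomic SOCK_CLOEXEC
-- at socket(2) time is not portably exposed (and, as noted, Mac does
-- not seem to support it on socket()).
mkListener :: PortNumber -> IO Socket
mkListener port = do
  sock <- socket AF_INET Stream defaultProtocol
  withFdSocket sock setCloseOnExecIfNeeded
  setSocketOption sock ReuseAddr 1
  bind sock (SockAddrInet port (tupleToHostAddress (127, 0, 0, 1)))
  listen sock 5
  return sock
```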
Personally I agree with you that we should probably set this by
default, and expose a flag to change it.
Erik
On Mon, Jul 20, 2015 at 3:07 PM, Niklas Hambüchen wrote:
[original message quoted in full; snipped]

This is just one example of the general problem of fork()'ing while threads hold a lock. Given that locks come in different flavors and have different uses, they need individual handling. The best general solution I know of is "don't do that". One of the things I like about Haskell is that the type system gives me hints when I'm trying to do that.

That said, there are a number of issues raised by leaving FDs open across an exec, and that's rarely the right thing to do. So making it the default behavior is probably a good idea.

quoth Niklas Hambüchen, ...
The scope of this problem is quite general unfortunately: It will happen for any program that uses parallel threads, and that runs two or more external processes at some time. It cannot be fixed by the part that starts the external process (e.g. you can't write a reliable `readProcess` function that doesn't have this problem, since the problem is rooted in the Fds, and there is no version of `exec()` that doesn't inherit parent Fds).
This problem is a general problem in C on Unix, and was discovered quite late.
I believe it has actually been a familiar issue for decades. I don't have any code handy to check, but I'm pretty sure the UNIX system(3) and popen(3) functions closed extraneous file descriptors back in the early '90s, and probably had been doing it for some time by then.

I believe this approach to the problem is supported in System.Process, via close_fds. The implementation is a walk through open FDs, in the child fork, closing anything not called for by the procedure's parameters prior to the exec. That approach has the advantage that it applies to all file descriptors, whether created by open(2) or by other means - socket(2), dup(2), etc.

I like this already-implemented solution much better than adding a new flag to "all" opens (really only those opens that occur within the Haskell runtime, and of course not external libraries' FDs). The O_CLOEXEC proposal wouldn't be the worst or most gratuitous way Haskell tampers with normal UNIX parameters, but any time you do that, you set up the conditions for breaking something that works in C, which I hate to see happen with Haskell.

Donn
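For concreteness, close_fds is a field of System.Process's CreateProcess record; a sketch (the command being run here is just an illustration):

```haskell
import System.Process (CreateProcess (..), createProcess, proc, waitForProcess)

-- Spawn a child with close_fds set: between fork() and exec(), the
-- runtime walks the inherited file descriptors and closes everything
-- not required by the child's stdin/stdout/stderr setup.
runWithClosedFds :: IO ()
runWithClosedFds = do
  (_, _, _, ph) <- createProcess (proc "ls" ["-l"]) { close_fds = True }
  _ <- waitForProcess ph
  return ()
```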

Hello Donn,

Python has a detailed discussion of this suggestion:

* https://www.python.org/dev/peps/pep-0433/#close-file-descriptors-after-fork
* https://www.python.org/dev/peps/pep-0446/#closing-all-open-file-descriptors

It highlights some problems with this approach, most notably Windows problems, not solving the problem when you exec() without fork(), and looping up to MAXFD being slow. (The latter is what the current Haskell `runInteractiveProcess` code, http://hackage.haskell.org/package/process-1.2.3.0/src/cbits/runProcess.c, seems to be doing; Python improved upon this by not looping up to MAXFD, but instead looking up the open FDs in /proc/<PID>/fd/, after people complained about this loop of close() syscalls being very slow when many FDs were open.)
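As a small sketch (Linux-only, assuming procfs is mounted), enumerating only the actually-open FDs via /proc/self/fd rather than looping to MAXFD looks like this:

```haskell
import Data.Maybe (mapMaybe)
import System.Directory (getDirectoryContents)
import Text.Read (readMaybe)

-- List this process's open file descriptors by reading /proc/self/fd,
-- avoiding MAXFD close() syscalls for descriptors that were never open.
openFds :: IO [Int]
openFds = do
  entries <- getDirectoryContents "/proc/self/fd"
  -- "." and ".." don't parse as numbers and are dropped by mapMaybe.
  return (mapMaybe readMaybe entries)

main :: IO ()
main = do
  fds <- openFds
  -- stdin, stdout and stderr are normally among the open descriptors
  print (all (`elem` fds) [0, 1, 2])
```

(Reading the directory briefly opens one extra descriptor of its own, which a real implementation would account for.)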
do that, you set up the conditions for breaking something that works in C, which I hate to see happen with Haskell.
While I understand your opinion here, I'm not sure that "breaking something that works in C" is the right description. O_CLOEXEC changes a default setting, but does not irrevocably disable any feature that is available in C. The difference is that you'd have to say which FDs you want to keep in the child - which to my knowledge is OK, since it is much more common to work with *some* designated FDs in the child process than with all of them.

To elaborate a bit: if you wanted to write a program where a child process accesses the parent's Fds, you would in most cases already have those Fds in some Haskell variables you're working with. In that case it is easy to `setFdOption fd CloseOnExec False` on those if CLOEXEC is the default, and everybody is happy. If CLOEXEC is not the default, then you get a problem with all those Fds on which you do *not* have a grip in your program, and it is much harder to fix problems with resources that float around invisibly in the background than with those that you hold in variables you actually use.

In other words, CLOEXEC is something that is easy to *undo* locally when you don't want it, but hard to *do* globally when you need it.

Let me know what you think about this.

Niklas

On 22/07/15 04:47, Donn Cave wrote:
[quoted message snipped]

quoth Niklas Hambüchen,
do that, you set up the conditions for breaking something that works in C, which I hate to see happen with Haskell.
While I understand your opinion here, I'm not sure that "breaking something that works in C" is the right description. O_CLOEXEC changes a default setting, but does not irrevocably disable any feature that is available in C.
Sure, it isn't irrevocable - so what's broken may be fixed, if you have access to it, but of course it's better not to break things in the first place.
In other words, CLOEXEC is something that is easy to *undo* locally when you don't want it, but hard to *do* globally when you need it.
Yes, of course, I understand the appeal. But it's a deep change to the way FDs have historically worked, it affects widely used UNIX features, and it doesn't solve the problem - sockets, file descriptors created by external libraries or inherited from the parent process, child processes that don't exec - so if you want to relieve a child process of all extraneous open files, you still have to walk the FD table, the same way it's been done for the last 20 or 30 years.

Fork-exec is the relatively unusual event where it makes sense to deal with these issues - including other resources besides FDs as required. Fork-exec outside of GHC should of course continue to work as written.

Donn

On Thu, Jul 23, 2015 at 3:23 AM, Donn Cave wrote:
[earlier quoted exchange snipped]
Yes, of course, I understand the appeal. But it's a deep change to the way FDs have historically worked that affects widely used UNIX features, and it doesn't solve the problem - sockets, file descriptors created by external libraries or inherited from the parent process, child processes that don't exec - so if you want to relieve a child process of all extraneous open files, you still have to walk the FD table, the same way it's been done for the last 20 or 30 years. Fork-exec is the relatively unusual event where it makes sense to deal with these issues - including other resources besides FDs as required. Fork-exec outside of GHC should of course continue to work as written.
This history is from before the c10k problem and related file descriptor scaling became relevant. Yes, we need to walk the open file descriptors, by reading /proc/self/fd on Linux and by using obscure APIs on OS X. However you see it, it's not what it was 30 years ago.

Alexander

On Mon, Jul 20, 2015 at 3:07 PM, Niklas Hambüchen wrote:
[earlier parts of the original message snipped]
The answer is that by default, child processes on Unix (`fork()+exec()`) inherit all open file descriptors of the parent process. An ordering that leads to the problematic case could be:
* Thread 1 writes its file completely (opens and closes Fd 1)
* Thread 2 starts writing its file (Fd 2 open for writing)
* Thread 1 executes "myBinary" (which calls `fork()` and `exec()`). Fd 2 is inherited by the child process
* Thread 2 finishes writing (closes its Fd 2)
* Thread 2 executes "myBinary", which fails with `Text file busy` because an Fd is still open to it in the child of Process 1
I think CLOEXEC should be the default, but it doesn't seem to solve your problem. What if thread 2 executes "myBinary" before thread 1 called exec()?

Alexander

On Fri, Jul 24, 2015 at 6:29 PM, Alexander Kjeldaas < alexander.kjeldaas@gmail.com> wrote:
I think CLOEXEC should be the default, but it doesn't seem to solve your problem. What if thread 2 executes "myBinary" before thread 1 called exec()?
I think you missed that each thread is using its own temporary directory --- they're not all running at the same time in the same directory, which would be pretty much guaranteed to fail.

--
brandon s allbery kf8nh / sine nomine associates
allbery.b@gmail.com / ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad
http://sinenomine.net

The directory is irrelevant. fork() + exec() is not an atomic operation:

* Thread 1 writes its file completely (opens and closes Fd 1, using O_CLOEXEC)
* Thread 2 starts writing its file (Fd 2 open for writing, using O_CLOEXEC)
* Thread 1 starts executing "myBinary" by calling *fork()*. Fd 2 is inherited by the child process
* Thread 2 finishes writing (closes its Fd 2)
* Thread 2 executes "myBinary", which fails with `Text file busy` because an Fd is still open to it in the child of Process 1
* Thread 1 executes "myBinary" (calling exec()). Fd 2 is automatically closed during exec(), but it's too late.

You need the file descriptor to not be inherited by the child process, which is != from O_CLOEXEC.
Alexander
On Fri, Aug 28, 2015 at 10:14 PM, Brandon Allbery wrote:
[quoted message snipped]

On 30/08/15 15:19, Alexander Kjeldaas wrote:
The directory is irrelevant. fork() + exec() is not an atomic operation:
* Thread 1 writes its file completely (opens and closes Fd 1, using O_CLOEXEC)
* Thread 2 starts writing its file (Fd 2 open for writing, using O_CLOEXEC)
* Thread 1 starts executing "myBinary" by calling *fork()*. Fd 2 is inherited by the child process
* Thread 2 finishes writing (closes its Fd 2)
* Thread 2 executes "myBinary", which fails with `Text file busy` because an Fd is still open to it in the child of Process 1
* Thread 1 executes "myBinary" (calling exec()). Fd 2 is automatically closed during exec(), but it's too late.
You need the file descriptor to not be inherited by a child process, which is != from O_CLOEXEC.
You are right. This makes solving my original problem impossible, and

* writing the file,
* then renaming it,
* then executing it

seems to be the only way to do it safely.

Let us then move the discussion back to whether CLOEXEC should be the default or not.
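As a sketch, the steps listed above look like the following (illustrative names taken from the original example; the permissions here also grant read so that a script interpreter could load the file, and error handling is omitted):

```haskell
import qualified Data.ByteString as BS
import System.Directory
  (emptyPermissions, renameFile, setOwnerExecutable, setOwnerReadable, setPermissions)
import System.Process (readProcess)

-- Write under a temporary name, then rename(2) into place before
-- executing. The executable name "mybinary" only comes into existence
-- once the write is complete and the writing Fd is closed.
writeAndExec :: BS.ByteString -> IO String
writeAndExec binaryContents = do
  BS.writeFile "mybinary.tmp" binaryContents
  setPermissions "mybinary.tmp"
    (setOwnerExecutable True (setOwnerReadable True emptyPermissions))
  renameFile "mybinary.tmp" "mybinary"  -- atomic on POSIX filesystems
  readProcess "./mybinary" [] ""
```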

On Sun, Aug 30, 2015 at 9:19 AM Alexander Kjeldaas < alexander.kjeldaas@gmail.com> wrote:
The directory is irrelevant. fork() + exec() is not an atomic operation:
This creates problems for all resources that act as locks. IIRC (it's been a few years since I looked through it thoroughly), it's been shown that there isn't a general fix for this, i.e. that the POSIX threading model and fork() will have timing issues of some sort or another no matter what you do.

The work-around is to only fork when no such resources are held. So you do things like fork all your processes before starting a thread, or fork a server that will do all further forks upon request before starting a thread, etc.

So the question should not be whether CLOEXEC "fixes everything", but whether having it as the default is a good enough idea to be worth the pain of changing. I suspect the answer is yes, as most cases where it isn't set are probably because it's the default, so won't need changing.
participants (6)

- Alexander Kjeldaas
- Brandon Allbery
- Donn Cave
- Erik Hesselink
- Mike Meyer
- Niklas Hambüchen