
I've run into this, but with sockets instead of files. For example, if
you run a kind of launcher that spawns processes with a double fork,
and it listens on its own socket, restarting it will fail to rebind
the socket, since the spawned processes inherited it. We set
FD_CLOEXEC on the socket now, but, at least on Linux, you could pass
SOCK_CLOEXEC to 'socket' in a similar way as with 'open'. Mac support
is trickier: it does seem to support the flag on 'open', but not on
'socket', as far as I can tell. I have no idea if this discussion
applies to Windows at all.
Personally I agree with you that we should probably set this by
default, and expose a flag to change it.
Erik
On Mon, Jul 20, 2015 at 3:07 PM, Niklas Hambüchen wrote:
Hello Cafe,
I would like to point out a problem common to all programming languages, and that Haskell hasn't addressed yet while other languages have.
It is about what happens to file descriptors when the `exec()` syscall is used (whenever you `readProcess`, `createProcess`, `system`, use any form of `popen()`, Shake's `cmd` etc.).
(A Markdown-rendered version containing most of this email can be found at https://github.com/ndmitchell/shake/issues/253.)
Take the following function
    f :: IO ()
    f = do
      inSomeTemporaryDirectory $ do
        BS.writeFile "mybinary" binaryContents
        setPermissions "mybinary" (setOwnerExecutable True emptyPermissions)
        _ <- readProcess "./mybinary" [] ""
        return ()
If this is happening in parallel, e.g. using
    forkIO f >> forkIO f >> forkIO f >> threadDelay 5000000
then on Linux the `readProcess` might often fail with the error message
mybinary: Text file busy
This error means "Cannot execute the program 'mybinary' because it is open for writing by some process".
How can this happen, given that we're writing all `mybinary` files in completely separate temporary directories, and given that `BS.writeFile` guarantees to close the file handle / file descriptor (`Fd`) before it returns?
The answer is that by default, child processes on Unix (`fork()+exec()`) inherit all open file descriptors of the parent process. An ordering that leads to the problematic case could be:
* Thread 1 writes its file completely (opens and closes an Fd 1)
* Thread 2 starts writing its file (Fd 2 open for writing)
* Thread 1 executes "mybinary" (which calls `fork()` and `exec()`). Fd 2 is inherited by the child process
* Thread 2 finishes writing (closes its Fd 2)
* Thread 2 executes "mybinary", which fails with `Text file busy` because an Fd is still open to it in the child process spawned by Thread 1
The scope of this problem is quite general unfortunately: It will happen for any program that uses parallel threads, and that runs two or more external processes at some time. It cannot be fixed by the part that starts the external process (e.g. you can't write a reliable `readProcess` function that doesn't have this problem, since the problem is rooted in the Fds, and there is no version of `exec()` that doesn't inherit parent Fds).
This problem is a general problem in C on Unix, and was discovered quite late.
Naive solutions to this use `fcntl`, e.g. `fcntl(fd, F_SETFD, FD_CLOEXEC)`:
http://stackoverflow.com/questions/6125068/what-does-the-fd-cloexec-fcntl-fl...
which is the equivalent of Haskell's `setFdOption`, to set the `CLOEXEC` flag on all Fds before `exec()`ing. Fds with this flag are not inherited by `exec()`ed child processes. However, these solutions are racy in multi-threaded programs (such as typical Haskell programs), where a `fork()`+`exec()` made by some thread can fall just in between the `open(...)` and the `fcntl(fd, F_SETFD, FD_CLOEXEC)` of some other thread.
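A minimal Haskell sketch of this two-step approach, using `createFile` and `setFdOption` from the `unix` package. The names `createThenSetCloexec` and `checkFlag` are hypothetical helpers for illustration only. The comment marks the window in which another thread's `exec()` would still inherit the Fd.

```haskell
import System.Posix.Files (ownerReadMode, ownerWriteMode, removeLink, unionFileModes)
import System.Posix.IO (FdOption (CloseOnExec), closeFd, createFile, queryFdOption, setFdOption)
import System.Posix.Types (Fd)

-- | Open a file for writing, then mark the Fd close-on-exec.
-- These are two separate syscalls, so the pair is not atomic.
createThenSetCloexec :: FilePath -> IO Fd
createThenSetCloexec path = do
  fd <- createFile path (ownerReadMode `unionFileModes` ownerWriteMode)
  -- Race window: a fork()+exec() issued by another thread right here
  -- still hands this Fd to the child process.
  setFdOption fd CloseOnExec True
  return fd

-- | Usage sketch: create a scratch file, report whether the flag is set,
-- then clean up again.
checkFlag :: FilePath -> IO Bool
checkFlag path = do
  fd <- createThenSetCloexec path
  flagSet <- queryFdOption fd CloseOnExec
  closeFd fd
  removeLink path
  return flagSet
```

The flag does end up set, but only after the window marked in the comment has passed, which is exactly what makes this approach unreliable with multiple threads.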
For this reason, the `O_CLOEXEC` flag was added to the `open()` syscall in Linux 2.6.23, to atomically open a file and set the Fd to CLOEXEC in a single step; see e.g. `man 2 open`:
http://man7.org/linux/man-pages/man2/open.2.html
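A hedged sketch of what the atomic variant can look like from Haskell, calling `open()` directly through the `capi` FFI so that the flag is part of the same syscall. This is an illustration under assumptions, not the contents of any existing library: `openWriteCloexec` and `checkAtomicFlag` are hypothetical names, the file mode `0o644` is an arbitrary choice, and it assumes a platform whose `fcntl.h` defines `O_CLOEXEC`.

```haskell
{-# LANGUAGE CApiFFI #-}

import Data.Bits ((.|.))
import Foreign.C.Error (throwErrnoPathIfMinus1)
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..))
import System.Posix.Files (removeLink)
import System.Posix.IO (FdOption (CloseOnExec), closeFd, queryFdOption)
import System.Posix.Types (CMode (..), Fd (..))

-- Flag values are taken from the C header instead of being hard-coded.
foreign import capi "fcntl.h value O_WRONLY" o_WRONLY :: CInt
foreign import capi "fcntl.h value O_CREAT" o_CREAT :: CInt
foreign import capi "fcntl.h value O_TRUNC" o_TRUNC :: CInt
foreign import capi "fcntl.h value O_CLOEXEC" o_CLOEXEC :: CInt

-- open() is variadic in C, which is why the capi convention is used here.
foreign import capi "fcntl.h open"
  c_open :: CString -> CInt -> CMode -> IO CInt

-- | Open a file for writing with close-on-exec set by the open() call itself,
-- so there is no moment at which the Fd exists without the flag.
openWriteCloexec :: FilePath -> IO Fd
openWriteCloexec path =
  withCString path $ \cpath ->
    Fd <$> throwErrnoPathIfMinus1 "openWriteCloexec" path
      (c_open cpath (o_WRONLY .|. o_CREAT .|. o_TRUNC .|. o_CLOEXEC) 0o644)

-- | Usage sketch: open a scratch file, report whether the flag is set,
-- then clean up again.
checkAtomicFlag :: FilePath -> IO Bool
checkAtomicFlag path = do
  fd <- openWriteCloexec path
  flagSet <- queryFdOption fd CloseOnExec
  closeFd fd
  removeLink path
  return flagSet
```

Unlike the `fcntl` approach, there is no separate second step here, so a concurrent `exec()` cannot observe the Fd without the flag.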
This flag is not the default in Haskell - but maybe it should be. Other languages set it by default, for example Python. See
PEP-433: https://www.python.org/dev/peps/pep-0433/ and the newer PEP-446: https://www.python.org/dev/peps/pep-0446/
for a very good description of the situation.
Python >= 3.2 closes open Fds in the child *after* the `fork()` (and before the `exec()`) when a process is started with its `subprocess` module. Python 3.4 uses O_CLOEXEC by default on all Fds opened by Python.
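For comparison on the Haskell side, the `process` package has a `close_fds` field on `CreateProcess`, which asks for all Fds except stdin, stdout and stderr to be closed in the new process. A minimal sketch follows; `runWithClosedFds` is a hypothetical helper name. As far as I understand, this only narrows the window rather than removing it, since the forked child may still hold the Fds for a short time before they are closed.

```haskell
import System.Exit (ExitCode (..))
import System.Process (CreateProcess (close_fds), proc, readCreateProcessWithExitCode)

-- | Run a program with empty stdin, asking `process` to close inherited
-- Fds (other than stdin/stdout/stderr) in the child.
-- Returns the exit code, stdout and stderr.
runWithClosedFds :: FilePath -> [String] -> IO (ExitCode, String, String)
runWithClosedFds exe args =
  readCreateProcessWithExitCode ((proc exe args) { close_fds = True }) ""
```

For example, `runWithClosedFds "./mybinary" []` would be a drop-in replacement for the `readProcess` call in `f`, apart from the richer return value.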
It is also noted that "The programming languages Go, Perl and Ruby make newly created file descriptors non-inheritable by default: since Go 1.0 (2009), Perl 1.0 (1987) and Ruby 2.0 (2013)":
https://www.python.org/dev/peps/pep-0446/#related-work
A work-around for Haskell is to use `O_CLOEXEC` explicitly, as in this example module `System/Posix/IO/ExecSafe.hsc`:
https://gist.github.com/nh2/4932ecf5ca919659ae51
Then we can implement a safe version of `BS.writeFile`:
https://gist.github.com/nh2/4932ecf5ca919659ae51
Using this form of `writeFileExecSafe` helps in cases when your program is very small, when you can change all the code and you don't use any libraries that open files. However, this is a very rare case, and not a real solution.
All multi-threaded Haskell programs that write and execute files will inherently trigger the `Text file busy` problem.
We need to discuss what to do about this.
Let us run this discussion on haskell-cafe and move to the libraries@ mailing list once we've got some ideas and opinions.
My personal stance is that we should follow Python's example, and all functions in our standard libraries should open files with the O_CLOEXEC flag set.
Niklas