Re: [Haskell-cafe] Discussion: The CLOEXEC problem

Quoth David Turner
Could you be a bit more specific? Which bits of pre-existing software didn't have a FD_CLOEXEC bit and would be broken by this proposal?
Well, of course to be precise, the bit's always there, it's just normally not set - that's the normal environment that anything written up to now would expect. And of course, anything that depends on a GHC-opened file to stay open over an exec would be broken. I can't enumerate the software that meets that criterion.
Since Python recently decided to go through this exact transition, their experience should be instructive. Do you know if there was negative fallout from PEP 0446?
I gave up on Python a long time ago and don't follow what goes on. If recently means less than a decade or so, though, it's not much to go on. If the problem addressed by the O_CLOEXEC proposal is obscure, the problems it may create are even more so - I'll certainly concede that - and it could take a lot of experience before those problems would be well known enough to show up if you went looking for them.
When thinking about FDs from outside the Haskell runtime (whether inherited or simply opened in an external library), can you give an example of a case where such a FD causes a problem if inherited and yet cannot be set as FD_CLOEXEC at source?
Sorry, I'm confused here. Files opened within GHC and externally have equal potential to work as intended or to cause problems, it seems to me. If we infer from the proposal that GHC-opened files with CLOEXEC unset may cause a problem, then it follows that other files with CLOEXEC unset also may cause the same problem. The proposal addresses only the former, and not the latter, and only for normal files - while the ordinary solution, as implemented in UNIX popen(3), deals with all - pipes, sockets, etc. Donn
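To make the "bit is always there, just normally not set" remark concrete, here is a minimal sketch of reading and toggling FD_CLOEXEC with fcntl(2). It is written in Python purely for illustration; the helper names `cloexec_is_set` and `set_cloexec` are hypothetical. Note that Python 3.4 and later already mark new descriptors close-on-exec, so the sketch clears the flag first to recreate the traditional UNIX default.

```python
import fcntl
import os
import tempfile


def cloexec_is_set(fd: int) -> bool:
    """Return True if the FD_CLOEXEC descriptor flag is set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def set_cloexec(fd: int, enabled: bool) -> None:
    """Set or clear FD_CLOEXEC while leaving other descriptor flags alone."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    if enabled:
        flags |= fcntl.FD_CLOEXEC
    else:
        flags &= ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, flags)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        set_cloexec(fd, False)   # the traditional default: survives an exec
        print("traditional default, flag set?", cloexec_is_set(fd))
        set_cloexec(fd, True)    # what the proposal would make the default
        print("proposed default, flag set?", cloexec_is_set(fd))
```

The same two fcntl calls work on any descriptor, whether it came from open(2), pipe(2), socket(2) or an external library.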

On 07/24/2015 09:22 PM, Donn Cave wrote:
Quoth David Turner
Could you be a bit more specific? Which bits of pre-existing software didn't have a FD_CLOEXEC bit and would be broken by this proposal?
Well, of course to be precise, the bit's always there, it's just normally not set - that's the normal environment that anything written up to now would expect. And of course, anything that depends on a GHC-opened file to stay open over an exec would be broken. I can't enumerate the software that meets that criterion.
Since Python recently decided to go through this exact transition, their experience should be instructive. Do you know if there was negative fallout from PEP 0446?
I gave up on Python a long time ago and don't follow what goes on. If recently means less than a decade or so, though, it's not much to go on. If the problem addressed by the O_CLOEXEC proposal is obscure, the problems it may create are even more so - I'll certainly concede that - and it could take a lot of experience before those problems would be well known enough to show up if you went looking for them.
It seems to me that discovering a "FD-was-unexpectedly-closed-before-it-was-supposed-to" problem is a lot more likely than discovering FD leaks, no? (Not that I'm advocating any particular solution to this -- backward compatibility is a harsh mistress.) Regards,

Quoth Bardur Arantsson
On 07/24/2015 09:22 PM, Donn Cave wrote: ...
If recently means less than a decade or so, though, it's not much to go on. If the problem addressed by the O_CLOEXEC proposal is obscure, the problems it may create are even more so - I'll certainly concede that - and it could take a lot of experience before those problems would be well known enough to show up if you went looking for them.
It seems to me that discovering a "FD-was-unexpectedly-closed-before-it-was-supposed-to" problem is a lot more likely than discovering FD leaks, no?
Maybe ... Note that if it were exactly about FD leaks, that problem would be undiscovered yet. The reason anyone cares is that the leaked file descriptor may go on to inconveniently hold a file open. In what I think is the most common case, the file is a pipe, and the open write end makes a read hang when it should complete. Pipes aren't created by open(2) so won't be part of an O_CLOEXEC solution, but I imagine this is where the issue is usually first encountered, and why popen(3) closes all file descriptors. With disk files ... off the top of my head, the most likely effect might NOT be read/write I/O errors, because here we're talking about passing a file descriptor value through an exec, which I think is an unusual programming practice. It's easy enough to do, e.g. you can format the value into a shell command like "echo whatever >&6", but eccentric. But there are other things that could turn up. For example, you could use flock(2) (Berkeley, not POSIX fcntl lock) to keep an advisory file lock until the exec exits. If the file is closed prematurely, you lose the lock, and ... whatever happens then. Donn
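The pipe case described above can be reproduced with a small sketch, again in Python for illustration only. The function name `read_end_sees_eof` is hypothetical, and `close_fds=False` is used to imitate a plain fork/exec that does not tidy up descriptors. When the write end leaks into a long-running child, the read end never reports EOF after the parent closes its own copy, so a reader would hang.

```python
import os
import select
import subprocess
import sys


def read_end_sees_eof(leak_write_end: bool, wait_seconds: float = 0.5) -> bool:
    """Spawn a sleeping child and report whether the pipe's read end hits EOF.

    Returns True if the read end becomes readable (EOF) within wait_seconds
    after the parent closes its write end, False if a reader would still block.
    """
    r, w = os.pipe()
    os.set_inheritable(w, leak_write_end)   # True clears FD_CLOEXEC on w
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        close_fds=False,                    # keep inheritable descriptors open
    )
    try:
        os.close(w)                         # the parent is done writing
        readable, _, _ = select.select([r], [], [], wait_seconds)
        return bool(readable)
    finally:
        child.kill()
        child.wait()
        os.close(r)


if __name__ == "__main__":
    print("EOF seen with close-on-exec:", read_end_sees_eof(False))
    print("EOF seen with leaked write end:", read_end_sees_eof(True))
```

The child never touches the pipe; merely holding the inherited write end open is enough to keep the reader waiting.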

While this discussion has been about the programming errors that result from leaked file descriptors, can I point out what I think is a more important issue? A leaked file descriptor is a potential security hole. If you want your code to be secure - and in this age of internet-based applications built by plugging things together, that should always be the case - you want bugs from not dealing with an access issue to result in a permission denied error, not someone being able to read stuff they shouldn't. So while we can't fix all the holes related to this issue or the larger issues related to forking a threaded program, changing the default to automatically close things will improve the security of Haskell programs.

On 25/07/15 21:09, Donn Cave wrote:
But there are other things that could turn up. For example, you could use flock(2) (Berkeley, not POSIX fcntl lock) to keep an advisory file lock until the exec exits. If the file is closed prematurely, you lose the lock, and ... whatever happens then.
This is a very valid point. Applications that rely on this will break by changing the default here. I'm wondering though whether this is an acceptable price to pay for better (in my opinion) defaults. Given enough announcement and time, it should not be too difficult to find Berkeley flock() invocations, and explicitly fcntl their FDs to CLOEXEC=False, or open() them with CLOEXEC=False. I would even be surprised if there is a single Haskell program out there that uses this; I know of one that uses file locking, but that's using fcntl style locks.

On Sun, Aug 30, 2015 at 9:58 AM, Niklas Hambüchen
I would even be surprised if there is a single Haskell program out there that uses this; I know of one that uses file locking, but that's using fcntl style locks.
Also note that many systems emulate flock() with fcntl() locks, so that trick is nonportable anyway. (Linux used to do that, but stopped; unless you're holding onto a system with a pre-2.0 kernel or a weird Linux distribution whose glibc has been modified to do emulation, you should have real flock().)
-- brandon s allbery kf8nh

Quoth Niklas Hambüchen
On 25/07/15 21:09, Donn Cave wrote:
But there are other things that could turn up. For example, you could use flock(2) (Berkeley, not POSIX fcntl lock) to keep an advisory file lock until the exec exits. If the file is closed prematurely, you lose the lock, and ... whatever happens then.
This is a very valid point. Applications that rely on this will break by changing the default here.
... and you go on to demonstrate that you didn't really take the point I was trying to make. How did you find out about this flock(2) scenario? Do you suppose that this is the only one, because you and I don't know of any more? My point is not that the whole thing might hinge on whether we can deal with flock locking in this scenario, it is that when we think about reversing ancient defaults in the underlying system, we have to assume the risk of obscure breakage as a result. To worry about the flock problem now is to miss the point. And again, this half-a-fix inconsistently applies to only files created by open(2), and not to pipes, sockets and whatever else creates a file descriptor in some other way, so if it's a real problem, it seems you must address it in some other way anyway. Donn

On 30/08/15 16:59, Donn Cave wrote:
Quoth Niklas Hambüchen
... and you go on to demonstrate that you didn't really take the point I was trying to make. How did you find out about this flock(2) scenario? Do you suppose that this is the only one, because you and I don't know of any more?
No, I do take your point, and I admit that more things may break when changing the default.
What I'm saying is that such cases are relatively easy to find. When you rely on inheriting FDs, you know it. It is exotic enough that when you've built something that needs it, you'll remember it.
This leads me to think that the damage done / cost to fix introduced by breaking the backwards compatibility here might be smaller than the damage / fixing that will arise in the future from surprising behaviour and security/privilege problems through leaked FDs in all non-exotic programs that want to exec() something. I'd assume the Python and Perl people came to this conclusion.
Further, our community is small, and if you announce something loud and long-term via mailing list and Reddit, it will go a long way and make it unlikely that somebody will be unaware of such a change. Your point that we may break things that we don't know of / understand still stands though.
Regarding pipes and sockets, pipe()/socket() accept CLOEXEC as well.
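As a small sketch of the last remark: on Linux the flag is requested at creation time through pipe2(2) with O_CLOEXEC and through socket(2) with SOCK_CLOEXEC in the type argument (plain pipe() takes no flags). The Python below is illustrative only; `atomic_cloexec_descriptors` is a hypothetical name, availability of `os.pipe2` and `socket.SOCK_CLOEXEC` is platform-dependent, and Python 3.4+ would set the flag on these descriptors anyway, so the explicit flags merely mirror what a C caller would pass.

```python
import fcntl
import os
import socket


def cloexec_is_set(fd: int) -> bool:
    """Return True if the FD_CLOEXEC descriptor flag is set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def atomic_cloexec_descriptors() -> dict:
    """Create a pipe and a socket with close-on-exec requested at creation time."""
    results = {}
    if hasattr(os, "pipe2"):                       # not available on every platform
        r, w = os.pipe2(os.O_CLOEXEC)
        results["pipe2"] = cloexec_is_set(r) and cloexec_is_set(w)
        os.close(r)
        os.close(w)
    if hasattr(socket, "SOCK_CLOEXEC"):            # not available on every platform
        s = socket.socket(socket.AF_UNIX,
                          socket.SOCK_STREAM | socket.SOCK_CLOEXEC)
        results["socket"] = cloexec_is_set(s.fileno())
        s.close()
    return results


if __name__ == "__main__":
    print(atomic_cloexec_descriptors())
```

Setting the flag at creation avoids the window between creating a descriptor and a later fcntl(F_SETFD) call, during which another thread could fork and exec.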

Exactly. As I said earlier, if you forget to clear FD_CLOEXEC when you meant to then your program breaks loudly and obviously; if you forget to *set* FD_CLOEXEC then the bug is much quieter and more subtle.
I asked for a specific example of some existing code that would be broken by this, but none was forthcoming. I understand why, in theory, changing this would be a problem (indeed, changing anything is similarly problematic) but in practice the pros enormously outweigh the cons here.
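A minimal sketch of the "breaks loudly" case, in Python for illustration (the names `CHILD` and `child_writes_to_fd` are hypothetical): the parent passes a descriptor number on the child's command line, and the child tries to write to it. If the descriptor was closed at exec, the child fails at once with EBADF rather than misbehaving quietly.

```python
import os
import subprocess
import sys

# Child program: write two bytes to the descriptor number given in argv[1].
CHILD = "import os, sys; os.write(int(sys.argv[1]), b'hi')"


def child_writes_to_fd(inherit_fd: bool):
    """Run a child that writes to a descriptor number passed on its command line.

    Returns (exit status, bytes the parent read from the pipe, child's stderr).
    """
    r, w = os.pipe()
    os.set_inheritable(w, inherit_fd)        # False leaves FD_CLOEXEC set
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, str(w)],
        close_fds=False,                     # plain fork/exec, no tidying
        stderr=subprocess.PIPE,
    )
    os.close(w)
    data = os.read(r, 16)                    # b"" at EOF if nothing was written
    os.close(r)
    return proc.returncode, data, proc.stderr.decode()


if __name__ == "__main__":
    print(child_writes_to_fd(True)[:2])      # child inherited the descriptor
    status, data, err = child_writes_to_fd(False)
    print(status, data)                      # child failed, nothing written
    print(err.strip().splitlines()[-1])      # the child's OSError message
```

This only covers code that actively uses the inherited descriptor; a lock or other side effect tied to the descriptor merely staying open would not fail this visibly.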

On 31/08/15 09:28, David Turner wrote:
Exactly. As I said earlier, if you forget to clear FD_CLOEXEC when you meant to then your program breaks loudly and obviously; if you forget to *set* FD_CLOEXEC then the bug is much quieter and more subtle.
I think the example given by Donn (a lock silently being cleared too early) is a case where it does not break loudly and obviously. I agree with the second part though that bugs related to leaking tend to be quieter and more subtle in general.

Yes, sure, but what actually does that? i.e. what locks a FD opened by the Haskell runtime and then expects to preserve the lock across a fork? I see that this is a problem in theory, but in practice is it?
The point is, how much code would really have to change to fix the fallout from this?
participants (6): Bardur Arantsson, Brandon Allbery, David Turner, Donn Cave, Mike Meyer, Niklas Hambüchen