[GHC] #14143: forkProcess leaks file descriptors

#14143: forkProcess leaks file descriptors
----------------------------------------+---------------------------
           Reporter:  danharaj          |             Owner:  (none)
               Type:  feature request   |            Status:  new
           Priority:  normal            |         Milestone:
          Component:  libraries/base    |           Version:  8.2.1
           Keywords:                    |  Operating System:  POSIX
       Architecture:  Unknown/Multiple  |   Type of failure:  Other
          Test Case:                    |        Blocked By:
           Blocking:                    |   Related Tickets:
Differential Rev(s):                    |         Wiki Page:
----------------------------------------+---------------------------
 This is normal behavior, as forking a process in POSIX copies all file
 descriptors, and they survive a later `exec` unless they are marked
 O_CLOEXEC. But in Haskell it's quite difficult to figure out which FDs
 need to be manually closed. For example, if a `Handle` to a file is opened
 in the parent process and isn't referenced in the code passed to
 `forkProcess`, its FD will leak. In order to fork safely, a user has to
 know about all `Handle`s and other structures that use file descriptors
 currently active in the program, as well as which ones will survive by
 being referenced in the child process.

 A simpler problem is wanting to close most FDs (e.g. perhaps excepting
 std*) when forking. When you don't know where the file descriptors in the
 current process are coming from but you want them to be closed, a common
 approach is to iterate over all file descriptors and close them all. The
 `process` library does this.

 This doesn't work for `forkProcess` if a Haskell program is built against
 the threaded runtime, because the IO event manager holds on to file
 descriptors it uses for control. Carelessly closing every FD causes the IO
 manager to die when `-threaded` is used.

 As far as I understand, all of these FDs are held by the `Control`
 structure associated with an `EventManager`:
 [https://hackage.haskell.org/package/base-4.10.0.0/docs/src/GHC.Event.Control...].
 The `base` library does not expose these modules, so there is no way to
 figure out what they are from user code.
 In one's own application, these issues are tricky but ultimately
 surmountable, as one in principle has the ability to track down every file
 descriptor being opened. However, when using `forkProcess` in a library,
 one might need a sledgehammer. For example, in the `hdaemonize` package it
 is noted that the library can leak file descriptors as there is no way to
 deal with this issue:
 [https://hackage.haskell.org/package/hdaemonize-0.5.4/docs/System-Posix-Daemonize.html#v:daemonize]

 I am writing a library in the same design space as `hdaemonize` that I
 would like to be able to sensibly handle file descriptors. In general the
 problem looks intractable (for example because arbitrary C libraries could
 initialize their own internal FDs), but if I could know which file
 descriptors are being used by the IO Manager, then I could at least
 provide for the use case where no FDs should be shared between parent and
 child.

 Would it be sensible to expose more of the guts of the IO Manager in
 `base`? Are there other parts of the RTS that use file descriptors that
 need to be preserved?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14143
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
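
 The "close everything except a known set" approach from the report can be sketched as follows. This is only a minimal sketch under stated assumptions, not a fix: it assumes a Linux-style `/proc/self/fd`, the names `listOpenFds` and `closeAllExcept` are hypothetical, and the caller still has to supply the keep list, which is exactly the information `base` does not expose for the IO manager.

```haskell
import Control.Exception (IOException, try)
import Control.Monad (forM_)
import Data.Maybe (mapMaybe)
import System.Directory (listDirectory)
import System.Posix.IO
import System.Posix.Types (Fd)
import Text.Read (readMaybe)

-- | List the descriptors currently open in this process.
-- Assumption: a Linux-style procfs is mounted at /proc.
-- The listing also contains the descriptor used to read the directory
-- itself, which is already closed again when this function returns.
listOpenFds :: IO [Fd]
listOpenFds = do
  names <- listDirectory "/proc/self/fd"
  let numbers = mapMaybe readMaybe names :: [Int]
  pure (map fromIntegral numbers)

-- | Close every open descriptor that is not in the keep list.
-- Errors from closeFd (for example on the stale directory descriptor)
-- are ignored. Meant to run in the child; with -threaded the keep list
-- would have to include the IO manager's control descriptors.
closeAllExcept :: [Fd] -> IO ()
closeAllExcept keep = do
  fds <- listOpenFds
  forM_ (filter (`notElem` keep) fds) $ \fd -> do
    _ <- try (closeFd fd) :: IO (Either IOException ())
    pure ()
```

 A child would call something like `closeAllExcept [stdInput, stdOutput, stdError]` as its first action inside `forkProcess`. Under `-threaded` that call is the very case the report describes as killing the IO manager, so the sketch only illustrates the non-threaded situation.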

Comment (by svenpanne):

 This one seems to be a *nix classic... :-} Basically the same issue has
 been discussed e.g. here:
 https://sourceware.org/bugzilla/show_bug.cgi?id=10353. Although I'm not a
 big fan of Ulrich Drepper, I think he is right in this case: You simply
 can't know which FDs are open for what reason, at least you can't if you
 use any kind of external library. The GHC RTS is just one kind of such
 library, and having some hook for it doesn't solve the problem in general.
 What about (native) libraries you use? Would it be correct to simply close
 their FDs or not? One can't know.

 I think the main problem is that the default for open() is wrong: Normally
 you do not want to pass down FDs to subprocesses, so O_CLOEXEC should be
 the default IMHO. In the rare case that the application/library wants an
 FD to stay open, it should say so explicitly.

 BTW: What about the O_CLOEXEC flag for the FDs behind Haskell's Handles?
 Is it on or off? Do we really have the right default?

 I might be oversimplifying some things here, but given the *nix history,
 this is a hard problem, and I consider any library closing FDs it doesn't
 own buggy. Just my 2c... :-)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14143#comment:1
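
 The question about the flag behind a `Handle` can be checked empirically rather than guessed. The following is a minimal sketch, assuming the `unix` package is available; `handleIsCloexec` and `markCloexec` are hypothetical helper names. Keep in mind that the close-on-exec flag only takes effect at `exec`, so a `forkProcess` child that never calls `exec` still inherits the descriptor either way.

```haskell
import System.IO (IOMode (ReadMode), openFile)
import System.Posix.IO
import System.Posix.Types (Fd)

-- | Open a file through the ordinary Handle API and report whether the
-- underlying descriptor carries the close-on-exec flag.
-- handleToFd closes the Handle, so this is for inspection only.
handleIsCloexec :: FilePath -> IO Bool
handleIsCloexec path = do
  h <- openFile path ReadMode
  fd <- handleToFd h
  flag <- queryFdOption fd CloseOnExec
  closeFd fd
  pure flag

-- | Mark a descriptor close-on-exec explicitly.
markCloexec :: Fd -> IO ()
markCloexec fd = setFdOption fd CloseOnExec True
```

 Running `handleIsCloexec` on any readable file shows what the installed `base` does by default; the answer is deliberately not stated here because it may differ between GHC versions.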

Comment (by bgamari):

 I agree with Sven; `forkProcess` is (sadly) essentially impossible to
 support robustly. I have tried quite hard to use it in my own projects in
 the past but ultimately gave up due to similar issues (or rather, even
 worse ones: a program with multiple Haskell threads writing to `stdout`
 may deadlock after the fork if some thread held the `Handle`'s `MVar` at
 the moment `forkProcess` killed that thread).

 tl;dr: `forkProcess` is a can of worms which introduces a number of
 essentially unsolvable problems. Avoid it at all costs.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14143#comment:2
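
 Where the child can be a separate executable rather than a continuation of the running Haskell program, one way to follow this advice is to spawn it through the `process` library, which the original report already mentions as closing descriptors itself. A minimal sketch, assuming the `process` package; `runIsolated` is a hypothetical name, and this does not cover the daemonization use case from the report.

```haskell
import System.Process (CreateProcess (close_fds), proc, readCreateProcess)

-- | Run an external program with close_fds set, so that descriptors
-- other than stdin, stdout and stderr are closed in the new process,
-- and return what the program wrote to stdout.
runIsolated :: FilePath -> [String] -> IO String
runIsolated cmd args =
  readCreateProcess ((proc cmd args) { close_fds = True }) ""
```

 Because the new process starts from `exec`, it has no Haskell threads or `Handle` locks copied from the parent, which sidesteps the deadlock described in the comment.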