
#14143: forkProcess leaks file descriptors
----------------------------------------+---------------------------
           Reporter:  danharaj          |             Owner:  (none)
               Type:  feature request   |            Status:  new
           Priority:  normal            |         Milestone:
          Component:  libraries/base    |           Version:  8.2.1
           Keywords:                    |  Operating System:  POSIX
       Architecture:  Unknown/Multiple  |   Type of failure:  Other
          Test Case:                    |        Blocked By:
           Blocking:                    |   Related Tickets:
Differential Rev(s):                    |         Wiki Page:
----------------------------------------+---------------------------
 This is normal behavior: forking a process in POSIX copies all file
 descriptors into the child (the close-on-exec flag, O_CLOEXEC, only takes
 effect on a subsequent exec). But in Haskell it's quite difficult to
 figure out which FDs need to be closed manually.

 For example, if a `Handle` to a file is opened in the parent process and
 isn't referenced in the code passed to `forkProcess`, its FD will leak. To
 fork safely, a user has to know about all `Handle`s and other structures
 that use file descriptors currently active in the program, as well as
 which of them will survive by being referenced in the child process.

 A simpler problem is wanting to close most FDs (e.g. all except std*)
 when forking. When you don't know where the file descriptors in the
 current process come from but you want them closed, a fairly common
 approach is to iterate over all file descriptors and close each one. The
 `process` library does this.

 This doesn't work for `forkProcess` if the Haskell program is built
 against the threaded runtime, because the IO event manager holds on to
 file descriptors it uses for control. Iterating over all FDs and closing
 them indiscriminately causes the IO manager to die when `-threaded` is
 used. As far as I understand, all of these FDs are held by the `Control`
 structure associated with an `EventManager`:
 [https://hackage.haskell.org/package/base-4.10.0.0/docs/src/GHC.Event.Control...]
 . The `base` library does not expose these modules, so there is no way to
 find out from user code which FDs they are.

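 The close-everything approach described above can be sketched roughly as
 follows. This is only an illustration, assuming the `unix` package;
 `fdsToClose` and `closeAllExcept` are hypothetical names, and the upper
 bound on descriptor numbers is assumed to be supplied by the caller
 rather than queried from the system. With `-threaded`, a loop like this
 is exactly what can close the IO manager's control descriptors.

```haskell
import Control.Exception (IOException, try)
import Control.Monad (forM_)
import System.Posix.IO (closeFd)
import System.Posix.Types (Fd)

-- | Hypothetical pure helper: every descriptor in [0, limit) that is not
-- in the keep list. 'limit' is an assumed, caller-supplied upper bound.
fdsToClose :: [Fd] -> Fd -> [Fd]
fdsToClose keep limit = [fd | fd <- [0 .. limit - 1], fd `notElem` keep]

-- | Hypothetical helper: try to close every descriptor below 'limit'
-- except those in 'keep'. Errors from 'closeFd' (for example, on a
-- descriptor that is not open) are ignored.
--
-- Caveat: under the threaded runtime this also closes descriptors that
-- belong to the IO manager, because user code cannot tell which ones
-- they are.
closeAllExcept :: [Fd] -> Fd -> IO ()
closeAllExcept keep limit =
  forM_ (fdsToClose keep limit) $ \fd -> do
    _ <- try (closeFd fd) :: IO (Either IOException ())
    pure ()
```

 In a child started with `forkProcess`, one might call
 `closeAllExcept [stdInput, stdOutput, stdError] 1024`, where 1024 is an
 arbitrary assumed bound and the three descriptors come from
 `System.Posix.IO`. The keep list is the only protection this sketch
 offers, which is why knowing the IO manager's FDs would matter.
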
 In one's own application, these issues are tricky but ultimately
 surmountable, since one can in principle track down every file descriptor
 being opened. However, when using `forkProcess` in a library, one might
 need a sledgehammer. For example, the `hdaemonize` package notes that the
 library can leak file descriptors because there is no way to deal with
 this issue:
 [https://hackage.haskell.org/package/hdaemonize-0.5.4/docs/System-Posix-Daemonize.html#v:daemonize]

 I am writing a library in the same design space as `hdaemonize`, and I
 would like it to handle file descriptors sensibly. In general the problem
 looks intractable (for example, because arbitrary C libraries could
 initialize their own internal FDs), but if I could know which file
 descriptors are being used by the IO manager, then I could at least
 support the use case where no FDs should be shared between parent and
 child.

 Would it be sensible to expose more of the guts of the IO manager in
 `base`? Are there other parts of the RTS that use file descriptors that
 need to be preserved?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14143
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler