[GHC] #14143: forkProcess leaks file descriptors

21 Aug 2017

      #14143: forkProcess leaks file descriptors
----------------------------------------+---------------------------
           Reporter:  danharaj          |             Owner:  (none)
               Type:  feature request   |            Status:  new
           Priority:  normal            |         Milestone:
          Component:  libraries/base    |           Version:  8.2.1
           Keywords:                    |  Operating System:  POSIX
       Architecture:  Unknown/Multiple  |   Type of failure:  Other
          Test Case:                    |        Blocked By:
           Blocking:                    |   Related Tickets:
Differential Rev(s):                    |         Wiki Page:
----------------------------------------+---------------------------
 This is normal behavior, as forking a process in POSIX copies all open
 file descriptors into the child (the close-on-exec flag only takes effect
 on a later `exec`). But in Haskell it's quite difficult to figure out
 which FDs need to be manually closed.

 For example, if a `Handle` to a file is opened in the parent process and
 isn't referenced in the code passed to `forkProcess`, its FD will leak. In
 order to safely fork, a user has to know about all `Handle`s and other
 structures that use file descriptors currently active in the program as
 well as which ones will survive by being referenced in the child process.
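
 As a minimal sketch of the inheritance described above (not part of the
 original report), the following uses the `unix` package to open a pipe in
 the parent and then asks a forked child whether that descriptor is still
 open. The helper names `fdIsOpen` and `childInheritsPipe` are made up for
 illustration, and a pipe stands in for the FD behind a `Handle`.

```haskell
import Control.Exception (IOException, try)
import System.Exit (ExitCode (..))
import System.Posix.IO (FdOption (CloseOnExec), closeFd, createPipe, queryFdOption)
import System.Posix.Process (ProcessStatus (..), exitImmediately, forkProcess, getProcessStatus)
import System.Posix.Types (Fd)

-- | True if the descriptor is open in the calling process.
-- Probes with fcntl(F_GETFD), which fails on a closed descriptor.
fdIsOpen :: Fd -> IO Bool
fdIsOpen fd = do
  r <- try (queryFdOption fd CloseOnExec) :: IO (Either IOException Bool)
  pure (either (const False) (const True) r)

-- | Open a pipe in the parent, fork, and report whether the child
-- still sees the read end as an open descriptor.
childInheritsPipe :: IO Bool
childInheritsPipe = do
  (readEnd, writeEnd) <- createPipe
  pid <- forkProcess $ do
    open <- fdIsOpen readEnd
    -- Report the result to the parent through the exit code.
    exitImmediately (if open then ExitSuccess else ExitFailure 1)
  status <- getProcessStatus True False pid
  closeFd readEnd
  closeFd writeEnd
  pure (status == Just (Exited ExitSuccess))
```

 The child here only probes the descriptor to make the point visible. In
 real code the child would typically not mention it at all, which is
 exactly why the leak is easy to miss.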

 A simpler problem is wanting to close most FDs (e.g. everything except
 std*) when forking. When you don't know where the file descriptors in the
 current process come from but you want them closed, a common approach is
 to iterate over all file descriptors and close each one. The `process`
 library does this. This doesn't work for `forkProcess` if a Haskell
 program is built against the threaded runtime, because the IO event
 manager holds on to file descriptors it uses for control. Carelessly
 closing every FD causes the IO manager to die when `-threaded` is used.
 As far as I understand, all of these FDs are held by the `Control`
 structure associated with an `EventManager`:
 [https://hackage.haskell.org/package/base-4.10.0.0/docs/src/GHC.Event.Control...]
 .
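
 For concreteness, here is a rough sketch of the "close everything except
 std*" approach. It is an illustration only, not how the `process` library
 implements it: it assumes Linux with `/proc` mounted, and the names
 `parseFds`, `fdsToClose` and `closeAllButStd` are hypothetical.

```haskell
import Control.Exception (IOException, try)
import Data.Maybe (mapMaybe)
import System.Directory (listDirectory)
import System.Posix.IO (closeFd, stdError, stdInput, stdOutput)
import System.Posix.Types (Fd)
import Text.Read (readMaybe)

-- | Turn the entry names of /proc/self/fd into descriptors,
-- skipping anything that is not a number.
parseFds :: [String] -> [Fd]
parseFds = map fromIntegral . mapMaybe (readMaybe :: String -> Maybe Int)

-- | Descriptors to close: every open one that is not in the keep list.
fdsToClose :: [Fd] -> [Fd] -> [Fd]
fdsToClose keep open = filter (`notElem` keep) open

-- | Close every descriptor except stdin, stdout and stderr.
-- Assumption: Linux-only, relies on /proc/self/fd.
closeAllButStd :: IO ()
closeAllButStd = do
  names <- listDirectory "/proc/self/fd"
  mapM_ closeQuietly (fdsToClose [stdInput, stdOutput, stdError] (parseFds names))
  where
    -- The listing includes the descriptor used to read the directory,
    -- which is already closed by now, so failures are ignored.
    closeQuietly fd = do
      _ <- try (closeFd fd) :: IO (Either IOException ())
      pure ()
```

 Calling `closeAllButStd` inside `forkProcess` under `-threaded` is the
 careless case described above: the keep list has no way to include the IO
 manager's control FDs, because user code cannot find out what they are.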

 The `base` library does not expose these modules so there is no way to
 figure out what they are from user code.

 In one's own application, these issues are tricky but ultimately
 surmountable as one in principle has the ability to track down every file
 descriptor being opened. However, when using `forkProcess` in a library,
 one might need a sledgehammer. For example, in the `hdaemonize` package it
 is noted that the library can leak file descriptors as there is no way to
 deal with this issue:
 [https://hackage.haskell.org/package/hdaemonize-0.5.4/docs/System-Posix-Daemonize.html#v:daemonize]

 I am writing a library in the same design space as `hdaemonize`, and I
 would like it to handle file descriptors sensibly. In general the
 problem looks intractable (for example because arbitrary C libraries could
 initialize their own internal FDs), but if I could know which file
 descriptors are being used by the IO Manager, then I could at least
 provide for the use case where no FDs should be shared between parent and
 child.

 Would it be sensible to expose more of the guts of the IO Manager in
 `base`? Are there other parts of the RTS that use file descriptors that
 need to be preserved?

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14143
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler