Preventing leaked open file descriptors when catching exceptions

Hi all, I'm trying to run a loop that repeatedly attempts to open a file until it succeeds. The file is a named pipe in nonblocking mode, so the writer can only connect after the reader has connected. (Perhaps there is some way to determine this by stat'ing the pipe, but I don't know it yet.) Thus I do something like the following: tryUntilNoIOErr $ do performGC -- The reader must connect first, the writer here spins with backoff. PIO.openFd filename PIO.WriteOnly Nothing fileFlags I'm running GC between iterations to try to make sure I get rid of open files. Also, in the "tryUntilNoIOErr" code below I have some debugging messages which indicate that ioeGetHandle reports no handles associated with the exceptions I'm getting back. (If there were handles provided I could close them explicitly.) In spite of these attempted precautions I'm seeing "too many open files" exceptions in simple benchmarks that should only have a maximum of ONE file open. Any hints / pointers? Thanks, -Ryan mkBackoff :: IO (IO ()) mkBackoff = do tref <- newIORef 1 return$ do t <- readIORef tref writeIORef tref (min maxwait (2 * t)) threadDelay t where maxwait = 50 * 1000 tryUntilNoIOErr :: IO a -> IO a tryUntilNoIOErr action = mkBackoff >>= loop where loop bkoff = handle (\ (e :: IOException) -> do bkoff BSS.hPutStr stderr$ BSS.pack$ " got IO err: " ++ show e case ioeGetHandle e of Nothing -> BSS.hPutStrLn stderr$ BSS.pack$ " no hndl io err." Just x -> BSS.hPutStrLn stderr$ BSS.pack$ " HNDL on io err!" ++ show x loop bkoff) $ action

FYI, lsof confirms that there are indeed many many open connections to the
same FIFO:
Is there some other way to get at (and clean up) the file descriptor that
is left by System.Posix.IO.openFD after it throws an exception?
PingPipes 25115 rrnewton 124r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 125r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 126r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 127r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 128r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 129r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 130r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 131r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 132r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 133r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 134r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 135r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 136r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 137r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
PingPipes 25115 rrnewton 138r FIFO 8,2 0t0 25166171
/tmp/pipe_9083984821255795683
On Tue, Feb 21, 2012 at 11:13 AM, Ryan Newton
Hi all,
I'm trying to run a loop that repeatedly attempts to open a file until it succeeds. The file is a named pipe in nonblocking mode, so the writer can only connect after the reader has connected. (Perhaps there is some way to determine this by stat'ing the pipe, but I don't know it yet.)
Thus I do something like the following:
tryUntilNoIOErr $ do performGC -- The reader must connect first, the writer here spins with backoff. PIO.openFd filename PIO.WriteOnly Nothing fileFlags
I'm running GC between iterations to try to make sure I get rid of open files. Also, in the "tryUntilNoIOErr" code below I have some debugging messages which indicate that ioeGetHandle reports no handles associated with the exceptions I'm getting back. (If there were handles provided I could close them explicitly.)
In spite of these attempted precautions I'm seeing "too many open files" exceptions in simple benchmarks that should only have a maximum of ONE file open.
Any hints / pointers?
Thanks, -Ryan
mkBackoff :: IO (IO ()) mkBackoff = do tref <- newIORef 1 return$ do t <- readIORef tref writeIORef tref (min maxwait (2 * t)) threadDelay t where maxwait = 50 * 1000
tryUntilNoIOErr :: IO a -> IO a tryUntilNoIOErr action = mkBackoff >>= loop where loop bkoff = handle (\ (e :: IOException) -> do bkoff BSS.hPutStr stderr$ BSS.pack$ " got IO err: " ++ show e case ioeGetHandle e of Nothing -> BSS.hPutStrLn stderr$ BSS.pack$ " no hndl io err." Just x -> BSS.hPutStrLn stderr$ BSS.pack$ " HNDL on io err!" ++ show x loop bkoff) $ action

On Tue, Feb 21, 2012 at 8:16 AM, Ryan Newton
FYI, lsof confirms that there are indeed many many open connections to the same FIFO:
Like all of the lowest-level I/O functions, openFD just gives you back an integer, and the Fd type has no notion that there's an underlying system resource associated with it. It's your responsibility to manage it (i.e. clean up manually when catching an exception).

Quoth "Bryan O'Sullivan"
On Tue, Feb 21, 2012 at 8:16 AM, Ryan Newton
wrote: FYI, lsof confirms that there are indeed many many open connections to the same FIFO:
Like all of the lowest-level I/O functions, openFD just gives you back an integer, and the Fd type has no notion that there's an underlying system resource associated with it. It's your responsibility to manage it (i.e. clean up manually when catching an exception).
What's more - if I understood the hypothesis correctly, that the exception occurs during openFd - that fails to return an Fd because the open(2) system call fails to return one, so it would presumably be an OS level bug if there's really an open file descriptor left from this. Donn

Ah, thanks Bryan. I hadn't looked into it enough to realize that FDs are
just ints and not ForeignPtrs w/ finalizers.
Re: Donn's point. Well, yes, that would seem to be the case! But since I
think a linux bug is unlikely, I'm afraid that there's something else going
on here which I am not thinking of.
I'll make a self contained test of this and send it out.
On Tue, Feb 21, 2012 at 12:53 PM, Donn Cave
On Tue, Feb 21, 2012 at 8:16 AM, Ryan Newton
wrote: FYI, lsof confirms that there are indeed many many open connections to
Quoth "Bryan O'Sullivan"
, the same FIFO:
Like all of the lowest-level I/O functions, openFD just gives you back an integer, and the Fd type has no notion that there's an underlying system resource associated with it. It's your responsibility to manage it (i.e. clean up manually when catching an exception).
What's more - if I understood the hypothesis correctly, that the exception occurs during openFd - that fails to return an Fd because the open(2) system call fails to return one, so it would presumably be an OS level bug if there's really an open file descriptor left from this.
Donn
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

I cannot reproduce pretty much any claim made in this thread. Unless PIO does not mean System.IO.Posix. I run "mkfifo hello" to create a named pipe. Then I run this program to keep trying to open for writing, non-blocking (without anyone at the read end initially): import System.Posix.IO import System.Posix.Types(Fd(..)) import qualified Control.Exception as E import Control.Concurrent(threadDelay) main = do E.handle (\e -> putStrLn ("caught exception: " ++ show (e :: E.IOException))) -- you can change IOException to SomeException too (do fd <- openFd "hello" WriteOnly Nothing defaultFileFlags{nonBlock=True} case fd of Fd n -> putStrLn ("fd number " ++ show n) -- I deliberately leak fd ) threadDelay 1500000 main openFd failures are successfully caught as exceptions; it does not return an Fd that stands for -1 when it fails. (Check its source code and chase down what throwErrnoPathIfMinus1Retry means.) When it fails, it does not leak file descriptors. "lsof hello" shows nothing. To force file descriptors to be leaked and see what lsof says, I then run "cat hello" as the read end while the above program is still running, so that openFd succeeds and I have something to leak. "lsof hello" successfully shows: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME f 3725 trebla 3w FIFO 8,5 0t0 158922 hello f 3725 trebla 4w FIFO 8,5 0t0 158922 hello f 3725 trebla 5w FIFO 8,5 0t0 158922 hello f 3725 trebla 6w FIFO 8,5 0t0 158922 hello cat 3726 trebla 3r FIFO 8,5 0t0 158922 hello My point is that if "openFd ... WriteOnly" leaks anything, you should be seeing 3w, 4w, etc., emphasis on "w". But you're seeing a ton of "r"s. Your leaker is some read-end code. Ubuntu 11.04 x86 32-bit, kernel 2.6.38, GHC 6.12.3, 7.0.4, 7.2.1, 7.4.1 Lastly, the control structure loop = handle (\e -> ... loop) job is very problematic. Go to the haddock of Control.Exception, search for the string "The difference between using try and catch for recovery" to see why. You should use this: loop = do lr <- try job case lr of Left e -> ... loop Right a -> return a i.e., get out of the exception handler as soon as possible. (Thus, my use of putStrLn inside a handler is also questionable. But mine is a toy. I wouldn't do it in production code.)
participants (4)
-
Albert Y. C. Lai
-
Bryan O'Sullivan
-
Donn Cave
-
Ryan Newton