conduit/pipes/streaming and prompt cleanup

4 Jun 2019

      I'm using the 'streaming' library and realized it doesn't close files in a
timely way for the way I'm using it, and in fact can't, due to how the library
works.  I know conduit has put a lot of thought into closing resources in a
timely way, so I did an experiment to see what it does, but as far as I can
tell, conduit has the same problem.  Maybe I'm doing it wrong?

The situation is that I'm opening multiple files and mixing their output.
I want to close the inputs as soon as I'm done with them.  But I can get
done with them earlier than the end of the file.  Since all of these libraries
are based on pulling from downstream, if you don't pull all the way to the end,
the close at the end doesn't happen, and has to wait until runResourceT
returns, which is too late.  I remember long ago reading Oleg's original
iteratee paper, and it seems like he called out this problem with pull-based
iterators, that the iterator doesn't know when its caller is done with it, so
it can't close files on time.

Here's a conduit version that I think illustrates the situation:

    import qualified Conduit as C
    import           Conduit ((.|))
    import qualified Control.Monad.Trans as Trans
    import qualified System.IO as IO

    main :: IO ()
    main = C.runResourceT $ C.runConduit pipe

    pipe :: C.ConduitM a c (C.ResourceT IO) ()
    pipe = fileLines "TODO" .| (C.takeC 3 >> C.yield "***")
        .| C.mapM_C (Trans.liftIO . putStrLn)

    fileLines :: C.MonadResource m => FilePath -> C.ConduitT i String m ()
    fileLines fname = C.bracketP (IO.openFile fname IO.ReadMode) close
handleLines

    handleLines :: Trans.MonadIO m => IO.Handle -> C.ConduitT i String m ()
    handleLines hdl = loop
        where
        loop = do
            eof <- Trans.liftIO $ IO.hIsEOF hdl
            if eof then return () else do
                line <- Trans.liftIO $ IO.hGetLine hdl
                C.yield line
                loop

    close :: IO.Handle -> IO ()
    close hdl = IO.hClose hdl >> putStrLn "=== close"

This prints the first three lines of TOOD, then ***, and then "=== close",
where the close should go before the ***s.

As far as I can see, conduit can't do this any more than 'streaming' can,
because 'C.takeC' is just some awaits and yields, with no indication that
the final await is more special than any other await.

I think what would be necessary to solve this is that combinators like 'take'
have to be able to tell the stream to close, and that has to propagate back up
to each producer that has registered a cleanup.  Of course this renders the
stream invalid, so it's not safe to have any other references to the stream
around, but I guess streams are stateful in general so that's always true.
Maybe I could accumulate the finalizers in the stream data type and have
combinators like 'take' call it as soon as they've taken their last.  What
I actually wound up doing was make a 'takeClose' that also takes a 'close'
action to run when its done with the stream.  It's not exactly general but I'm
not writing a library so I don't need general.

Is there some kind of standard or built-in solution for this situation?
I know others have given a lot more thought to streaming than I have,
so surely this issue has come up.

I know there is a lot of talk about "prompt finalisation" and streams vs.
pipes vs. conduit, and talk about brackets and whatnot, but despite reading
various documents (https://hackage.haskell.org/package/streaming-with,
http://www.haskellforall.com/2013/01/pipes-safe-10-resource-management-and.h...,
etc.) I still don't really understand what they're talking about.
It seems like they're really about reliable cleanup when there are
exceptions, not really about prompt cleanup.  Certainly pipes-safe doesn't do
prompt cleanup, at least not the kind I'm talking about.

Evan Laforge

Michael Snoyman

Evan Laforge

tags

participants (2)