Proposal: end lazy IO results with errors/exceptions

Currently,

    withFile "foo" ReadMode hGetContents >>= putStrLn

prints out an empty line, the better to confuse newbies. I propose modifying the lazyRead function in GHC.IO.Handle.Text that currently reads

    lazyRead :: Handle -> IO String
    lazyRead handle =
       unsafeInterleaveIO $
         withHandle "hGetContents" handle $ \ handle_ -> do
         case haType handle_ of
           ClosedHandle     -> return (handle_, "")
           SemiClosedHandle -> lazyReadBuffered handle handle_
           _ -> ioException
                  (IOError (Just handle) IllegalOperation "hGetContents"
                     "illegal handle type" Nothing Nothing)

to something like

    lazyRead :: Handle -> IO String
    lazyRead handle =
       unsafeInterleaveIO $
         withHandle "hGetContents" handle $ \ handle_ -> do
         case haType handle_ of
           ClosedHandle     -> return (handle_, error "Forcing the result of a lazy read led to an attempt to read from a closed handle.")
           SemiClosedHandle -> lazyReadBuffered handle handle_
           _ -> ioException
                  (IOError (Just handle) IllegalOperation "hGetContents"
                     "illegal handle type" Nothing Nothing)

Ideally that error should instead be something to throw an imprecise exception, but I don't know how to use those yet. I can't personally see a way for this to break sane, working code, but the folks on #ghc thought it should be discussed and debated on list.

David Feuer
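For readers following along, here is a minimal sketch of one way to avoid the empty-line surprise described above: force the string while the handle is still open. The helper name readFileStrictly and the scratch file name are hypothetical and only serve the illustration; this is not part of the proposal itself.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper (not part of base): read a file through
-- hGetContents, but force the whole string before withFile closes
-- the handle.
readFileStrictly :: FilePath -> IO String
readFileStrictly path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    -- length walks the entire list, so the lazy read runs to end of
    -- file while the handle is still open.
    _ <- evaluate (length contents)
    return contents

main :: IO ()
main = do
  let path = "lazy-io-demo.txt"  -- hypothetical scratch file
  writeFile path "hello\n"
  -- The problematic form from the proposal would be:
  --   withFile path ReadMode hGetContents >>= putStrLn
  readFileStrictly path >>= putStr
```

Forcing with length keeps the whole file in memory, so this only suits inputs that are known to be small.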

Hi,

superficially¹ +1

Joachim

¹ I like the idea, but can’t tell what the downsides are.

-- Joachim Breitner
e-Mail: mail@joachim-breitner.de
Homepage: http://www.joachim-breitner.de
Jabber-ID: nomeata@joachim-breitner.de
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://www.haskell.org/mailman/listinfo/libraries

As far as I can tell, the current behavior is that once the file is
closed, the string representing the contents essentially looks like

    (the part of the string whose spine was forced)
      ++ (zero or more characters in the buffer) ++ []

So a program relying on the current behavior would have to inspect the
stuff past the first (++) *after closing the file* and rely on it
being there, but it can't really rely on any details. That sort of
behavior seems like it belongs in the Acme hierarchy.

Agreed, this sounds like a great idea.

+1
Petr

+1 from me personally. Even as a "breaking change" it has semantics that
are much more sane.
Any code that can see the exception has a deeply flawed model of how
withFile works and has behavior that will vary fairly wildly across
platforms, give silently wrong answers, etc.
This is just my personal stamp of approval, though, not necessarily the
informed opinion of the core libraries committee.

-1
I think that this change would make otherwise reasonable uses of
hGetContents on large files more likely to end with an exception. It seems
perfectly reasonable to use hGetContents to stream data from either a
large source or an infinite source (say a pipe or socket or /dev/urandom).
Perhaps you're streaming those bytes into another form, who knows. In any
case, file descriptors are precious and it would make sense to hClose the
file when you're done with it rather than wait and hope that the GC gets to
closing your file.
With this change we'd be introducing into code asynchronous minefields that
don't need to exist.
On Thu, Jul 24, 2014 at 8:50 PM, John Wiegley wrote:
> +1 from me.
>
> John

-- Eric Mertens

What is an example of a use case where relying on the current behavior
would be a good thing?
For an exception to happen, one has to force more than what was already
forced when the handle was closed. The vast majority of instances of
this---as far as I can tell---are people immediately closing handles
they've lazily read from, and an exception is probably more informative
than the empty input they currently get.
For correct-given-the-current-spec code to break, it seems like it would
have to be relying on one part of the code forcing the input before closing
to be an implicit signal to another part of the code that inspects the
input after the closing. But I don't see a reason to encourage/support that
kind of design.
I'm in favor of this change, in case that wasn't obvious. :)
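To make the point about forcing concrete, here is a small sketch in which everything that is used later has already been forced when the handle is closed, so neither the current behavior nor the proposed one can be observed. The helper name firstChars and the scratch file name are hypothetical.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: return the first n characters of a file and
-- close the handle explicitly once they have been read.
firstChars :: Int -> FilePath -> IO String
firstChars n path = do
  h <- openFile path ReadMode
  contents <- hGetContents h
  let prefix = take n contents
  -- Force the spine of the prefix while the handle is still open.
  _ <- evaluate (length prefix)
  hClose h
  -- prefix ends in [] after at most n cells, so using it later never
  -- demands more of contents and never touches the closed handle.
  return prefix

main :: IO ()
main = do
  let path = "lazy-io-prefix.txt"  -- hypothetical scratch file
  writeFile path "hello, world\n"
  firstChars 5 path >>= putStrLn
```

Only code that holds on to contents itself and inspects it past the forced prefix after the hClose would notice any difference.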

On Fri, Jul 25, 2014 at 11:46 AM, Dan Doel wrote:
> For an exception to happen, one has to force more than what was already
> forced when the handle was closed. The vast majority of instances of
> this---as far as I can tell---are people immediately closing handles
> they've lazily read from, and an exception is probably more informative
> than the empty input they currently get.

Perhaps Eric's suggesting that the existing behavior provides a simple
asynchronous signalling mechanism? When reading from an infinite source, an
empty read signals that the source handle was closed elsewhere in a
different thread. The reading thread then acts accordingly, say, branching
out of a loop.

(I'm neutral on the proposal and on the haskelliness of everything said so
far. I'm just trying to understand what everyone's saying in this
discussion.)

-- Kim-Ee

On Fri, Jul 25, 2014 at 12:05 AM, Eric Mertens wrote:
> In any case, file descriptors are precious and it would make sense to
> hClose the file when you're done with it rather than wait and hope that
> the GC gets to closing your file.
>
> With this change we'd be introducing asynchronous minefields into code
> that don't need to exist.

hGetContents *is* an asynchronous minefield. This is attempting to make it
less so.

As for hClose, currently it is erroneous to hClose a handle on which
hGetContents has been done; you *cannot* hClose it at any point after you
have used hGetContents without losing data, as there is no way to know when
it is safe/correct to do so. If you believe otherwise, you may not
understand how it is implemented. (unsafeInterleaveIO is "unsafe" for a
reason.)

-- brandon s allbery kf8nh, sine nomine associates
allbery.b@gmail.com, ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad, http://sinenomine.net

That's going a little too far. It's safe to close a handle once *you*
know the lazy IO has been done. The burden is on you, unfortunately.
I'm just hoping with this proposal to make it more likely to find out
where you've messed up when things don't work.

Safe:
  Open handle
  Read lazily from handle
  Perform output using lazily read input
  Close handle

Safe:
  Open handle
  Read lazily from handle
  Produce a value and ensure that as much of it as you will need has
    been forced
  Close handle
  Return value

Unsafe:
  Open handle
  Read lazily from handle
  Produce a value without ensuring that the parts you need have been forced
  Close handle
  Return value
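The two safe patterns and the unsafe one listed above can be sketched roughly as follows. The function names and the scratch file name are hypothetical, and the sketch leaves out exception safety (bracket or withFile) to stay close to the steps as listed.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Safe: perform the output with the lazily read input, then close.
printFile :: FilePath -> IO ()
printFile path = do
  h <- openFile path ReadMode
  contents <- hGetContents h
  putStr contents
  hClose h

-- Safe: produce a value, force as much of it as will be needed,
-- then close and return it.
countLines :: FilePath -> IO Int
countLines path = do
  h <- openFile path ReadMode
  contents <- hGetContents h
  n <- evaluate (length (lines contents))
  hClose h
  return n

-- Unsafe: the value is still an unevaluated thunk when the handle is
-- closed, so whoever forces it later reads from a closed handle.
countLinesUnsafe :: FilePath -> IO Int
countLinesUnsafe path = do
  h <- openFile path ReadMode
  contents <- hGetContents h
  let n = length (lines contents)
  hClose h
  return n

main :: IO ()
main = do
  let path = "lazy-io-patterns.txt"  -- hypothetical scratch file
  writeFile path "a\nb\nc\n"
  printFile path
  countLines path >>= print
  -- countLinesUnsafe is deliberately not called here: what it yields
  -- depends on whether the change discussed in this thread is in the
  -- GHC being used.
```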

I've uploaded this change (with a much more useful error message) as D327
on Phabricator. I have done some very limited testing, and it passes GHC's
validation (on Linux), but I would really appreciate it if some people who
use a lot of lazy IO could test this against their programs to make sure it
doesn't produce errors in any cases when it shouldn't.
Thanks,
David
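For anyone who wants to check which behavior their GHC has, here is a small probe, offered as a sketch only. The name probeDelayedRead and the scratch file name are hypothetical. It catches SomeException because the thread does not settle which exception type the final patch raises; the original proposal used error, and a real program would catch something narrower.

```haskell
import Control.Exception (SomeException, evaluate, try)
import System.IO

-- Hypothetical probe: let withFile close the handle first, then force
-- the lazily read string and report what happens.
probeDelayedRead :: FilePath -> IO (Either SomeException Int)
probeDelayedRead path = do
  contents <- withFile path ReadMode hGetContents
  -- SomeException is used only because the exception type is an
  -- unknown here; it is not a recommendation for production code.
  try (evaluate (length contents))

main :: IO ()
main = do
  let path = "lazy-io-probe.txt"  -- hypothetical scratch file
  writeFile path "hello\n"
  result <- probeDelayedRead path
  case result of
    Left err -> putStrLn ("forcing the contents failed: " ++ show err)
    Right n  -> putStrLn ("no error; saw " ++ show n ++ " characters")
```

With the behavior described at the top of the thread the probe should take the Right branch with an empty string; with the change applied it should take the Left branch.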
participants (10)
- Brandon Allbery
- Dan Doel
- David Feuer
- Edward Kmett
- Eric Mertens
- Joachim Breitner
- John Wiegley
- Kim-Ee Yeoh
- Petr Pudlák
- Roman Cheplyaka