
Hello,

In describing the Handle type, the GHC documentation says (in the System.IO documentation):

GHC note: a Handle will be automatically closed when the garbage collector detects that it has become unreferenced by the program. However, relying on this behaviour is not generally recommended: the garbage collector is unpredictable. If possible, use an explicit hClose to close Handles when they are no longer required. GHC does not currently attempt to free up file descriptors when they have run out; it is your responsibility to ensure that this doesn't happen.

But one cannot call hClose on Handles on which something like hGetContents has been called; it just terminates the character list at the point up to which it has already read. Further, the manual says that hGetContents puts the handle in the semi-closed state, and:

A semi-closed handle becomes closed:
- if hClose is applied to it;
- if an I/O error occurs when reading an item from the handle;
- or once the entire contents of the handle has been read.

So can I safely assume, according to the third point above, that it's fine not to call hClose explicitly as long as I consume all the contents returned by hGetContents?

Thanks,
Abhay
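For handles that are read with a strict operation, the explicit hClose the documentation recommends can be tied to a scope with withFile (from System.IO), which closes the handle when the action returns or throws. A minimal sketch, assuming a hypothetical helper firstLine and a hypothetical scratch file; it is safe only because hGetLine has finished reading before the handle is closed, which is exactly what a lazy hGetContents does not guarantee.

```haskell
import System.IO

-- Hypothetical helper: read only the first line of a file.
-- withFile opens the handle, runs the action, and then closes the
-- handle explicitly, even if the action throws an exception.
firstLine :: FilePath -> IO String
firstLine path = withFile path ReadMode hGetLine

main :: IO ()
main = do
  let path = "first-line-demo.txt"  -- hypothetical scratch file
  writeFile path "alpha\nbeta\n"
  firstLine path >>= putStrLn       -- prints "alpha"
```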

Yes, not only is it fine, it's recommended! Calling hClose explicitly on a handle after calling hGetContents is a sure way to introduce bugs.
-Brent
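The rule Brent confirms can be observed directly. The sketch below (the scratch file name is hypothetical) checks hIsClosed before and after the lazily read String is fully consumed: a semi-closed handle does not yet report itself as closed, and it does once the end of the file has been reached.

```haskell
import System.IO

main :: IO ()
main = do
  let path = "semi-closed-demo.txt"  -- hypothetical scratch file
  writeFile path "line one\nline two\n"
  h <- openFile path ReadMode
  s <- hGetContents h                -- h is now semi-closed
  before <- hIsClosed h
  putStrLn ("closed before consuming: " ++ show before)  -- False
  putStrLn ("characters read: " ++ show (length s))      -- 18; forces the whole list
  after <- hIsClosed h
  putStrLn ("closed after consuming: " ++ show after)    -- True: EOF closed it
```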

Thanks! I was worried about how and where I would place hClose!

I usually use something like this instead:

    import System.IO

    hStrictGetContents :: Handle -> IO String
    hStrictGetContents h = do
        s <- hGetContents h
        length s `seq` hClose h
        return s

This guarantees the following:
1) The whole file is read before hStrictGetContents exits (could be considered bad, but usually it's The Right Thing).
2) You guarantee that you don't leak file handles (good benefit!).
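One situation where guarantee 1) matters is rewriting a file in place: with plain hGetContents the read handle would still be semi-closed when the same path is reopened for writing, and GHC's file locking typically rejects that with a "resource busy (file is locked)" error. A sketch using the function above; upcaseFileInPlace and the scratch file name are hypothetical.

```haskell
import Data.Char (toUpper)
import System.IO

-- Ryan's first version, laid out as a standalone definition.
hStrictGetContents :: Handle -> IO String
hStrictGetContents h = do
  s <- hGetContents h
  length s `seq` hClose h  -- force the whole list, then close
  return s

-- Hypothetical helper: upper-case a file in place. The input handle is
-- already closed when hStrictGetContents returns, so reopening the same
-- path for writing does not clash with it.
upcaseFileInPlace :: FilePath -> IO ()
upcaseFileInPlace path = do
  h <- openFile path ReadMode
  s <- hStrictGetContents h
  writeFile path (map toUpper s)

main :: IO ()
main = do
  let path = "in-place-demo.txt"  -- hypothetical scratch file
  writeFile path "hello\n"
  upcaseFileInPlace path
  readFile path >>= putStr        -- prints "HELLO"
```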
A slightly better version:

    import System.IO
    import qualified Data.ByteString.Char8 as B

    hStrictGetContents :: Handle -> IO String
    hStrictGetContents h = do
        bs <- B.hGetContents h
        hClose h  -- not sure if this is required; ByteString documentation isn't clear.
                  -- (Harmless either way: hClose on an already-closed handle has no effect.)
        return $ B.unpack bs  -- lazy unpack into String

This saves a ton of memory for big reads: a String is ~12 bytes per character, whereas this is only 1 byte per character plus fixed overhead. Then, assuming the function consuming the String doesn't leak, you'll end up with a much smaller space requirement.

-- ryan
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Thanks Ryan, this will definitely not leak handles. I had thought about making a strict version of hGetContents, though along somewhat different lines.

My question was: since the documentation says that a semi-closed handle becomes closed as soon as its entire contents have been read, can I conclude that as long as I consume the string, I am not leaking handles?

I am still interested in using hGetContents, since these contents will soon go through hPutStr, which will consume them anyway. And hGetContents, being lazy, will not occupy memory on the order of the size of the input file. That's why I asked.

Regards,
Abhay
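Abhay's plan can be sketched as follows; copyLazily and the file names are hypothetical. Because hPutStr walks the String to its end, the input reaches end of file and its semi-closed handle is closed without an explicit hClose, while the output handle, opened normally, still needs one. Memory stays small only as long as nothing else keeps a reference to the String.

```haskell
import System.IO

-- Hypothetical helper: copy a file through a lazily read String and
-- report whether the input handle ended up closed.
copyLazily :: FilePath -> FilePath -> IO Bool
copyLazily from to = do
  hIn  <- openFile from ReadMode
  hOut <- openFile to WriteMode
  s <- hGetContents hIn  -- hIn is now semi-closed
  hPutStr hOut s         -- consumes s to the end, so hIn is closed at EOF
  hClose hOut            -- the output handle still needs an explicit close
  hIsClosed hIn

main :: IO ()
main = do
  writeFile "copy-src.txt" (unlines (map show [1 .. 1000 :: Int]))
  closed <- copyLazily "copy-src.txt" "copy-dst.txt"
  print closed           -- True
  src <- readFile "copy-src.txt"
  dst <- readFile "copy-dst.txt"
  print (src == dst)     -- True
```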

A small idiomatic nitpick: when I see (length s) get computed and thrown away, I wince at the wasted effort. I would prefer (finiteSpine s):

    import System.IO

    finiteSpine :: [a] -> ()
    finiteSpine = foldr (const id) ()

    hStrictGetContents :: Handle -> IO String
    hStrictGetContents h = do
        s <- hGetContents h
        finiteSpine s `seq` hClose h
        return s

"finiteSpine" finds the "end" of a finite list and will hang forever on an infinite list. One can even notice that the type of finiteSpine is Strategy [a] (since Strategy a = a -> Done, and Done = ()):

    import Control.Parallel.Strategies (Strategy)

    finiteSpine :: Strategy [a]
    finiteSpine = foldr (const id) ()

And in fact "finiteSpine = seqList r0", which returns () after applying the "do nothing" strategy "r0" to every element.

-- Chris
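A small check of what finiteSpine (and length) actually force: only the list's spine, never its elements. An undefined element is therefore harmless, while an undefined tail is part of the spine and is hit. The sketch gives finiteSpine a list-only signature, since later versions of foldr are generalised over Foldable.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- ChrisK's definition, with an explicit list type.
finiteSpine :: [a] -> ()
finiteSpine = foldr (const id) ()

main :: IO ()
main = do
  -- Elements are never inspected, so undefined elements are fine.
  print (finiteSpine [undefined, undefined :: Int])  -- ()
  print (length [undefined, undefined :: Int])       -- 2
  -- An undefined tail is part of the spine, so it does get forced.
  r <- try (evaluate (finiteSpine (1 : 2 : undefined :: [Int]))) :: IO (Either ErrorCall ())
  putStrLn (either (const "undefined tail was forced") (const "tail not forced") r)
```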

On every piece of hardware I've seen, the actual calculation done by "length" is basically free. Compared to the cache misses you'll get from traversing the list, or especially the disk access from reading the file, it's vanishingly small.

It's also directly from the Prelude, and it's usually pretty clear to a newbie what it's doing, as compared to "foldr (const id) () s", which is on the path to "functional languages make no sense" land. I consider myself moderately experienced, and assuming that it typechecks I know what it has to mean, but I can't just look at it and know what it does the way I can with "length".

If there were a standard "seqList" or "deepSeq", I'd use that instead.

-- ryan

I am almost sure that there is something in Control.Parallel.Strategies that you can use (but I'm too lazy to look it up now...)

Cheers
Ben
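For later readers: a standard function of the kind Ryan asks for is deepseq, from the Control.DeepSeq module of the deepseq package that ships with GHC; and since base 4.15, System.IO also exports a strict hGetContents'. A sketch of the deepseq variant follows (the scratch file name is hypothetical). Unlike length or finiteSpine it also forces every Char, which costs little for a String read from a file.

```haskell
import Control.DeepSeq (deepseq)
import System.IO

-- Strict read using deepseq in place of length or finiteSpine:
-- the whole String (spine and characters) is evaluated before hClose.
hStrictGetContents :: Handle -> IO String
hStrictGetContents h = do
  s <- hGetContents h
  s `deepseq` hClose h
  return s

main :: IO ()
main = do
  let path = "deepseq-demo.txt"  -- hypothetical scratch file
  writeFile path "abc\ndef\n"
  h <- openFile path ReadMode
  s <- hStrictGetContents h
  closed <- hIsClosed h
  print closed                   -- True
  putStr s
```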
Participants (5): Abhay Parvate, Ben Franksen, Brent Yorgey, ChrisK, Ryan Ingram