Question about IO, particularly hGetContents

Hello all,

I was playing with some Haskell code that read in text files and processed them and often found myself writing empty result-files. I've pared the problem down to the following small example.

    -- example program
    import IO

    main = do
      rhdl <- openFile "test.in" ReadMode
      content <- hGetContents rhdl
      putStrLn content -- if I cmt out this line, content will be empty
      hClose rhdl
      putStrLn "Content: "
      putStrLn content -- if the first 'putStrLn' call was commented, this will print a blank line
    -- end example

For some reason, if I comment out the 'putStrLn content' between hGetContents and hClose, the data from the hGetContents call is not stored. Can somebody verify this behavior and (if so) explain why it's happening?

Thanks,
-- kov

On 04/01/10 21:06, Ken Overton wrote:
For some reason, if I comment out the 'putStrLn content' between hGetContents and hClose, the data from the hGetContents call is not stored. Can somebody verify this behavior and (if so) explain why it's happening?
Yes. hGetContents retrieves the contents "lazily", only reading them when demanded by the program*. When it's read all the contents, it automatically closes the file, so you shouldn't use hClose in combination with it (hClose also stops any future reading from happening, which is why your program broke).

*implemented using the somewhat controversial "unsafeInterleaveIO". Some people believe that lazy IO like this is not a good idea, for reasons similar to the one you encountered, but in circumstances that are harder to fix.

-Isaac
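A minimal sketch of one way to avoid the problem, assuming the modern System.IO module rather than the Haskell 98 IO module used in the example: keep every use of the lazy string inside the scope where the handle is open, and let withFile close the handle afterwards. The helper name countLines is hypothetical and only there for illustration.

```haskell
import System.IO

-- Hypothetical helper: count the lines in a file.
-- hGetContents is lazy, so the result is forced with ($!) while the
-- handle is still open; withFile closes the handle when the block ends.
countLines :: FilePath -> IO Int
countLines path =
  withFile path ReadMode $ \h -> do
    content <- hGetContents h
    return $! length (lines content)
```

For example, `countLines "test.in" >>= print` prints the number of lines in test.in. Without the `$!`, the count could still be an unevaluated thunk when withFile closes the handle, which is the same trap as in the original program.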

Thanks for the explanation and the warning about hClose. I thought that using do{} blocks forced lazy evaluations?

(Replying to Isaac Dupree's message of Thursday, April 01, 2010 9:22 PM, "Re: [Haskell-beginners] Question about IO, particularly hGetContents".)

I'm a bit of a beginner so I might be wrong, but I think do only forces evaluation at the level of the do block, not recursively. Think of it this way, you've got a series of function calls, which are represented as thunks, and which in turn return more thunks. If you use do notation to execute that series of functions they all get evaluated in order, but you're still left with more un-evaluated thunks because that's what the functions returned. To use your program as an example it gets evaluated as follows:
    import IO
    main = do -- starts do block
      rhdl <- openFile "test.in" ReadMode
        -- thunk to open test.in for reading, I'm not sure if it actually
        -- opens test.in at this point, or if that happens when you call
        -- hGetContents or hClose.
      content <- hGetContents rhdl
        -- thunk to get a lazy read into rhdl and assign that thunk to content

At this point content hasn't been evaluated, it's still a thunk, *but* hGetContents has been evaluated and produced the thunk that's stored in content.

      hClose rhdl -- thunk to close rhdl

This thunk closes rhdl, and gets evaluated after the thunk that gets the thunk to read the contents of rhdl.

      putStrLn "Content: " -- thunk to write "Content: " to stdout
      putStrLn content
        -- thunk to write content to stdout, this will force evaluation
        -- of the thunk stored in content

This last thunk attempts to evaluate the thunk stored in content, which in turn tries to read from rhdl, which it can't because rhdl has already been closed. If you enable the bang patterns extension you could fix this I think by doing:

    {-# LANGUAGE BangPatterns #-}
    import IO
    main = do
      rhdl <- openFile "test.in" ReadMode
      !content <- hGetContents rhdl
      putStrLn "Content: "
      putStrLn content

Which would force the evaluation of the thunk stored in content. The
difference between the non-bang pattern version and the above is fairly
trivial in this case, as even if you didn't have the bang on the content
variable it would still be evaluated 2 lines down at the putStrLn call, but
if you had much larger files, and perhaps didn't force evaluation for quite
some time you could quite easily chew up a lot of memory. Lazy IO is both a
blessing and a curse, although the more I see and the more I think about it,
the more I'm in favor of simply not allowing lazy IO, as it's just way too
easy to make mistakes with lazy IO.
-R. Kyle Murphy
--
Curiosity was framed, Ignorance killed the cat.
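
One caveat on the bang-pattern version, offered as a hedged sketch rather than a definitive fix: a bang pattern forces a value only to weak head normal form, which for a String means the first cons cell, not the whole file. That version works mainly because the hClose call is gone. If the handle does have to be closed explicitly, the whole string can be forced first, for instance by computing its length. The helper name readAllThenClose is hypothetical, and System.IO stands in for the older IO module.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: read a whole file and close the handle explicitly.
readAllThenClose :: FilePath -> IO String
readAllThenClose path = do
  h <- openFile path ReadMode
  content <- hGetContents h
  -- length walks the entire list, so every character is read here,
  -- before the handle is closed.
  _ <- evaluate (length content)
  hClose h
  return content
```

The trade-off is that the entire file is held in memory at once, so this suits small inputs better than large ones.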
On Thu, Apr 1, 2010 at 22:36, Ken Overton wrote:
Thanks for the explanation and the warning about hClose. I thought that using do{} blocks forced lazy evaluations?

On 04/02/10 10:28, Kyle Murphy wrote:
I'm a bit of a beginner so I might be wrong, but I think do only forces evaluation at the level of the do block, not recursively. Think of it this way, you've got a series of function calls, which are represented as thunks, and which in turn return more thunks. If you use do notation to execute that series of functions they all get evaluated in order, but you're still left with more un-evaluated thunks because that's what the functions returned.
The only reason hGetContents was able to return thunks with not-yet-completed IO is because it cheated and used unsafeInterleaveIO. "hGetContents" is one of the worst functions you could possibly pick to learn the basic semantics of Haskell IO.

Normally, an IO-action being executed in sequence will force all the input and output it contains to complete, though some pure computation might remain not-yet-evaluated (say, sorting or summing a list that you'd retrieved from IO...)

-Isaac
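
To illustrate the distinction drawn here, a small sketch assuming System.IO and a hypothetical helper named sumFirstLine: hGetLine does not use lazy IO, so the line has been fully read before hClose runs, while the pure sum may stay unevaluated until something demands it.

```haskell
import System.IO

-- Hypothetical helper: sum the whitespace-separated integers on the
-- first line of a file.
sumFirstLine :: FilePath -> IO Int
sumFirstLine path = do
  h <- openFile path ReadMode
  line <- hGetLine h  -- ordinary IO: the read has finished when this returns
  hClose h            -- safe, nothing is left pending on the handle
  -- Pure computation: this may remain an unevaluated thunk until the
  -- caller inspects the result, but it needs no further IO.
  return (sum (map read (words line)))
```

Closing the handle early is harmless in this case because the deferred work is purely computational; with hGetContents the deferred work is the reading itself.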
participants (3): Isaac Dupree, Ken Overton, Kyle Murphy