
Moved to haskell-cafe...
Thanks for your comments. (I had overlooked "readFile", which certainly looks safer. I'm not yet fully used to the distribution of functions between standard libraries and the Prelude. As for my 'hSafeGetContents', I realized exactly your point about using 'id' too late, after I'd sent my email. :-( )
I've done this many many times...too often do I find myself in the state of "open mouth, insert foot." People tend to be pretty forgiving though.
The broader debate I was trying to evoke was: how, as a programmer using a combination of monads and non-strict functions, can I be confident that there aren't hidden interactions waiting to bite?
Well, I think it's more the combination of *strict* monads and non-strict functions that causes trouble. In fact, as pointed out before, lazy IO uses unsafeInterleaveIO to work. I think the rationale for lazy IO is clear. Without lazy IO, the following program, when run on a 2gb file, will allocate 2gb of memory:
main = interact (map toUpper)
whereas, with lazy IO, it will run in constant (small) space. I think the point of lazy IO (and people who designed it can probably speak better to this) is to allow people to continue to write pure functions. That is, without lazy IO, the above would have to be written something like:
main = do b <- hIsEOF stdin when (not b) $ do c <- hGetChar stdin putChar (toUpper c) main
or something like that. obviously in *this case*, we would prefer to use the pure function map. However, in order to use map, we need to only read in characters as they are demanded, which means we need lazy IO. I think that what it really boils down to is you need to ask yourself: am I benefiting by using lazy IO. In the 'interact (map toUpper)' case, clearly you are, so you should probably continue to you it. However, in a case where you know you're going to read the whole file and store it in some data structure and need to simultaneously use data from the beginning and end of the file, then you're probably not benefiting from lazy IO. It is somewhat unfortunate that the IO library doesn't contain strict versions of all the operations, but they're easy enough to write with DeepSeq in your pocket.
Using the hOpen ... hClose vs readFile case as an example, is the difficulty here to do with the interleaving of multiple statements that modify some state with non-strict functions that return results based on some particular instance of that state? Is this really a fundamental mismatch?
Yes, this is precisely the problem. Using hGetContents implicitly uses unsafeInterleaveIO, which basically only performs IO as demanded. When you then say 'hClose h', none of the input has been demanded, so nothing gets read. Ever. Someone (one of the Simons, perhaps) could comment on this, but it seems that the primary problem with using readFile is the open handles issue, which boils down to the definition of hGetContents:
hGetContents :: Handle -> IO String hGetContents handle = withHandle "hGetContents" handle $ \handle_ -> case haType handle_ of ClosedHandle -> ioe_closedHandle SemiClosedHandle -> ioe_closedHandle AppendHandle -> ioe_notReadable WriteHandle -> ioe_notReadable _ -> do xs <- lazyRead handle return (handle_{ haType=SemiClosedHandle}, xs )
If this were changed to actually close the handle, rather than leaving it in a semi closed state (call this new function hGetContentsAndClose or something) and then readFileAndClose were implemented based on this, would this solve the problem?
I read recently that other functional languages (ML?) with strict evaluation semantics can be used to mimic lazy evaluation, but saw no clue how that would work. I'm wondering if any insight might be gained by looking at how such mechanisms might interact with functional state-transformer mechanisms?
The basic idea is to do something like:
data Lazy a = Value a (Lazy a) Deferred (() -> Lazy a)
or something, and then you provide a function:
force (Value a l) = Value a l force (Deferred f) = force (f ())
You can then force values as you need them; presumably in strict evaluation semantics, 'f ()' would be evaluated only at the point that it's forced. This isn't really relevant, though, I don't think, since it's sort of the opposite of what we want to be doing... - Hal