RE: FW: lazy file reading in H98

I'm working on this text right now. Check out
http://research.microsoft.com/~simonpj/tmp/haskell98-library-html/
and see if you think it's improved.

Simon

| -----Original Message-----
| From: Malcolm Wallace [mailto:Malcolm.Wallace@cs.york.ac.uk]
| Sent: 03 April 2001 17:04
| To: libraries@haskell.org
| Subject: Re: FW: lazy file reading in H98
|
| > (1) At least, the definition should say that the behaviour
| > is undefined if a program ever writes to a file that it
| > has read with `readFile' or `hGetContents' before.
|
| The Library Report is already stronger than this. The
| behaviour is fully defined: an error should be raised.
| Here's what it says:
|
| > Implementations should enforce as far as possible, locally to the Haskell
| > process, multiple-reader single-writer locking on files. ... If any
| > open or semi-closed handle is managing a file for input, new handles
| > can only be allocated if they do not manage output.
| > ...
| > Error reporting: the openFile computation may fail with
| > isAlreadyInUseError if the file is already open and cannot be
| > reopened.
|
| The only very slightly confusing aspect is that the handles
| used by "readFile" and "writeFile" are internal, not written
| directly by the programmer. Perhaps the description of this
| behaviour should be moved up from the 11.3.1 "Opening Files"
| subsection to the enclosing 11.3 section, because it is more
| generally applicable.
|
| Subsection 11.2.1 "Semi-closed Handles" should mention
| "readFile" in addition to "hGetContents". It could also
| explicitly refer to the multiple-reader single-writer
| restriction, which is not otherwise mentioned here.
|
| Regards,
| Malcolm
|
| _______________________________________________
| Libraries mailing list
| Libraries@haskell.org
| http://www.haskell.org/mailman/listinfo/libraries
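As a rough illustration of the locking rule Malcolm quotes, the sketch below opens a file for reading and then tries to open the same file for writing. The helper name `writeRefusedWhileReading` is made up for this example. Whether the second `openFile` is actually refused depends on the implementation; the report only says it "should" be enforced "as far as possible", so treat the result as something to check on your own compiler rather than a guarantee.

```haskell
import System.IO
import System.IO.Error (isAlreadyInUseError, tryIOError)

-- Hypothetical helper (not from the thread): reports whether opening
-- 'path' for writing is refused with isAlreadyInUseError while a
-- handle on the same file is still open for reading.
writeRefusedWhileReading :: FilePath -> IO Bool
writeRefusedWhileReading path = do
  reader <- openFile path ReadMode
  -- Under multiple-reader single-writer locking this second open fails.
  result <- tryIOError (openFile path WriteMode)
  hClose reader
  case result of
    Left err     -> return (isAlreadyInUseError err)
    -- No locking was enforced; note that WriteMode truncated the file.
    Right writer -> hClose writer >> return False
```

The same check applies to `readFile` followed by `writeFile`, since those use handles internally; the only difference is that the program never sees the handle it would need to close.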

> (1) At least, the definition should say that the behaviour
> is undefined if a program ever writes to a file that it
> has read with `readFile' or `hGetContents' before.

This restriction would be too harsh. It is desirable to be able to
write to a file that has been read before with `readFile' or
`hGetContents'. With hGetContents you can do this safely:

  do
    handle <- openFile "configuration" ReadMode
    contents <- hGetContents ...
    hClose handle
    writeFile "configuration" newContents

At the point where you write newContents, you should no longer be
interested in contents: either because you are at the end of the
program execution anyway (as in Manuel's example), or because you
checked that you could parse the contents and hence processed the
whole contents, e.g.:

  ...
  case parse contents of
    Nothing   -> error "corrupted file format"
    Just info -> ...

I would think that the latter case is very common, and then you do not
even have to use `hClose'. However, I would prefer to do so, and it is
unfortunate that you cannot when you use `readFile'.

Maybe these points should be mentioned in the report.

Another point: I had difficulties parsing the following sentence of
Section 11.2.2 on first reading:

  Any operation except for hClose that fails because a handle is
  closed, also fails if a handle is semi-closed.

Alternatives:

  Any operation (except for hClose) that fails because a handle is
  closed, also fails if a handle is semi-closed.

  Any operation that fails because a handle is closed, also fails if a
  handle is semi-closed. The only exception is hClose.

Olaf

--
OLAF CHITIL,
Dept. of Computer Science, University of York, York YO10 5DD, UK.
URL: http://www.cs.york.ac.uk/~olaf/
Tel: +44 1904 434756; Fax: +44 1904 432767
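The parse-then-rewrite pattern described above can be sketched as a complete function. Everything specific here is an assumption made for illustration: the "key=value" line format, and the names `parseLine`, `parseConfig`, `render` and `setKey`. The point it tries to show is that a parser which can only answer `Just` after seeing the whole input has, by that time, consumed the entire file, so the later write no longer competes with a pending lazy read.

```haskell
import System.IO

-- Hypothetical format: one "key=value" entry per line.
parseLine :: String -> Maybe (String, String)
parseLine l = case break (== '=') l of
  (key@(_:_), '=':value) -> Just (key, value)
  _                      -> Nothing

-- Succeeds only if every line parses, so deciding between Just and
-- Nothing requires reading the input up to the failure or to the end.
parseConfig :: String -> Maybe [(String, String)]
parseConfig = mapM parseLine . lines

render :: [(String, String)] -> String
render = unlines . map (\(k, v) -> k ++ "=" ++ v)

-- Read the file, check that it parses, close it, then write it back
-- with one entry replaced.
setKey :: FilePath -> String -> String -> IO ()
setKey path key value = do
  handle   <- openFile path ReadMode
  contents <- hGetContents handle
  case parseConfig contents of
    Nothing -> do
      hClose handle
      ioError (userError "corrupted file format")
    Just info -> do
      -- The whole file has been read by now; hClose is just explicit.
      hClose handle
      writeFile path (render ((key, value) : filter ((/= key) . fst) info))
```

A call such as `setKey "configuration" "mode" "strict"` would rewrite the file in place. If the parser could succeed without looking at the whole input, this reasoning would no longer hold and the contents would have to be forced by other means before `hClose`.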

Sat, 07 Apr 2001 20:35:52 +0100, Olaf Chitil writes:
> do
>   handle <- openFile "configuration" ReadMode
>   contents <- hGetContents ...
>   hClose handle
>   writeFile "configuration" newContents

It will be wrong if the contents is not fully evaluated at the point
of hClose.

Perhaps hClose should arrange to performGC and suck the rest of the
file if it's still needed.

--
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK

Marcin 'Qrczak' Kowalczyk wrote:
> Sat, 07 Apr 2001 20:35:52 +0100, Olaf Chitil writes:
> > do
> >   handle <- openFile "configuration" ReadMode
> >   contents <- hGetContents ...
> >   hClose handle
> >   writeFile "configuration" newContents
>
> It will be wrong if the contents is not fully evaluated at the point
> of hClose.

What do you mean by `wrong'? As 11.2.2 says:

  Once a semi-closed handle becomes closed, the contents of the
  associated stream becomes fixed, and is the list of those items which
  were successfully read from that handle.

So only if you continue evaluating `contents' after `hClose' *and*
expect to get the full remaining file contents, then you have a problem.

> Perhaps hClose should arrange to performGC and suck the rest of the
> file if it's still needed.

This would be bad if you still had a reference to `contents' and were
not interested in the rest of the file (which seems to be the case in
Manuel's example). Also, a full garbage collection is quite expensive.

Simon doesn't want to make changes to Haskell. My opinion is that the
current definition of readFile and hGetContents is fine. They are
useful when used with a bit of care. Some remarks should make the
reader of the report aware of the caveats.

(I'm not saying that I couldn't imagine some useful additional
functions for lazy reading and writing.)

Olaf

--
OLAF CHITIL,
Dept. of Computer Science, University of York, York YO10 5DD, UK.
URL: http://www.cs.york.ac.uk/~olaf/
Tel: +44 1904 434756; Fax: +44 1904 432767
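The case where a program is not interested in the rest of the file might look like the following sketch. The function name `firstLine` is invented for this example. It forces only the part of `contents` that it needs before calling `hClose`, so nothing is evaluated after the handle is closed and the question of what a late read would return never arises.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: return the first line of a file and ignore
-- everything after it.
firstLine :: FilePath -> IO String
firstLine path = do
  handle   <- openFile path ReadMode
  contents <- hGetContents handle
  let line = takeWhile (/= '\n') contents
  -- Force the first line completely while the handle is still
  -- semi-closed; the rest of the file is never demanded.
  _ <- evaluate (length line)
  hClose handle
  return line
```

Having `hClose` read the remainder of the file automatically would do extra work here for no benefit, which is the objection raised above.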

Sun, 08 Apr 2001 19:12:10 +0100, Olaf Chitil writes:
> So only if you continue evaluating `contents' after `hClose' *and*
> expect to get the full remaining file contents, then you have a problem.

Indeed. But in a lazy language it's not always obvious that the
evaluation of contents happens after hClose.

Another problem with readFile is that reading many files requires
many open file descriptors if their contents are not evaluated early
enough. This has bitten me in practice.

I agree that reading the rest on hClose would be dangerous, since the
programmer might no longer be interested in it. So perhaps an explicit
operation on a handle should be provided: similar to hClose, but which
causes any remaining contents of a semi-closed handle to be slurped?
Thus for files closed with that operation hGetContents could be
safely treated as if it was strict (assuming that file contents don't
change over time).

Applying 'foldr (const id) (return ())' to the contents string is not
perfect: it is unintuitive, requires holding on to the contents string,
and is probably not as efficient as possible.

--
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK
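For reference, the `foldr (const id) (return ())` idiom mentioned above can be wrapped up roughly as follows. This is only a user-level approximation of the proposed operation, not the operation itself: it needs the contents string as well as the handle, which is exactly the drawback noted in the message. The names `forceSpine`, `hGetContentsStrict` and `readFileStrict` are invented for this sketch.

```haskell
import System.IO

-- The idiom from the message: the fold yields 'return ()' only after
-- walking the whole list, so running the action reads the file to the end.
forceSpine :: String -> IO ()
forceSpine = foldr (const id) (return ())

-- Hypothetical helper: read everything that is left on the handle,
-- then close it. The returned string is already fully read.
hGetContentsStrict :: Handle -> IO String
hGetContentsStrict handle = do
  contents <- hGetContents handle
  forceSpine contents
  hClose handle
  return contents

-- Hypothetical strict counterpart of readFile built on the helper above.
readFileStrict :: FilePath -> IO String
readFileStrict path = openFile path ReadMode >>= hGetContentsStrict
```

With this, `mapM readFileStrict paths` holds at most one file open at a time, at the cost of keeping every file's contents in memory, and a later `writeFile` to any of the paths does not meet a pending reader.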
participants (3)
- Olaf Chitil
- qrczak@knm.org.pl
- Simon Peyton-Jones