FW: lazy file reading in H98

Here's a library issue. The conclusion of this conversation was that H98 already specifies option (1) below, and I will clarify that in revising the library report. Nevertheless, the absence of a simple way to read-modify-write a file is a pain in the neck. Question: should one of our extended-IO libraries support a version of openFile that guarantees option (2)?

Simon

-----Original Message-----
From: Manuel M. T. Chakravarty [mailto:chak@cse.unsw.edu.au]
Sent: 05 September 2000 02:10
To: haskell@haskell.org
Subject: lazy file reading in H98

In an assignment in my class, we came across a lack of specification of the behaviour of `Prelude.readFile' and `IO.hGetContents', and IMHO also a lack of functionality. As both operations read a file lazily, subsequent writes to the same file are potentially disastrous. In this assignment, the file was used to make a Haskell data structure persistent over multiple runs of the program - i.e., readFile fname >>= return . read at the start of the program and writeFile fname . show at the end of the program. For certain inputs, where the data structure stored in the file was only partially used, the file was overwritten before it was fully read. H98 doesn't really specify what happens in this situation.

I think there are two ways to solve that:

(1) At least, the definition should say that the behaviour is undefined if a program ever writes to a file that it has read with `readFile' or `hGetContents' before.

(2) Alternatively, it could demand more sophistication from the implementation and require that, upon opening for writing a file that is currently semi-closed, the implementation make sure that the contents of the semi-closed file are not corrupted before it is fully read.[1]

If solution (1) is chosen, I think we should also have something like `strictReadFile' (and `hStrictGetContents'), which reads the whole file before proceeding to the next IO action. Otherwise, in situations like the assignment mentioned above, you have to resort to reading the file character by character, which seems very awkward.

So, overall, I think solution (2) is more elegant.

Cheers,
Manuel

[1] On Unix-like (POSIX?) systems, unlinking the file and then opening the writable file would be sufficient. On certain legacy OSes, the implementation would have to read the rest of the file into memory before creating a new file under the same name.
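For concreteness, here is a minimal sketch (not from the thread) of the kind of strict read that a `strictReadFile' would provide, using only the H98 Prelude. The names strictReadFile and readModifyWrite are illustrative, and the length/seq idiom is one assumed way to force the whole file into memory before any later write:

    -- Illustrative sketch only: `strictReadFile' and `readModifyWrite' are
    -- not part of H98; they show the workaround Manuel alludes to.
    strictReadFile :: FilePath -> IO String
    strictReadFile fname = do
      contents <- readFile fname
      -- Forcing the length traverses the whole string, so the file is fully
      -- read (and its internal handle closed) before we return.
      length contents `seq` return contents

    -- The assignment's persistence pattern, made safe by reading the file
    -- strictly before writing it back.
    readModifyWrite :: (Read a, Show a) => FilePath -> (a -> a) -> IO ()
    readModifyWrite fname f = do
      x <- strictReadFile fname >>= return . read
      writeFile fname (show (f x))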

(1) At least, the definition should say that the behaviour is undefined if a program ever writes to a file that it has read with `readFile' or `hGetContents' before.
The Library Report is already stronger than this. The behaviour is fully defined: an error should be raised. Here's what it says:
Implementations should enforce as far as possible, locally to the Haskell process, multiple-reader single-writer locking on files. ... If any open or semi-closed handle is managing a file for input, new handles can only be allocated if they do not manage output. ... Error reporting: the openFile computation may fail with isAlreadyInUseError if the file is already open and cannot be reopened.
The only slightly confusing aspect is that the handles used by "readFile" and "writeFile" are internal, not manipulated directly by the programmer. Perhaps the description of this behaviour should be moved up from the 11.3.1 "Opening Files" subsection to the enclosing 11.3 section, because it is more generally applicable. Subsection 11.2.1 "Semi-closed Handles" should mention "readFile" in addition to "hGetContents". It could also refer explicitly to the multiple-reader single-writer restriction, which is not otherwise mentioned there.

Regards,
Malcolm
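To make the Report's stipulation concrete, here is a small sketch of how the multiple-reader single-writer rule surfaces in the assignment's pattern on a conforming implementation. The file name and surrounding logic are assumed for illustration, and it uses the later System.IO.Error.catchIOError rather than H98's IO.catch: the writeFile runs while readFile's internal handle is still semi-closed, so it should fail with an IOError satisfying isAlreadyInUseError.

    import System.IO.Error (catchIOError, isAlreadyInUseError)

    -- Sketch: "state.txt" and the surrounding program are illustrative.
    demo :: IO ()
    demo = do
      contents <- readFile "state.txt"   -- internal handle is now semi-closed
      putStr (take 10 contents)          -- only part of the file is forced
      -- Reopening the file for writing while the lazy read is still pending
      -- should be rejected under the Report's locking rule.
      writeFile "state.txt" "new state"
        `catchIOError` \e ->
          if isAlreadyInUseError e
            then putStrLn "write refused: file still locked by the lazy read"
            else ioError e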