getContents and lazy evaluation

Hi,

I am a newbie, reading the Gentle Introduction. Chapter 7 (Input/Output) says:

    Pragmatically, it may seem that getContents must immediately read an
    entire file or channel, resulting in poor space and time performance
    under certain conditions. However, this is not the case. The key point
    is that getContents returns a "lazy" (i.e. non-strict) list of
    characters (recall that strings are just lists of characters in
    Haskell), whose elements are read "by demand" just like any other
    list. An implementation can be expected to implement this
    demand-driven behavior by reading one character at a time from the
    file as they are required by the computation.

So what happens if I do

    contents <- hGetContents handle
    putStr (take 5 contents)    -- assume that the implementation
                                -- only reads a few chars
    -- delete the file in some way
    putStr (take 500 contents)  -- but the file is not there now

If an IO function is lazy, doesn't that break sequentiality? Sorry if the question is stupid.

Thanks,

Tamas

On Friday 01 September 2006 15:19, Tamas K Papp wrote:
> [...]
> So what happens if I do
>
>     contents <- hGetContents handle
>     putStr (take 5 contents)    -- assume that the implementation
>                                 -- only reads a few chars
>     -- delete the file in some way
>     putStr (take 500 contents)  -- but the file is not there now
>
> If an IO function is lazy, doesn't that break sequentiality? Sorry if
> the question is stupid.
This is not a stupid question at all, and it highlights the main problem with lazy IO. The solution is, in essence, "don't do that, because Bad Things will happen". It's pretty unsatisfactory, but there it is. For this reason, lazy IO is widely regarded as somewhat dangerous (or even as an outright misfeature, by a few).

If you are going to be doing simple pipe-style IO (i.e., read some data sequentially, manipulate it, spit out the output), lazy IO is very convenient, and it makes putting together quick scripts very easy. However, if you're doing something more advanced, you'd probably do best to stay away from lazy IO.

Welcome to Haskell, BTW :-)
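A minimal sketch of the pipe-style use Rob describes, assuming the task is simply to upper-case stdin to stdout; 'interact' consumes its input lazily, so characters are read only as the output demands them:

    import Data.Char (toUpper)

    -- Read stdin lazily, transform it, write the result to stdout.
    -- The input is only read as far as the output is demanded.
    main :: IO ()
    main = interact (map toUpper)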
--
Rob Dockins
Talk softly and drive a Sherman tank.
Laugh hard, it's a long way to the bank.
   -- TMBG

On Fri, 2006-09-01 at 16:28 -0400, Robert Dockins wrote:
> On Friday 01 September 2006 15:19, Tamas K Papp wrote:
> > [...]
> This is not a stupid question at all, and it highlights the main problem
> with lazy IO. The solution is, in essence, "don't do that, because Bad
> Things will happen". It's pretty unsatisfactory, but there it is. For
> this reason, lazy IO is widely regarded as somewhat dangerous (or even
> as an outright misfeature, by a few).
>
> If you are going to be doing simple pipe-style IO (i.e., read some data
> sequentially, manipulate it, spit out the output), lazy IO is very
> convenient, and it makes putting together quick scripts very easy.
> However, if you're doing something more advanced, you'd probably do best
> to stay away from lazy IO.
Since working on Data.ByteString.Lazy I'm now even more of a pro-lazy-IO zealot than I was before ;-)

In practise I expect that most programs that deal with file IO strictly do not handle the file disappearing under them very well either. At best they probably throw an exception and let something else clean up. The same can be done with lazy IO, though it requires using imprecise exceptions, which some people grumble about. So I would contend that lazy IO is actually applicable in rather a wider range of circumstances than you might think. :-)

Note also that with lazy IO we can write really short programs that are blindingly quick. Lazy IO allows us to save a copy through the Handle buffer.

BTW, in the above case the "bad thing that will happen" is that contents will be truncated. As I said, I think it's better to throw an exception, which is what Data.ByteString.Lazy.hGetContents does.

Duncan
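A sketch of the exception-based handling Duncan is advocating, assuming a file named "input.txt": forcing the lazy string while still inside IO means a failure that strikes mid-read surfaces as an ordinary catchable exception rather than silent truncation.

    import Control.Exception (IOException, evaluate, try)
    import System.IO (IOMode (ReadMode), hGetContents, withFile)

    main :: IO ()
    main =
      withFile "input.txt" ReadMode $ \h -> do
        -- Force the entire lazy read while we can still catch failures.
        result <- try (hGetContents h >>= evaluate . length)
        case result of
          Left e  -> putStrLn ("read failed: " ++ show (e :: IOException))
          Right n -> putStrLn ("read " ++ show n ++ " characters")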

On Friday 01 September 2006 16:46, Duncan Coutts wrote:
> On Fri, 2006-09-01 at 16:28 -0400, Robert Dockins wrote:
> > [...]
> Since working on Data.ByteString.Lazy I'm now even more of a pro-lazy-IO
> zealot than I was before ;-)
>
> In practise I expect that most programs that deal with file IO strictly
> do not handle the file disappearing under them very well either.
That's probably true, except for especially robust applications where such a thing is a regular (or at least expected) event.
> At best they probably throw an exception and let something else clean
> up. The same can be done with lazy IO, though it requires using
> imprecise exceptions, which some people grumble about. So I would
> contend that lazy IO is actually applicable in rather a wider range of
> circumstances than you might think. :-)
Perhaps I should be more clear. When I said "advanced" above I meant "any use whereby you treat a file as random-access, read/write storage, or do any kind of directory manipulation (including deleting and/or renaming files)". Lazy I/O (as it currently stands) doesn't play very nice with those use cases.

I agree generally with the idea that lazy I/O is good. The problem is that it is a "leaky abstraction"; details are exposed to the user that should ideally be completely hidden. Unfortunately, the leaks aren't likely to get plugged without pretty tight operating system support, which I suspect won't be happening anytime soon.
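A sketch of the strict alternative that sidesteps the leak Rob describes, assuming a file named "input.txt": force the whole file into memory before touching the filesystem, so later deletes or renames cannot affect data that has already been read.

    import Control.Exception (evaluate)
    import System.Directory (removeFile)
    import System.IO (IOMode (ReadMode), hGetContents, withFile)

    main :: IO ()
    main = do
      contents <- withFile "input.txt" ReadMode $ \h -> do
        s <- hGetContents h
        _ <- evaluate (length s)  -- force the whole file before the handle closes
        return s
      removeFile "input.txt"      -- now safe: no reads are still pending
      putStr (take 500 contents)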
> Note also that with lazy IO we can write really short programs that are
> blindingly quick. Lazy IO allows us to save a copy through the Handle
> buffer.
>
> BTW, in the above case the "bad thing that will happen" is that contents
> will be truncated. As I said, I think it's better to throw an exception,
> which is what Data.ByteString.Lazy.hGetContents does.
Well, AFAIK, the behavior is officially undefined, which is my real beef. I agree that it _should_ throw an exception.
--
Rob Dockins
Talk softly and drive a Sherman tank.
Laugh hard, it's a long way to the bank.
   -- TMBG

On Fri, 1 Sep 2006, Robert Dockins wrote:
> On Friday 01 September 2006 16:46, Duncan Coutts wrote: [...]
> > Note also that with lazy IO we can write really short programs that
> > are blindingly quick. Lazy IO allows us to save a copy through the
> > Handle buffer.
(Never understood why some people think it would be such a good thing to be blinded, but as long as it's you and not me ... )
> > BTW, in the above case the "bad thing that will happen" is that
> > contents will be truncated. As I said, I think it's better to throw an
> > exception, which is what Data.ByteString.Lazy.hGetContents does.
>
> Well, AFAIK, the behavior is officially undefined, which is my real
> beef. I agree that it _should_ throw an exception.
Is this about Microsoft Windows? On UNIX, I would expect deletion of a file to have no effect on I/O of any kind on that file. I thought the problems with hGetContents more commonly involve operations on the file handle, e.g., hClose.

Donn Cave, donn@drizzle.com
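A sketch of the hClose pitfall Donn is referring to, with a placeholder file name: once the handle is closed, the still-unread tail of the lazy string is silently truncated rather than raising an error.

    import System.IO

    main :: IO ()
    main = do
      h <- openFile "input.txt" ReadMode
      contents <- hGetContents h
      hClose h        -- closes the handle before anything was demanded
      putStr contents -- prints nothing: the unread tail was cut off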

On Friday 01 September 2006 18:01, Donn Cave wrote:
> [...]
> Is this about Microsoft Windows? On UNIX, I would expect deletion of a
> file to have no effect on I/O of any kind on that file. I thought the
> problems with hGetContents more commonly involve operations on the file
> handle, e.g., hClose.
Ahh... I think you're right.
However, this just illustrates the problem. The point is that the answer to the question "what happens when I do this?" ends up depending on the underlying operating system, rather than being defined by the language.
--
Rob Dockins
Talk softly and drive a Sherman tank.
Laugh hard, it's a long way to the bank.
   -- TMBG

On Fri, 2006-09-01 at 17:36 -0400, Robert Dockins wrote:
> Perhaps I should be more clear. When I said "advanced" above I meant
> "any use whereby you treat a file as random-access, read/write storage,
> or do any kind of directory manipulation (including deleting and/or
> renaming files)". Lazy I/O (as it currently stands) doesn't play very
> nice with those use cases.
Indeed, it can't be used in that case.
> I agree generally with the idea that lazy I/O is good. The problem is
> that it is a "leaky abstraction"; details are exposed to the user that
> should ideally be completely hidden. Unfortunately, the leaks aren't
> likely to get plugged without pretty tight operating system support,
> which I suspect won't be happening anytime soon.
Yes it is leaky.
> Well, AFAIK, the behavior is officially undefined, which is my real
> beef. I agree that it _should_ throw an exception.
Ah, I had thought it was defined to simply truncate. It being undefined isn't good. It seems that it would be straightforward to define it to have the truncation behaviour. If Haskell-prime gets imprecise exceptions then that could be changed.

Duncan

On Fri, Sep 01, 2006 at 11:47:20PM +0100, Duncan Coutts wrote:
> On Fri, 2006-09-01 at 17:36 -0400, Robert Dockins wrote:
> > Well, AFAIK, the behavior is officially undefined, which is my real
> > beef. I agree that it _should_ throw an exception.
>
> Ah, I had thought it was defined to simply truncate. It being undefined
> isn't good. It seems that it would be straightforward to define it to
> have the truncation behaviour. If Haskell-prime gets imprecise
> exceptions then that could be changed.
Fortunately, the undefined behavior in this case is unrelated to the lazy IO. On Windows, the removal of the file will fail, while on POSIX systems there won't be any failure at all. The same behavior would show up if you opened the file for non-lazy reading, tried to read part of the file, then deleted it, then read the rest.

The "undefinedness" in this example isn't in the Haskell language, but in the filesystem semantics, and that's not something we want the language specifying (since it's something over which it has no control).

Lazy IO definitely works much more nicely with POSIX filesystems, but that's unsurprising, since POSIX filesystem semantics are much nicer than those of Windows.

--
David Roundy
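A sketch of the platform difference David describes, using a placeholder scratch-file name: on a POSIX system the removeFile succeeds and the already-open handle keeps working, while on Windows the removeFile call is the step that fails.

    import System.Directory (removeFile)
    import System.IO

    main :: IO ()
    main = do
      writeFile "scratch.txt" "hello, world"
      h <- openFile "scratch.txt" ReadMode
      removeFile "scratch.txt"  -- succeeds on POSIX; fails on Windows
      contents <- hGetContents h
      putStrLn contents         -- still prints "hello, world" on POSIX
      hClose h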

Hi
On 9/6/06, David Roundy wrote:
> Fortunately, the undefined behavior in this case is unrelated to the
> lazy IO. On Windows, the removal of the file will fail, while on POSIX
> systems there won't be any failure at all. The same behavior would show
> up if you opened the file for non-lazy reading, tried to read part of
> the file, then deleted it, then read the rest.
This is not strictly speaking true. If all the handles opened to the file in question are in FILE_SHARE_DELETE sharing mode, it can be marked for deletion, taking effect when the last handle to it is closed. It can also be moved and renamed. But it is true that removal might fail because of an open handle, and it is true that it will fail as currently implemented for GHC (and probably for other compilers as well).
The "undefinedness" in this example, isn't in the haskell language, but in the filesystem semantics, and that's not something we want the language specifying (since it's something over which it has no
Happily, this isn't a lazy IO issue, it's just a file IO issue for all files opened as specified by Haskell 98. Sharing mode would be really nice to have on Windows, as would security attributes. But as you say, these are hard things to specify because not everyone has those features. So, at least it works nicely on POSIXy systems, eh?

Best regards,
--Esa Ilari Vuokko

Hi,

Duncan Coutts wrote:
> In practise I expect that most programs that deal with file IO strictly
> do not handle the file disappearing under them very well either. At
> best they probably throw an exception and let something else clean up.
And at least in the Unix world, they just don't disappear. Normally, if you delete a file, you just delete its directory entry. If there is still something with an open handle to it, i.e. your program, the corresponding "inode" (that's basically the file itself, without its name or names) still happily exists for your seeking, reading and writing. Then, when your program closes the file and there really is no remaining directory entry and no other process accessing it, the inode is removed as well.

One trick for temporary files on Unix is opening a new file, immediately deleting it, but still using it to write and read data. So no problem here.

But what happens when two processes use the same file and one process is writing into it using lazy IO that hasn't actually happened yet? The other process wouldn't see its changes yet. I'm not sure if it matters, however, since sooner or later that IO will happen. And I believe that lazy IO still means that for one operation actually taking place, all prior operations take place in the right order beforehand as well, no?

As for two processes writing to the same file at the same time, very bad things may happen anyway. Sure, lazy IO prevents doing communication between running processes using plain files, but why would you do something like that?

Regards,
Julien
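A sketch of the Unix temp-file trick Julien mentions, with a placeholder file name: the file is unlinked immediately after opening, so the name disappears, but the anonymous inode stays usable through the open handle (POSIX systems only).

    import System.Directory (removeFile)
    import System.IO

    main :: IO ()
    main = do
      h <- openFile "temp.dat" ReadWriteMode
      removeFile "temp.dat"     -- the name is gone; the inode lives on
      hPutStr h "scratch data"  -- write through the still-open handle
      hSeek h AbsoluteSeek 0    -- rewind to read it back
      contents <- hGetContents h
      putStrLn contents         -- prints "scratch data"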
participants (7):

- David Roundy
- Donn Cave
- Duncan Coutts
- Esa Ilari Vuokko
- Julien Oster
- Robert Dockins
- Tamas K Papp