
But there's one significant difference between C and Haskell, which is applicable in the case of Matt's program. In C, any line-buffered output streams are automatically flushed when a read from an unbuffered or line-buffered stream can't be satisfied from its buffer.
Interesting. I didn't know this. Maybe we should match this behaviour, or provide a write-string-and-flush function. It seems like this issue is causing an undue amount of trouble.
I wrote GHC's IO library, and deliberately didn't include this feature. The previous version of the library did have such a feature, specifically for stdin/stdout. Note that the report doesn't say we must do this.

The reason I didn't include the feature is because I can't see a way to do it right. Flushing *all* line-buffered handles (the ANSI C way) doesn't seem right. Flushing stdout just because we read from stdin is not right, because the two streams might refer to completely different I/O objects. Perhaps we should attempt to detect when there are two streams connected to the same I/O object (file, pipe, tty, whatever) and enable the magic flushing then. But now do I have to explain to people how this works?

I suppose we could take the view that extra flushing is basically harmless, so it doesn't matter that we flush a bunch of Handles more often than we need to.

The advantage of the current scheme is that it is easy to understand and explain; the library doesn't try to do clever stuff behind your back. The disadvantage is that it catches people out, and it sometimes requires you to import IO in an otherwise Prelude-only program.

I'm more-or-less agnostic - if there's a way to avoid catching people out without introducing too much overhead or complicated rules that we have to explain, then I'll happily implement it.

Cheers, Simon
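(For concreteness, the workaround being discussed looks roughly like the sketch below: flush stdout by hand before reading. The promptLine helper is an illustrative name for the "write-string-and-flush" function suggested above, not something the library provides; hFlush lives in System.IO, the hierarchical name for the Haskell 98 IO module mentioned by Simon.)

    import System.IO (hFlush, stdout)

    -- Write a prompt and flush it immediately, so it appears even when
    -- stdout is line- or block-buffered and the prompt has no newline.
    promptLine :: String -> IO String
    promptLine msg = do
      putStr msg
      hFlush stdout   -- without this, the prompt may not appear until after getLine
      getLine

    main :: IO ()
    main = do
      name <- promptLine "What is your name? "
      putStrLn ("Hello, " ++ name)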

I've long thought that I/O buffering behavior--not just in Haskell, but in most places I've encountered it--was unnecessarily complicated. Perhaps it could be simplified dramatically by treating it as strictly a performance optimization. Here's a sketch of the approach.

Writing a sequence of characters across the interface I'm proposing is a request by the writing program for those characters to appear at their destination "soon". Ideally, "soon" would be "immediately"; however, the characters' appearance may deliberately be delayed ("buffered"), for efficiency, as long as such delay is "unobtrusive" to a human user of the program. Buffering timeouts would depend on the device; for a terminal, perhaps 50-100 ms would be appropriate. Such an interval would tend not to be noticeable to a human user but would be long enough to effectively collect, say, an entire line of output for output "in one piece". The use of a reasonable timeout would avoid the confusing behavior where a newline-less prompt doesn't appear until the prompted data is entered.

With this scheme, I/O buffering no longer has any real semantic content. (In particular, the interface never guarantees indefinite delay in outputting written characters. Line buffering, if semantically important, needs to be done above the level of this interface.) Hence, buffering control could be completely eliminated from the interface. However, I would retain it to provide (non-semantic) control over buffering. The optional buffer size currently has such an effect. A timeout value could be added for fine tuning. (Note that a timeout of zero would have an effect similar to Haskell's current NoBuffering.) Lastly, the "flush" operation would remain, as a hint that it's not worth waiting even the limited timeout period before endeavoring to make the characters appear.

Is such an approach feasible? Has it been implemented anywhere? Would such behavior best be implemented by the operating system? Could it be implemented by the runtime system?

Dean
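(A minimal user-space sketch of this timeout idea, assuming GHC's Control.Concurrent: a background thread that flushes a handle at a fixed interval, so buffered output never lingers for more than roughly that long. autoFlush and its interval parameter are invented names; a real implementation would presumably live inside the Handle machinery and flush only when the buffer is dirty, rather than being bolted on top like this.)

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Monad (forever, void)
    import System.IO (Handle, hFlush, stdout)

    -- Illustrative only: flush the handle roughly every 'micros' microseconds,
    -- so written characters appear "soon" without any explicit hFlush calls.
    autoFlush :: Int -> Handle -> IO ()
    autoFlush micros h =
      void $ forkIO $ forever $ do
        threadDelay micros
        hFlush h

    main :: IO ()
    main = do
      autoFlush 100000 stdout       -- at most ~100 ms of output latency
      putStr "Enter something: "    -- prompt shows up without an explicit flush
      line <- getLine
      putStrLn ("You typed: " ++ line)

(Glynn's point below, that bounding the delay implies some form of preemptive concurrency, corresponds to the forked thread here, which GHC's runtime can already provide.)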

Dean Herington wrote:
I've long thought that I/O buffering behavior--not just in Haskell, but in most places I've encountered it--was unnecessarily complicated. Perhaps it could be simplified dramatically by treating it as strictly a performance optimization.
This isn't entirely possible; there will always be situations where it matters exactly when and how the data gets passed to the OS. My experience taught me that the simplest solution was never to use ANSI stdio buffering in such situations.
Here's a sketch of the approach.
Writing a sequence of characters across the interface I'm proposing is a request by the writing program for those characters to appear at their destination "soon". Ideally, "soon" would be "immediately"; however, the characters' appearance may deliberately be delayed ("buffered"), for efficiency, as long as such delay is "unobtrusive" to a human user of the program. Buffering timeouts would depend on the device; for a terminal, perhaps 50-100 ms would be appropriate. Such an interval would tend not to be noticeable to a human user but would be long enough to effectively collect, say, an entire line of output for output "in one piece". The use of a reasonable timeout would avoid the confusing behavior where a newline-less prompt doesn't appear until the prompted data is entered.
With this scheme, I/O buffering no longer has any real semantic content. (In particular, the interface never guarantees indefinite delay in outputting written characters. Line buffering, if semantically important, needs to be done above the level of this interface.)
That's already true, at least in C: if you output a line which is longer than the buffer, the buffer will be flushed before it contains a newline (i.e. the line won't be written atomically).
Hence, buffering control could be completely eliminated from the interface. However, I would retain it to provide (non-semantic) control over buffering. The optional buffer size currently has such an effect. A timeout value could be added for fine tuning. (Note that a timeout of zero would have an effect similar to Haskell's current NoBuffering.) Lastly, the "flush" operation would remain, as a hint that it's not worth waiting even the limited timeout period before endeavoring to make the characters appear.
Is such an approach feasible?
Possibly. As things stand, anyone who writes code which relies upon output being held back until a flush is asking for trouble. So, your approach wouldn't make it any harder to write correct code, although it might make it significantly more obvious when code is incorrect. AFAICT, the biggest problem would be providing an upper bound on the delay, as that implies some form of preemptive concurrency.
Has it been implemented anywhere?
Not that I know of.
Would such behavior best be implemented by the operating system?
No. The OS (i.e. kernel) doesn't know anything about user-space buffering. Furthermore, one of the main functions of user-space buffering is to minimise the number of system calls, so putting it into the OS would be pointless.
Could it be implemented by the runtime system?
It depends what you mean by "the runtime system"; it would have to be implemented in user-space.
--
Glynn Clements
participants (3)
- Dean Herington
- Glynn Clements
- Simon Marlow