
Dean Herington wrote:
I've long thought that I/O buffering behavior--not just in Haskell, but in most places I've encountered it--was unnecessarily complicated. Perhaps it could be simplified dramatically by treating it as strictly a performance optimization.
This isn't entirely possible; there will always be situations where it matters exactly when and how the data gets passed to the OS. My experience has been that the simplest solution in such situations is not to use ANSI stdio buffering at all.
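(For concreteness, here is roughly what that advice looks like in Haskell rather than C stdio: where timing matters, either switch buffering off on the handle, or keep buffering for efficiency and flush explicitly at the points that matter. A minimal sketch:

    import System.IO

    main :: IO ()
    main = do
      -- Option 1: turn buffering off on this handle, so every write
      -- reaches the OS immediately.
      --   hSetBuffering stdout NoBuffering
      -- Option 2: keep buffering for efficiency, but flush explicitly
      -- at the points where timing matters, e.g. a newline-less prompt.
      putStr "Name: "
      hFlush stdout
      name <- getLine
      putStrLn ("Hello, " ++ name)
)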
Here's a sketch of the approach.
Writing a sequence of characters across the interface I'm proposing is a request by the writing program for those characters to appear at their destination "soon". Ideally, "soon" would be "immediately"; however, the characters' appearance may deliberately be delayed ("buffered"), for efficiency, as long as such delay is "unobtrusive" to a human user of the program. Buffering timeouts would depend on the device; for a terminal, perhaps 50-100 ms would be appropriate. Such an interval would tend not to be noticeable to a human user but would be long enough to collect, say, an entire line of output so that it appears "in one piece". The use of a reasonable timeout would avoid the confusing behavior where a newline-less prompt doesn't appear until the prompted data is entered.
With this scheme, I/O buffering no longer has any real semantic content. (In particular, the interface never guarantees indefinite delay in outputting written characters. Line buffering, if semantically important, needs to be done above the level of this interface.)
That's already true, at least in C: if you output a line which is longer than the buffer, the buffer will be flushed before it contains a newline (i.e. the line won't be written atomically).
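(The same holds for GHC's System.IO: a line-buffered handle still flushes whenever its buffer fills, so a sufficiently long line reaches the OS in pieces before its newline arrives. A small sketch, where 100000 is just an arbitrary length chosen to exceed the default buffer size:

    import System.IO

    main :: IO ()
    main = do
      hSetBuffering stdout LineBuffering
      -- No newline yet, but the string exceeds the buffer, so parts of
      -- it are already written out before the newline below arrives.
      putStr (replicate 100000 'x')
      putStrLn ""
)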
Hence, buffering control could be completely eliminated from the interface. However, I would retain it to provide (non-semantic) control over buffering. The optional buffer size currently has such an effect. A timeout value could be added for fine tuning. (Note that a timeout of zero would have an effect similar to Haskell's current NoBuffering.) Lastly, the "flush" operation would remain, as a hint that it's not worth waiting even the limited timeout period before endeavoring to make the characters appear.
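(To make the proposed interface concrete, here is a rough Haskell sketch of what such purely non-semantic buffering control might look like. All of the names below are hypothetical; nothing like them exists in System.IO today.

    import System.IO (Handle)

    -- Hypothetical interface sketching the proposal: buffer size and
    -- flush timeout are tuning hints only, with no semantic content.
    data BufferHint = BufferHint
      { hintBufferSize   :: Maybe Int  -- preferred buffer size, in bytes
      , hintFlushTimeout :: Maybe Int  -- maximum delay, in milliseconds,
                                       -- before buffered output is pushed
                                       -- out; Just 0 acts like NoBuffering
      }

    hSetBufferHint :: Handle -> BufferHint -> IO ()
    hSetBufferHint = undefined  -- sketch only, not implemented

    -- "flush" survives only as a hint that it is not worth waiting even
    -- the timeout before trying to make the characters appear.
    hFlushSoon :: Handle -> IO ()
    hFlushSoon = undefined  -- sketch only, not implemented
)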
Is such an approach feasible?
Possibly. As things stand, anyone who writes code which relies upon output being held back until a flush is asking for trouble. So, your approach wouldn't make it any harder to write correct code, although it might make it significantly more obvious if code was incorrect. AFAICT, the biggest problem would be providing an upper bound on the delay, as that implies some form of preemptive concurrency.
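(With GHC's lightweight threads, one way to get such an upper bound in user space is simply to flush the handle from a background thread on a fixed period. A rough sketch; periodicFlush is a name introduced here, not a library function:

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Monad (forever)
    import System.IO

    -- Flush the handle every 'ms' milliseconds from a background thread,
    -- bounding how long written characters can sit in the buffer while
    -- the handle keeps its normal buffering for efficiency.
    periodicFlush :: Int -> Handle -> IO ()
    periodicFlush ms h = do
      _ <- forkIO (forever (threadDelay (ms * 1000) >> hFlush h))
      return ()

    main :: IO ()
    main = do
      periodicFlush 100 stdout
      putStr "prompt without a newline> "  -- appears within ~100 ms,
                                           -- with no explicit hFlush
      _ <- getLine
      putStrLn "done"
)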
Has it been implemented anywhere?
Not that I know of.
Would such behavior best be implemented by the operating system?
No. The OS (i.e. kernel) doesn't know anything about user-space buffering. Furthermore, one of the main functions of user-space buffering is to minimise the number of system calls, so putting it into the OS would be pointless.
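(In Haskell terms, for example: with buffering disabled, each of the small writes below would become its own write() system call, while with block buffering they are coalesced into a handful of large ones. A sketch; "out.txt" and the 8192-byte buffer size are just placeholders:

    import Control.Monad (replicateM_)
    import System.IO

    main :: IO ()
    main = do
      h <- openFile "out.txt" WriteMode
      -- BlockBuffering collects the 10,000 one-byte writes below into
      -- buffer-sized chunks, so only a few write() calls reach the kernel.
      hSetBuffering h (BlockBuffering (Just 8192))
      replicateM_ 10000 (hPutStr h "x")
      hClose h
)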
Could it be implemented by the runtime system?
It depends on what you mean by "the runtime system"; it would have to be implemented in user-space.
--
Glynn Clements