
1. A file is not a stream. It really isn't anything like a stream. Sure, you can _make_ a stream based on a file but that's a different thing. A file is a list (ignoring for the moment meta-information), accessible at any point. By contrast, streams access either incoming or outgoing entities, optionally with "end of stream" support. For incoming, one may 'skip' but not 'seek'. For outgoing, one may send a series of predefined 'zero' values. Call that "seek" if you want, I don't.
I couldn't agree more. I came across exactly these issues while rewriting GHC's I/O library recently, so now GHC's Handle type internally has two constructors:

  - FileHandle (a handle to a file, seekable, with a single file pointer and a single buffer that contains either pending read or write data but not both).

  - DuplexHandle (a read/write stream, not seekable, with two completely independent buffers, and the two ends can be closed independently).

Strangely, a FileHandle is also used for a uni-directional stream, because it only needs a single buffer.

For Concurrent Haskell there's a locking issue here: you can't expect two threads to read and write simultaneously to the same file, but it is entirely reasonable for two threads to be simultaneously reading and writing on the same socket. Hence a FileHandle has a single lock, and a DuplexHandle has one lock for each channel. In effect there's one lock per buffer.
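To make the "one lock per buffer" idea concrete, here is a minimal sketch. It is not GHC's actual definition: the names SketchHandle, Buffer, readLock, writeLock and sharesLock are hypothetical, and the buffer is reduced to a list of pending bytes. Each MVar holds a buffer and doubles as the lock guarding it.

```haskell
import Control.Concurrent.MVar (MVar, newMVar)
import Data.Word (Word8)

-- Hypothetical buffer: just the pending bytes. Real buffers are more involved.
newtype Buffer = Buffer [Word8]

-- Illustrative stand-in for the two-constructor Handle described above.
data SketchHandle
  = FileHandle (MVar Buffer)                  -- one buffer, hence one lock
  | DuplexHandle (MVar Buffer) (MVar Buffer)  -- read side, write side

newFileHandle :: IO SketchHandle
newFileHandle = FileHandle <$> newMVar (Buffer [])

newDuplexHandle :: IO SketchHandle
newDuplexHandle = DuplexHandle <$> newMVar (Buffer []) <*> newMVar (Buffer [])

-- The lock a reader (or writer) must take before touching its buffer.
readLock, writeLock :: SketchHandle -> MVar Buffer
readLock  (FileHandle b)     = b
readLock  (DuplexHandle r _) = r
writeLock (FileHandle b)     = b
writeLock (DuplexHandle _ w) = w

-- True when a reader and a writer would contend for the same lock.
sharesLock :: SketchHandle -> Bool
sharesLock h = readLock h == writeLock h
```

With this layout a reader and a writer on a DuplexHandle never block each other, while on a FileHandle they serialise on the single lock.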
2. A file is not made of "Char"s. A file is made of octets ("bytes"), i.e. Word8s. What is a "Char" anyway? Sometimes it's a seven- or eight-bit quantity with a _vague_ implication of interpretation as textual character; sometimes it's a 16-, 20.087- or 31-bit quantity with a much stronger implication of interpretation as textual character (strictly, Unicode "codepoint"). Is an ASCII 'r' the same as an EBCDIC 'r'? Or is an ASCII code 57 the same as an EBCDIC code 57?
As for streams, mostly they are streams of octets. But of course streams of anything might be useful.
There's an implicit conversion step, between whatever is the on-disk encoding of character streams and Unicode. GHC currently only supports a straightforward ISO 8859-1 encoding. I agree there ought to be a way to get at the raw bytes too.
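For illustration, the conversion step for ISO 8859-1 (Latin-1) can be sketched as below. The function names decodeLatin1 and encodeLatin1 are made up for this example and are not part of GHC's library. Latin-1 is the simple case because each octet maps to the Unicode code point with the same number.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Octets to characters: byte n becomes code point n.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Characters to octets: fails if any character lies outside U+0000..U+00FF.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = traverse enc
  where
    enc c
      | ord c <= 0xFF = Just (fromIntegral (ord c))
      | otherwise     = Nothing
```

Decoding always succeeds, but encoding can fail, which is one reason raw byte access is worth having alongside the Char interface.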
3. Output streams ("sinks") are different from input streams ("sources"). That POSIX entity known as "standard output" is a sink of octets. A "bi-directional" stream such as a TCP connection is nothing but a source and a sink considered together. Indeed, for TCP it's possible to send "end of [outgoing] stream" without affecting the incoming stream. This is rather different from a contrived "file-access"-type stream, where reading and writing are operations that affect each other.
Yup, see above.
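A toy model of "a source and a sink considered together", with the two ends closable independently, might look like this. It only tracks open/closed state and does no real I/O; Duplex, EndState, closeSink, canRead and canWrite are hypothetical names for this sketch only.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

data EndState = Open | Closed deriving (Eq, Show)

-- A bidirectional stream is just two ends, each with its own state.
data Duplex = Duplex
  { sourceEnd :: IORef EndState  -- incoming direction
  , sinkEnd   :: IORef EndState  -- outgoing direction
  }

newDuplex :: IO Duplex
newDuplex = Duplex <$> newIORef Open <*> newIORef Open

-- Send "end of outgoing stream"; the incoming direction is left alone.
closeSink :: Duplex -> IO ()
closeSink d = writeIORef (sinkEnd d) Closed

canRead, canWrite :: Duplex -> IO Bool
canRead  d = (== Open) <$> readIORef (sourceEnd d)
canWrite d = (== Open) <$> readIORef (sinkEnd d)
```

After closeSink the stream can still be read from, which matches the TCP behaviour described in point 3.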
There is no such thing as "I/O" unless, as in Haskell, one means _all_ imperative action. There are various entities out in the world accessible in a variety of different ways. Sources, sinks, lists, etc. are but some of the models useful for accessing them.
Ok, I agree so far. Are you suggesting the IO library should be changed? How?

I considered providing a different API for bidirectional streams, or perhaps requiring that bidirectional streams use separate Handles for read and write, but came to the conclusion that the user really doesn't care whether under the hood a single Handle is using separate buffers for read and write or just a single buffer, how much locking is going on or whatever. The fact that these things are awkward to implement shouldn't show through in the library interface.

It's definitely more convenient from the programmer's point of view to be able to use the *same* handle object for both read and write, otherwise you have to explain to people why they can have a read/write file handle but not a read/write handle for a TCP socket.

Cheers,
Simon
participants (1)
-
Simon Marlow