
On 22 April 2005 11:35, Benjamin Franksen wrote:
On Friday 22 April 2005 10:38, Simon Marlow wrote:
Lifting this constraint in GHC is a bit tricky because we currently use the OS's text<->binary translation to do I/O (doesn't matter on Unix, right now).
So what? If the handle is in text mode, you won't get the exact bytes as they are in the file, when calling hGetWord8 (at least on some systems). But that is exactly what I would expect if the handle is in text mode. If I need the exact binary representation of a text file, I have to use binary, of course.
So if you open a file in text mode and read it with hGetWord8, what bytes do you expect to get, exactly? Perhaps each byte of the UCS-4 representation? Big endian or little endian order? You could specify some meaning, but I don't think it would be a useful interface. And if you can't specify it precisely, it shouldn't be there.
I presume that hPutWord8 (fromIntegral (ord '\n')) should not flush a line-buffered Handle,
I can't see a reason not to flush it. Could you explain why you think it should not? What have terminal settings to do with file handle modes??
Well, I have the notion that EOL is a text concept, so doesn't belong in binary I/O. On the other hand, it's easy enough to specify that writing the byte 0xa to a line-buffered binary handle has the effect of flushing it, just as we say that writing the character '\n' to a line-buffered text handle triggers a flush. So I'm happy to leave things as they are.
Cheers,
Simon

On Friday 22 April 2005 14:19, you wrote:
On 22 April 2005 11:35, Benjamin Franksen wrote:
On Friday 22 April 2005 10:38, Simon Marlow wrote:
Lifting this constraint in GHC is a bit tricky because we currently use the OS's text<->binary translation to do I/O (doesn't matter on Unix, right now).
So what? If the handle is in text mode, you won't get the exact bytes as they are in the file, when calling hGetWord8 (at least on some systems). But that is exactly what I would expect if the handle is in text mode. If I need the exact binary representation of a text file, I have to use binary, of course.
So if you open a file in text mode and read it with hGetWord8, what bytes do you expect to get, exactly? Perhaps each byte of the UCS-4 representation? Big endian or little endian order?
You could specify some meaning, but I don't think it would be a useful interface. And if you can't specify it precisely, it shouldn't be there.
Objection withdrawn. I haven't been thinking enough. A runtime error is probably better than an IO action that might produce random nonsense (or at least extremely unportable results) when used wrongly.
I presume that hPutWord8 (fromIntegral (ord '\n')) should not flush a line-buffered Handle,
I can't see a reason not to flush it. Could you explain why you think it should not? What have terminal settings to do with file handle modes??
Well, I have the notion that EOL is a text concept, so doesn't belong in binary I/O.
In principle I agree with this. It's only that we may want to layer character-based functions on top of the binary ones. Although, from what Peter Simons writes, maybe it doesn't make much sense to do that...
On the other hand, it's easy enough to specify that writing the byte 0xa to a line-buffered binary handle has the effect of flushing it, just as we say that writing the character '\n' to a line-buffered text handle triggers a flush. So I'm happy to leave things as they are.
Maybe the best solution would be to make the behavior user-configurable (per handle).
Ben
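
To make the two options above concrete, here is a small pure model of a line-buffered binary handle. It is only a sketch: FlushPolicy, Model, putByte and putBytes are hypothetical names invented for illustration, not part of GHC or of any proposed API, and the model ignores buffer size limits entirely. It shows the rule "writing the byte 0x0a flushes the buffer" next to the per-handle alternative where that rule is switched off.

```haskell
import Data.Word (Word8)

-- | Hypothetical per-handle setting: does writing the byte 0x0a flush?
data FlushPolicy = FlushOnLF | NoFlushOnLF
  deriving (Eq, Show)

-- | Toy model of a line-buffered handle (an assumption for illustration,
-- not how GHC represents handles).
data Model = Model
  { pending :: [Word8]    -- bytes still sitting in the buffer, oldest first
  , flushed :: [[Word8]]  -- chunks already written out, oldest first
  } deriving (Eq, Show)

emptyModel :: Model
emptyModel = Model [] []

-- | Model of writing a single byte. Under FlushOnLF the byte 0x0a is
-- appended and the whole buffer is flushed as one chunk; otherwise the
-- byte is simply buffered.
putByte :: FlushPolicy -> Word8 -> Model -> Model
putByte policy b (Model p f)
  | policy == FlushOnLF && b == 0x0a = Model [] (f ++ [p ++ [b]])
  | otherwise                        = Model (p ++ [b]) f

-- | Write a sequence of bytes, left to right.
putBytes :: FlushPolicy -> [Word8] -> Model -> Model
putBytes policy bs m = foldl (flip (putByte policy)) m bs
```

With FlushOnLF, writing the bytes of "hi\n!" leaves only the final byte pending and one flushed chunk; with NoFlushOnLF all four bytes stay in the buffer until something else forces a flush.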

Simon Marlow wrote:
Lifting this constraint in GHC is a bit tricky because we currently use the OS's text<->binary translation to do I/O (doesn't matter on Unix, right now).
So what? If the handle is in text mode, you won't get the exact bytes as they are in the file, when calling hGetWord8 (at least on some systems). But that is exactly what I would expect if the handle is in text mode. If I need the exact binary representation of a text file, I have to use binary, of course.
So if you open a file in text mode and read it with hGetWord8, what bytes do you expect to get, exactly? Perhaps each byte of the UCS-4 representation? Big endian or little endian order?
No. You would get the raw stream of bytes from the file, except that
\r\n (Windows) or \r (Mac) would be converted to \n.
That should be the only difference between binary and text modes.
Similarly, actual text (i.e. Unicode) I/O should work in both binary
and text modes, with the only difference being EOL conversion.
--
Glynn Clements
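
The EOL conversion described in the message above can be sketched as a pure function over bytes. This is an illustration only: normalizeEOL is a hypothetical helper, not an existing library function, and it simplifies by handling the Windows (\r\n) and Mac (\r) conventions in a single pass, whereas a real text-mode handle would presumably apply only the convention of its own platform.

```haskell
import Data.Word (Word8)

-- Byte values for carriage return and line feed.
cr, lf :: Word8
cr = 0x0d
lf = 0x0a

-- | Hypothetical helper: rewrite "\r\n" and a lone "\r" to "\n",
-- leaving every other byte untouched.
normalizeEOL :: [Word8] -> [Word8]
normalizeEOL (x : y : rest)
  | x == cr && y == lf = lf : normalizeEOL rest  -- Windows-style "\r\n"
normalizeEOL (x : rest)
  | x == cr   = lf : normalizeEOL rest           -- old Mac-style lone "\r"
  | otherwise = x : normalizeEOL rest            -- any other byte passes through
normalizeEOL [] = []
```

Under this reading, a text-mode byte read would return the output of such a function, while a binary-mode read would return the input unchanged.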
participants (3): Benjamin Franksen, Glynn Clements, Simon Marlow