RE: [Haskell-cafe] Re: Writing binary files?

On 16 September 2004 00:02, Glynn Clements wrote:
> Which is why I'm suggesting changing Char to be a byte, so that we can have the basic, robust API now and wait for the more advanced API, rather than having to wait for a usable API while people sort out all of the issues.

An easier way is just to declare that the existing API assumes a Latin-1 encoding consistently. Later we might add a way to let the application pick another encoding, or request that the I/O library uses the locale encoding. Existing code continues to work, and there are no conceptual problems (a Char is still Unicode).

You have to decide what happens when the programmer tries to output a Char that is out of range for Latin-1. The current behaviour is simply to take the code point mod 0x100, but we could also decide to raise an exception in this case.

Cheers,
Simon
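
The two output policies mentioned above (take the code point mod 0x100, or reject the character) could look roughly like the sketch below. The function names are hypothetical and not part of any existing I/O library, and the strict variant reports the offending character with Either instead of raising an exception, purely to keep the example pure.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: keep only the low 8 bits of each code point,
-- i.e. the "code point mod 0x100" behaviour.
encodeLatin1Truncating :: String -> [Word8]
encodeLatin1Truncating = map (fromIntegral . ord)

-- Hypothetical helper: refuse any Char outside Latin-1 and report
-- the first offending character instead of silently truncating it.
encodeLatin1Strict :: String -> Either Char [Word8]
encodeLatin1Strict = mapM encodeChar
  where
    encodeChar c
      | ord c <= 0xFF = Right (fromIntegral (ord c))
      | otherwise     = Left c
```

For the euro sign, encodeLatin1Truncating "\x20AC" yields [0xAC], while encodeLatin1Strict "\x20AC" yields Left '\x20AC'.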

Simon Marlow wrote:
> > Which is why I'm suggesting changing Char to be a byte, so that we can have the basic, robust API now and wait for the more advanced API, rather than having to wait for a usable API while people sort out all of the issues.
>
> An easier way is just to declare that the existing API assumes a Latin-1 encoding consistently. Later we might add a way to let the application pick another encoding, or request that the I/O library uses the locale encoding.

But how do you do that without breaking stuff? If the application
changes the encoding to UTF-8 (either explicitly, or by using the
locale's encoding when it happens to be UTF-8), then code such as:

    [filename] <- getArgs
    openFile filename ReadMode

will fail if filename isn't a valid UTF-8 sequence. Similarly for the
other cases where the OS accepts/returns byte strings but the Haskell
interface uses String.

Currently, the use of String for byte strings doesn't cause problems
because decoding using ISO-8859-1 can't fail. Allowing the use of a
fallible decoder introduces a new set of issues.
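
To make the difference concrete, here is a minimal sketch of the two kinds of decoder. Both functions are hand-written for illustration and are not taken from any library: decodeLatin1 is total because every byte maps to a code point, whereas the simplified decodeUtf8 has to return Maybe because some byte sequences are not valid UTF-8.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Total: every byte 0x00..0xFF is a valid ISO-8859-1 character.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Partial: returns Nothing on the first malformed sequence.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b:bs)
  | b < 0x80               = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F) 0x80 bs
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F) 0x800 bs
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07) 0x10000 bs
  | otherwise              = Nothing

-- Consume n continuation bytes, then reject overlong forms,
-- surrogates, and code points above U+10FFFF.
multi :: Int -> Int -> Int -> [Word8] -> Maybe String
multi n acc0 minCp bs
  | valid     = (chr cp :) <$> decodeUtf8 rest
  | otherwise = Nothing
  where
    (conts, rest) = splitAt n bs
    isCont c      = c .&. 0xC0 == 0x80
    cp            = foldl step acc0 conts
    step acc c    = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
    valid         = length conts == n
                    && all isCont conts
                    && cp >= minCp
                    && cp <= 0x10FFFF
                    && not (cp >= 0xD800 && cp <= 0xDFFF)
```

A filename consisting of the single byte 0xE9 (e-acute in ISO-8859-1) decodes fine with decodeLatin1, but decodeUtf8 [0xE9] is Nothing; the UTF-8 form of the same character, [0xC3, 0xA9], gives Just "\xE9".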

E.g. what happens if you call getDirectoryContents for a directory
which contains filenames which aren't valid in the current encoding?
Does the call fail outright, or are invalid entries silently omitted?
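
The two outcomes can be sketched over a list of raw directory entries as follows. Everything here is hypothetical: decodeName is an ASCII-only stand-in for whatever fallible decoder the current encoding would imply, and neither listing function corresponds to an existing library call.

```haskell
import Data.Char (chr)
import Data.Maybe (mapMaybe)
import Data.Word (Word8)

-- Hypothetical stand-in for a fallible decoder: accepts ASCII only.
decodeName :: [Word8] -> Maybe String
decodeName = traverse decodeByte
  where
    decodeByte b
      | b < 0x80  = Just (chr (fromIntegral b))
      | otherwise = Nothing

-- Policy 1: the whole listing fails if any one entry is undecodable.
listFailing :: [[Word8]] -> Maybe [String]
listFailing = traverse decodeName

-- Policy 2: undecodable entries are silently dropped.
listOmitting :: [[Word8]] -> [String]
listOmitting = mapMaybe decodeName
```

With the entries [[0x61], [0xE9], [0x62]], listFailing gives Nothing, while listOmitting gives ["a", "b"] and the program never learns that a third file exists.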

I'm less concerned about the handling of streams, as you can
reasonably add a way to change the encoding before any data has been
read or written. I'm more concerned about FilePaths, argv, the
environment etc.

--
Glynn Clements