RE: [Haskell-cafe] Re: Writing binary files?

16 Sep 2004

On 16 September 2004 10:35, Glynn Clements wrote:
...
Simon Marlow wrote:
...
...
Which is why I'm suggesting changing Char to be a byte, so that we
can have the basic, robust API now and wait for the more advanced
API, rather than having to wait for a usable API while people sort
out all of the issues.
An easier way is just to declare that the existing API assumes a
Latin-1 encoding consistently.  Later we might add a way to let the
application pick another encoding, or request that the I/O library
uses the locale encoding.
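
To illustrate why a consistent Latin-1 assumption is the easy option, here is a minimal sketch using only base. The helper names decodeLatin1 and encodeLatin1 are hypothetical and are not part of any library discussed in this thread. Decoding can never fail, because every byte maps to the Char with the same code point; only encoding can fail, for characters above U+00FF.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: Latin-1 decoding is total, since every byte
-- 0x00..0xFF maps to the Char with the same code point.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Hypothetical helper: encoding is partial, since a Char above
-- U+00FF has no Latin-1 byte.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = mapM toByte
  where
    toByte c
      | ord c <= 0xFF = Just (fromIntegral (ord c))
      | otherwise     = Nothing

main :: IO ()
main = do
  let bytes = [0x66, 0x6F, 0xE9, 0xFF] :: [Word8]
  -- Any byte sequence survives a decode/encode round trip unchanged.
  print (encodeLatin1 (decodeLatin1 bytes) == Just bytes)
  -- The euro sign (U+20AC) is outside Latin-1 and cannot be encoded.
  print (encodeLatin1 "\x20AC")
```

Under this reading, byte strings handed over by the OS always round-trip through String, whatever they were meant to represent.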
But how do you do that without breaking stuff? If the application
changes the encoding to UTF-8 (either explicitly, or by using the
locale's encoding when it happens to be UTF-8), then code such as:
    [filename] <- getArgs
    openFile filename ReadMode
will fail if filename isn't a valid UTF-8 sequence. Similarly for the
other cases where the OS accepts/returns byte strings but the Haskell
interface uses String.
And that's the correct behaviour, isn't it?
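
The failure mode under discussion can be sketched as follows. This is only an illustration, assuming the bytestring and text packages are available; decodeStrict is a hypothetical helper, not something the I/O library provides. A strict UTF-8 decoder has to reject a byte string that is not well-formed, which is the situation a filename in some other encoding would create.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')

-- Hypothetical helper: decode a raw OS byte string as UTF-8, failing
-- rather than guessing when the bytes are not well-formed.
decodeStrict :: B.ByteString -> Maybe String
decodeStrict bs = case decodeUtf8' bs of
  Left _  -> Nothing
  Right t -> Just (T.unpack t)

main :: IO ()
main = do
  -- "fo" followed by U+00E9 encoded as UTF-8 (0xC3 0xA9): accepted.
  print (decodeStrict (B.pack [0x66, 0x6F, 0xC3, 0xA9]))
  -- "fo" followed by the single Latin-1 byte 0xE9: rejected, because
  -- 0xE9 starts a multi-byte sequence that is never completed.
  print (decodeStrict (B.pack [0x66, 0x6F, 0xE9]))
```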

Actually I hadn't really considered filenames, I was just talking about
data read & written via the IO library.
...
I'm less concerned about the handling of streams, as you can
reasonably add a way to change the encoding before any data has been
read or written. I'm more concerned about FilePaths, argv, the
environment etc.
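
For streams, the "set the encoding before any data is read or written" approach might look like the sketch below. It assumes the per-handle encoding functions found in later versions of GHC's System.IO (hSetEncoding, latin1, utf8), which postdate this message; writeWith and the file names are hypothetical.

```haskell
import System.IO

-- Hypothetical helper: write a String to a file in the given encoding.
writeWith :: TextEncoding -> FilePath -> String -> IO ()
writeWith enc path s = withFile path WriteMode $ \h -> do
  hSetEncoding h enc   -- chosen before any data is written
  hPutStr h s

main :: IO ()
main = do
  let s = "caf\xE9"    -- four characters, the last one is U+00E9
  writeWith latin1 "latin1.txt" s
  writeWith utf8 "utf8.txt" s
  n1 <- withFile "latin1.txt" ReadMode hFileSize
  n2 <- withFile "utf8.txt" ReadMode hFileSize
  -- Latin-1 uses one byte per character (4 bytes in total);
  -- UTF-8 needs two bytes for U+00E9 (5 bytes in total).
  print (n1, n2)
```

No comparable hook exists for FilePaths, argv or the environment, since those values are already converted by the time the program sees them.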
Yes, these are interesting issues.  Filenames are stored as character
strings on some OSes (e.g. Windows) and as byte strings on others.  So the
portable Haskell API should probably use String, and do decoding based
on the locale (if the programmer asks for it).
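
As a rough illustration of locale-based decoding, later versions of GHC (which postdate this message) expose the encodings in use through GHC.IO.Encoding. The sketch below only prints them; the output depends on the platform and the environment the program runs in.

```haskell
import GHC.IO.Encoding (getFileSystemEncoding, getLocaleEncoding)

main :: IO ()
main = do
  -- Encoding derived from the user's locale settings.
  locale <- getLocaleEncoding
  -- Encoding applied to FilePaths, arguments and the environment.
  fs <- getFileSystemEncoding
  putStrLn ("locale encoding:      " ++ show locale)
  putStrLn ("file system encoding: " ++ show fs)
```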

Argv and the environment - I don't know.  Windows CreateProcess() allows
these to be UTF-16 strings, but I don't know what encoding/decoding
happens between CreateProcess() and what the target process sees in its
argv[] (can't be bothered to dig through MSDN right now).  I suspect
these should be Strings in Haskell too, with appropriate
decoding/encoding happening under the hood.

Cheers,
	Simon