
On 16 September 2004 10:35, Glynn Clements wrote:
> Simon Marlow wrote:
>
>>> Which is why I'm suggesting changing Char to be a byte, so that we can have the basic, robust API now and wait for the more advanced API, rather than having to wait for a usable API while people sort out all of the issues.
>>
>> An easier way is just to declare that the existing API assumes a Latin-1 encoding consistently. Later we might add a way to let the application pick another encoding, or request that the I/O library uses the locale encoding.
>
> But how do you do that without breaking stuff? If the application changes the encoding to UTF-8 (either explicitly, or by using the locale's encoding when it happens to be UTF-8), then code such as:
>
>     [filename] <- getArgs
>     openFile filename ReadMode
>
> will fail if filename isn't a valid UTF-8 sequence. Similarly for the other cases where the OS accepts/returns byte strings but the Haskell interface uses String.

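To make the Latin-1 option concrete: Latin-1 maps byte n to code point n, so decoding never fails and any byte string (a Unix filename, say) survives a decode/encode round trip unchanged. A minimal sketch, assuming byte strings are modelled as `[Word8]`; `decodeLatin1` and `encodeLatin1` are hypothetical names, not functions from the standard libraries:

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Latin-1 decoding (hypothetical helper): byte n becomes code point n,
-- so every possible byte sequence decodes successfully.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Latin-1 encoding (hypothetical helper): only code points below 256 are
-- representable; this sketch returns Nothing for anything else rather
-- than silently truncating it.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = mapM toByte
  where
    toByte c
      | ord c < 256 = Just (fromIntegral (ord c))
      | otherwise   = Nothing
```

Only the encoding direction can fail, and only for characters above U+00FF; returning `Nothing` for those is a choice this sketch makes, not something the Latin-1 proposal itself specifies.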
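The failure described here can be seen with a strict UTF-8 decoder: the Latin-1 spelling of "café" ends in the single byte 0xE9, which is a legal Unix filename but not a valid UTF-8 sequence. A minimal sketch, assuming bytes are modelled as `[Word8]`; `decodeUtf8` is a hypothetical helper written for illustration, not the decoder GHC or any library uses:

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- Strict UTF-8 decoder (hypothetical helper): returns Nothing on any
-- malformed input instead of guessing.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80              = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b < 0xF5 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise             = Nothing  -- stray continuation or invalid lead byte
  where
    -- Consume n continuation bytes (10xxxxxx) after the lead byte and
    -- reject overlong forms, surrogates and values above U+10FFFF.
    multi :: Int -> Int -> Int -> Maybe String
    multi n lead minCp = do
      let (conts, rest) = splitAt n bs
      if length conts /= n || any (\c -> c .&. 0xC0 /= 0x80) conts
        then Nothing
        else do
          let cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) lead conts
          if cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)
            then Nothing
            else (chr cp :) <$> decodeUtf8 rest
```

Here `decodeUtf8 [0x63, 0x61, 0x66, 0xE9]` is `Nothing`, so a String-based `getArgs` running under a UTF-8 locale would have to fail, substitute a replacement character, or escape the byte somehow before `openFile` ever sees the name.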
And that's the correct behaviour, isn't it? Actually, I hadn't really considered filenames; I was just talking about data read and written via the IO library.
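For data read and written through Handles, later versions of GHC's `System.IO` provide this kind of per-handle switch: `hSetEncoding`, with encodings such as `utf8`, `latin1` and `localeEncoding`. The sketch below writes "café" as UTF-8 and reads the same bytes back under both encodings; `roundTrip`, `readWith` and the temporary file name template are illustrative choices, and the file is left in the current directory because removing it needs the separate `directory` package:

```haskell
import System.IO

-- Write "café" to a fresh temporary file as UTF-8, then read the same bytes
-- back twice: once decoded as UTF-8 and once decoded as Latin-1.
roundTrip :: IO (String, String)
roundTrip = do
  (path, h) <- openTempFile "." "enc-demo.txt"
  hSetEncoding h utf8        -- choose the encoding before writing anything
  hPutStr h "caf\233"        -- U+00E9 becomes the two bytes 0xC3 0xA9
  hClose h
  asUtf8   <- readWith utf8 path
  asLatin1 <- readWith latin1 path
  return (asUtf8, asLatin1)

-- Read a whole file with an explicitly chosen encoding.
readWith :: TextEncoding -> FilePath -> IO String
readWith enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  s <- hGetContents h
  length s `seq` return s    -- force the lazy read before withFile closes h
```

Read as Latin-1, the two UTF-8 bytes of é come back as the two characters `'\195'` and `'\169'`, which is what happens whenever writer and reader disagree on the encoding.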

> I'm less concerned about the handling of streams, as you can reasonably add a way to change the encoding before any data has been read or written. I'm more concerned about FilePaths, argv, the environment etc.

Yes, these are interesting issues. Filenames are stored as character strings on some OSs (e.g. Windows) and as byte strings on others. So the portable Haskell API should probably use String, and decode based on the locale (if the programmer asks for it). Argv and the environment: I don't know. Windows CreateProcess() allows these to be UTF-16 strings, but I don't know what encoding/decoding happens between CreateProcess() and what the target process sees in its argv[] (can't be bothered to dig through MSDN right now). I suspect these should be Strings in Haskell too, with the appropriate decoding/encoding happening under the hood.

Cheers,
Simon