
Glynn Clements wrote:
Marcin 'Qrczak' Kowalczyk wrote:
[...] Note that this needs to include all of the core I/O functions, not just reading/writing streams. E.g. FilePath is currently an alias for String, but (on Unix, at least) filenames are strings of bytes, not characters. Ditto for argv, environment variables, possibly other cases which I've overlooked.
I don't think so. They are all sequences of CChars, and C isn't particularly known for keeping bytes and chars apart. I believe Windows NT has (alternate) filename handling functions that use Unicode strings. This would strengthen the view that a filename is a sequence of characters. Ditto for argv, env, whatnot; they are typically entered from the shell and are therefore characters in the local encoding.
3. The default encoding is settable from Haskell, defaults to ISO-8859-1.
Agreed.
Oh no, please don't do that. A global, settable encoding is, well, dys-functional. Hidden state makes programs hard to understand and Haskell imho shouldn't go that route. And please don't introduce the notion of a "default" encoding. I'd like to see the following:

- Duplicate the IO library. The duplicate should work with [Byte] everywhere the old library uses String. Byte is some suitable unsigned integer; on most (all?) platforms this will be Word8.
- Provide an explicit conversion between encodings. A simple conversion of type [Word8] -> String would suit me; iconv would provide all that is needed.
- iconv takes names of encodings as arguments. Provide some names as constants: one name for the internal encoding (probably UCS4), one name for the canonical external encoding (probably locale dependent).
- Then redefine the old IO API in terms of the new API and appropriate conversions.

While we're at it, do away with the annoying CR/LF problem on Windows; this should simply be part of the local encoding. This way files can always be opened as binary, and hSetBinary can be dropped. (This won't work on ancient platforms where text files and binary files are genuinely different, but these are probably not interesting anyway.)

The same thoughts apply to filenames. Make them [Word8] and convert explicitly. By the way, I think a path should be a list of names (that is, of type [[Word8]]) and the library would be concerned with putting in the right path separator. Add functions to read and show pathnames in the local conventions and we'll never need to worry about path separators again.
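To make the proposal above a bit more concrete, here is a minimal sketch of an explicit, pure conversion and of paths as lists of names. Every name in it (Byte, Encoding, decode, encode, Path, showUnixPath, readUnixPath) is hypothetical and not part of any existing library, and the two hard-coded encodings only stand in for what iconv would provide.

```haskell
import Data.Char (chr, ord)
import Data.List (intercalate)
import Data.Word (Word8)

-- All names below are hypothetical; nothing here is an existing library API.
type Byte = Word8

-- A tiny stand-in for the encoding names that iconv would accept.
data Encoding = ASCII | Latin1
  deriving (Eq, Show)

-- Explicit, pure conversion from bytes to characters.
-- Nothing means some byte is not valid in the requested encoding.
decode :: Encoding -> [Byte] -> Maybe String
decode Latin1 = Just . map (chr . fromIntegral)
decode ASCII  = mapM step
  where
    step b | b < 0x80  = Just (chr (fromIntegral b))
           | otherwise = Nothing

-- The inverse direction.
-- Nothing means some character has no representation in the encoding.
encode :: Encoding -> String -> Maybe [Byte]
encode enc = mapM step
  where
    limit = if enc == ASCII then 0x80 else 0x100
    step c | ord c < limit = Just (fromIntegral (ord c))
           | otherwise     = Nothing

-- A path is a list of names, each name a raw byte string.
type Path = [[Byte]]

-- Render a path using the Unix separator '/' (byte 0x2F).
showUnixPath :: Path -> [Byte]
showUnixPath = intercalate [0x2F]

-- Split on '/' and drop empty components.
-- Simplification: absolute and relative paths are not distinguished.
readUnixPath :: [Byte] -> Path
readUnixPath = filter (not . null) . splitOn
  where
    splitOn bs = case break (== 0x2F) bs of
      (name, [])       -> [name]
      (name, _ : rest) -> name : splitOn rest
```

Since decode and encode take the encoding as an ordinary argument, there is no hidden state; the caller always says which encoding is meant, and a failed conversion shows up in the result type.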
There are limits to the extent to which this can be achieved. E.g. what happens if you set the encoding to UTF-8, then call getDirectoryContents for a directory which contains filenames which aren't valid UTF-8 strings?
Well, then you did something stupid, didn't you? If you don't know the encoding you shouldn't decode anything. That's a strong point against any implicit decoding, I think.

Also, if efficiency is a concern, lists probably shouldn't be passed between filesystem operations and iconv. I think we need a better representation here (like PackedString for Word8), not a convoluted API.

Regards,

Udo.
-- 
If Perl is the solution, you're solving the wrong problem. -- Erik Naggum
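As an illustration of the point about not decoding what you cannot identify, here is a small sketch of a strict UTF-8 decoder over [Word8] that fails explicitly instead of guessing. The names decodeUtf8, Entry and classify are hypothetical and belong to no existing library; the sketch assumes directory entries arrive as raw bytes and keeps those bytes whether or not decoding succeeds.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Strict UTF-8 decoder (hypothetical helper, not a library function).
-- Returns Nothing on truncated, malformed, overlong, surrogate,
-- or out-of-range sequences.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80              = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 0x80    (fromIntegral b - 0xC0)
  | b >= 0xE0 && b < 0xF0 = multi 2 0x800   (fromIntegral b - 0xE0)
  | b >= 0xF0 && b < 0xF5 = multi 3 0x10000 (fromIntegral b - 0xF0)
  | otherwise             = Nothing
  where
    -- n continuation bytes expected, lowest legal code point for this
    -- sequence length, and the payload bits taken from the lead byte.
    multi :: Int -> Int -> Int -> Maybe String
    multi n lowest lead =
      let (conts, rest) = splitAt n bs
          isCont c      = c >= 0x80 && c < 0xC0
          code          = foldl (\acc c -> acc * 64 + (fromIntegral c - 0x80)) lead conts
          valid         = length conts == n
                          && all isCont conts
                          && code >= lowest
                          && code <= 0x10FFFF
                          && not (code >= 0xD800 && code <= 0xDFFF)
      in if valid then (chr code :) <$> decodeUtf8 rest else Nothing

-- A directory entry always keeps its raw bytes; the text form is present
-- only when the bytes really are valid UTF-8.
data Entry = Entry { rawName :: [Word8], textName :: Maybe String }
  deriving (Eq, Show)

classify :: [Word8] -> Entry
classify bs = Entry bs (decodeUtf8 bs)
```

A file whose name is Latin-1 encoded would get textName = Nothing under this scheme, yet it could still be opened or removed through rawName.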