
Glynn Clements
If you want text, well, tough; what comes out of most system calls and core library functions (not just read()) is bytes.
Which need to be interpreted by the program depending on where these bytes come from.
They don't necessarily need to be interpreted.
I was thinking of data read from an fd.
A lot of data simply gets "routed" from one place to another. E.g. a program reads a filename from argv[i] and passes it to open(). It doesn't matter if the filename is in Klingon.
Right.
If you *need* an encoding, and don't have any better information, then the locale provides a last resort. Decoding bytes according to the locale for the sake of it just adds an unnecessary failure mode.
Right.
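To make the "unnecessary failure mode" concrete, here is a minimal sketch. It assumes the bytestring and text packages and uses UTF-8 as a stand-in for the locale's encoding. The names routeRaw and routeDecoded are hypothetical, chosen only for illustration.

```haskell
import qualified Data.ByteString as B
import Data.Text.Encoding (decodeUtf8', encodeUtf8)
import Data.Text.Encoding.Error (UnicodeException)

-- Routing: the bytes are handed on untouched, so this cannot fail,
-- whatever encoding (if any) the name is in.
routeRaw :: B.ByteString -> B.ByteString
routeRaw = id

-- Decoding "for the sake of it" and re-encoding before passing the name on.
-- UTF-8 is an assumption standing in for the locale encoding. Any byte
-- sequence that is not valid in that encoding now yields a Left.
routeDecoded :: B.ByteString -> Either UnicodeException B.ByteString
routeDecoded = fmap encodeUtf8 . decodeUtf8'
```

Under these assumptions, a name containing the byte 0xff passes through routeRaw unchanged, while routeDecoded rejects it even though the program never needed the characters.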
For case testing, locale-dependent sorting and the like, you need to convert to characters. [Although possibly only temporarily; you can sort a list of byte strings based upon their corresponding character strings using sortBy. This means that a decoding failure only means that the ordering will be wrong. This is essentially what happens with "ls" if you have filenames which aren't valid in the current locale.]
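The sortBy idea above might look roughly like the following sketch. It assumes the bytestring and text packages, uses UTF-8 as a stand-in for the locale's encoding, and uses case folding as a stand-in for real locale-dependent collation (which base does not provide). sortKey and sortNames are hypothetical names.

```haskell
import qualified Data.ByteString as B
import Data.List (sortBy)
import Data.Ord (comparing)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')

-- Sort key: the case-folded character string when the bytes decode,
-- otherwise the raw bytes. With the derived Ord for Either, every Left
-- sorts before every Right, so undecodable names end up first.
sortKey :: B.ByteString -> Either B.ByteString T.Text
sortKey bs = either (const (Left bs)) (Right . T.toCaseFold) (decodeUtf8' bs)

-- The list elements stay byte strings; characters exist only inside the key.
sortNames :: [B.ByteString] -> [B.ByteString]
sortNames = sortBy (comparing sortKey)
```

A name that fails to decode is never dropped or altered here; it only lands in a position that may look wrong to the user.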
sortBy could only cope with single-byte encodings. Multi-byte encodings would need something else.
It's broken. Being able to represent filenames as byte strings is fundamental. Being able to convert them to or from character strings is useful but not essential. The only reason the existing API doesn't cause serious problems is that the translation is currently hardwired to an encoding which can't fail.
Handling binary filenames is hardly fundamental. It isn't even very portable; see the posts about filename handling under modern Windows. It might be an important feature, but there are other programs out there (mostly GUIs) that expect filenames to be encoded according to the locale settings too.
By "core library functions", I was referring primarily to libc, not the Haskell library functions which were built upon them. The Haskell developers can change Haskell, they can't change libc.
And they don't need to change libc. Libc just passes bytes through.

Gabriel.