
John Meacham wrote:
It doesn't affect functions added by the hierarchical libraries, i.e. those functions are safe only with the ASCII subset. (There is a vague plan to make Foreign.C.String conform to the FFI spec, which mandates locale-based encoding, and thus would change all those, but it's still up in the air.)
Hmm. I'm not convinced that automatically converting to the current locale's encoding is the ideal behaviour (it'd certainly break all my programs!). Certainly a function for converting into the encoding of the current locale would be useful for many users, but it's important to be able to know the encoding with certainty.
It should only be the default, not the only option.
I'm not sure that it should be available at all.
It should be possible to specify the encoding explicitly.
Conversely, it shouldn't be possible to avoid specifying the encoding explicitly.
Personally, I wouldn't provide an all-in-one "convert String to CString using locale's encoding" function, just in case anyone was tempted to actually use it.
But this is exactly what is needed for most C library bindings.
I very much doubt that "most" is accurate.
C functions which take a "char*" fall into three main cases:
1. Unspecified encoding, i.e. it's a string of bytes, not characters.
2. Locale's encoding, as determined by nl_langinfo(CODESET);
essentially, whatever was set with setlocale(LC_CTYPE), defaulting to
C/POSIX if setlocale() hasn't been called.
3. Fixed encoding, e.g. UTF-8, ISO-2022, US-ASCII (or EBCDIC on IBM
mainframes).
Historically, library functions have tended to fall into category 1
unless they *need* to know the interpretation of a given byte or
sequence of bytes (e.g.