
On Friday 16 March 2007 00:15, Wolfgang Thaller wrote:
> Indeed, paths and command-line arguments are becoming very string-like on Unix systems, too. On Mac OS X, the locale for file names is pretty much hardcoded to UTF-8. Mac OS X's native file system stores file names in UTF-16, but the POSIX layer sees it as UTF-8. [...]
I had a look at how other languages and toolkits handle this issue. For those which do not completely ignore it, the consensus seems to be:

  * On Mac OS X, the POSIX layer indeed seems to use UTF-8, but in a *decomposed* form. This could be a little bit surprising, so some normalization is probably needed.

  * On Windows, the current ANSI code page is assumed, which could vary from installation to installation and can be changed by the user AFAIK.

  * For *nices the story is a bit tricky, but often a combination of

      * nl_langinfo(CODESET)
      * setlocale(LC_CTYPE, 0)
      * the environment variables LC_ALL, LC_CTYPE and LANG
      * iconv

    is used to figure out the current local encoding and use that. Depending on the distribution, it can be UTF-8 (e.g. recent SuSE distros), but it doesn't have to be.

So I propose a compromise; we don't really have to be better than most languages/toolkits out there: Let's keep FilePath = String, but improve the real culprit, i.e. CString and friends. Currently, peekCString{,Len}, newCString{,Len} and withCString{,Len} simply use their "CA" ASCII counterparts. If we put the above common logic into Foreign.C.String, we could already achieve a lot.

In addition, we might consider adding some API entries, e.g. ByteString-based ones, to the POSIX package for the real low-level stuff, but I think this is not a top priority.

Opinions?

Cheers,
   S.
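
P.S. To make the *nix part a bit more concrete, here is a rough sketch of the environment-variable step only. It assumes the usual POSIX precedence (LC_ALL over LC_CTYPE over LANG, with empty values counting as unset) and the usual locale name shape language[_territory][.codeset][@modifier]. The names codesetOf, effectiveLocale, guessCodeset and guessCodesetIO are made up for illustration and are not part of any existing library. A real version would rather call setlocale and nl_langinfo(CODESET) via the FFI and hand the result to iconv.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)
import System.Environment (getEnvironment)

-- | Extract the codeset part of a POSIX locale name such as
--   "de_DE.ISO-8859-1@euro". Returns Nothing if no codeset is given.
--   (Hypothetical helper, for illustration only.)
codesetOf :: String -> Maybe String
codesetOf locale =
  case break (== '.') locale of
    (_, '.' : rest) ->
      case takeWhile (/= '@') rest of
        "" -> Nothing
        cs -> Just cs
    _ -> Nothing

-- | Pick the locale name that governs character handling:
--   LC_ALL overrides LC_CTYPE, which overrides LANG.
--   Variables set to the empty string are treated as unset.
effectiveLocale :: [(String, String)] -> Maybe String
effectiveLocale env =
  listToMaybe
    (filter (not . null)
      (mapMaybe (`lookup` env) ["LC_ALL", "LC_CTYPE", "LANG"]))

-- | Guess the codeset from an environment given as key/value pairs.
guessCodeset :: [(String, String)] -> Maybe String
guessCodeset env = effectiveLocale env >>= codesetOf

-- | The same guess, using the environment of the running process.
guessCodesetIO :: IO (Maybe String)
guessCodesetIO = fmap guessCodeset getEnvironment
```

Note that a locale like "C" or "de_DE" names no codeset at all, so this yields Nothing and the caller has to fall back to something else. This is one reason why asking nl_langinfo is the more robust route.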