
John Meacham wrote:
On Tue, Feb 07, 2006 at 04:25:35PM +0000, Ben Rudiak-Gould wrote:
                 Posix    NT       Win9x

pathnames        bytes    UTF-16   locale
command line     bytes    UTF-16   locale
file contents    bytes    bytes    bytes
pipes/sockets    bytes    bytes    bytes
Actually, Posix systems should be the following:
                 Posix    NT       Win9x

pathnames        locale   UTF-16   locale
command line     locale   UTF-16   locale
file contents    *        bytes    bytes
pipes/sockets    *        bytes    bytes
Although the Posix interface is in terms of bytes, the strings should always be interpreted via the locale specified in $LANG or $LC_CTYPE.

* Also, for file contents and pipes/sockets, if you are passing text, and in the absence of some overriding standard or protocol, you should be using the encoding specified in the locale too.
But that's an application-level convention; the kernel only knows about bytes. On Windows the encoding of pathnames and the command line is a requirement imposed by the kernel.

I think assuming the locale encoding for the command line on Posix is a bad idea. Users are unlikely to pass a misencoded command line explicitly, but I want my-haskell-util `find .` to work even on a mounted volume that uses the wrong encoding. (And I also want your-haskell-util to work, even if you didn't write it with this situation in mind.)

-- Ben
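
To make the misencoded-volume case concrete, here is a minimal sketch. It assumes a UTF-8 locale and a hypothetical file name (rawName, "café" stored as Latin-1) of the kind `find .` might print on such a volume. It uses decodeUtf8', decodeUtf8With and lenientDecode from the text package, which stand in for whatever decoding the runtime would apply; this is not a proposal for the actual library interface.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)

-- Hypothetical file name as the kernel would hand it back: "café" in
-- Latin-1. The trailing byte 0xE9 is not valid UTF-8 on its own.
rawName :: B.ByteString
rawName = B.pack [0x63, 0x61, 0x66, 0xE9]

-- True when the bytes survive a lenient decode followed by a re-encode.
-- (Assumption: the locale encoding is UTF-8.)
roundTrips :: B.ByteString -> Bool
roundTrips bs = TE.encodeUtf8 (TE.decodeUtf8With lenientDecode bs) == bs

main :: IO ()
main = do
  -- Strict decoding via the assumed locale rejects the name outright.
  putStrLn $ case TE.decodeUtf8' rawName of
    Left _  -> "strict decode: failed"
    Right _ -> "strict decode: ok"
  -- Lenient decoding substitutes U+FFFD for the bad byte, so re-encoding
  -- no longer yields the bytes that name the file.
  putStrLn $ "lenient round trip preserved bytes: " ++ show (roundTrips rawName)
  -- Treating the name as opaque bytes loses nothing.
  putStrLn $ "raw bytes: " ++ show (B.unpack rawName)
```

Under these assumptions, a utility that decodes its arguments through the locale either fails on this name or ends up with a different name than the one on disk, while one that passes the bytes through unchanged can still open the file.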