
On Wed, Feb 08, 2006 at 09:10:37PM +0000, Ben Rudiak-Gould wrote:
> John Meacham wrote:
>> On Tue, Feb 07, 2006 at 04:25:35PM +0000, Ben Rudiak-Gould wrote:
>>>                 Posix   NT      Win9x
>>> pathnames       bytes   UTF-16  locale
>>> command line    bytes   UTF-16  locale
>>> file contents   bytes   bytes   bytes
>>> pipes/sockets   bytes   bytes   bytes
>> Actually, Posix systems should be the following:
>>
>>                 Posix   NT      Win9x
>> pathnames       locale  UTF-16  locale
>> command line    locale  UTF-16  locale
>> file contents   *       bytes   bytes
>> pipes/sockets   *       bytes   bytes
>> Although the Posix interface is in terms of bytes, the strings should always be interpreted via the locale specified in $LANG or $LC_CTYPE.
>>
>> Also, for file contents and pipes/sockets, if you are passing text, and in the absence of some overriding standard or protocol, you should be using the encoding specified in the locale too.
> But that's an application-level convention; the kernel only knows about bytes. On Windows the encoding of pathnames and the command line is a requirement imposed by the kernel.
>
> I think assuming the locale encoding for the command line on Posix is a bad idea. Users are unlikely to pass a misencoded command line explicitly, but I want my-haskell-util `find .` to work even on a mounted volume that uses the wrong encoding. (And I also want your-haskell-util to work, even if you didn't write it with this situation in mind.)
When the command line is to be interpreted as a string, interpreting it in the current locale is definitely the right thing to do. This is why we need two varieties of getArgs, one which returns [String] and one which returns [[Word8]].

I doubt the second form will be needed much, since you usually think of command line arguments as strings, but it should be provided since it can't really be worked around.

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈
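
To make the [String] versus [[Word8]] distinction concrete, here is a minimal sketch. It assumes the locale encoding is UTF-8 (a real implementation would consult $LANG or $LC_CTYPE), and the names RawArg, decodeUtf8 and classifyArg are hypothetical, not part of any existing library. How the raw bytes are obtained from the operating system is left out. It shows that decoding can fail for a misencoded argument, in which case only the byte form is available.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical name: one command-line argument as raw bytes.
type RawArg = [Word8]

-- Strict UTF-8 decoder, standing in for "decode with the locale encoding".
-- Assumption: the locale is UTF-8. Returns Nothing for truncated, overlong,
-- surrogate, or out-of-range sequences.
decodeUtf8 :: RawArg -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80               = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F) 0x80 bs
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F) 0x800 bs
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07) 0x10000 bs
  | otherwise              = Nothing

-- Consume n continuation bytes, check the resulting code point, and continue.
multi :: Int -> Int -> Int -> RawArg -> Maybe String
multi n lead minCp bs
  | length conts == n
  , all isCont conts
  , cp >= minCp                    -- reject overlong encodings
  , cp <= 0x10FFFF                 -- reject values beyond Unicode
  , cp < 0xD800 || cp > 0xDFFF     -- reject surrogates
  = (chr cp :) <$> decodeUtf8 rest
  | otherwise = Nothing
  where
    (conts, rest) = splitAt n bs
    isCont c = c .&. 0xC0 == 0x80
    cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) lead conts

-- Right: the argument is valid text in the assumed encoding.
-- Left: it is not, so the caller still has the untouched bytes.
classifyArg :: RawArg -> Either RawArg String
classifyArg raw = maybe (Left raw) Right (decodeUtf8 raw)
```

For example, a file name stored as Latin-1 on a mounted volume (a lone 0xE9 byte for "é") comes back as Left under this sketch, so a utility can still pass the bytes on to the file system unchanged.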