
On 7 November 2011 17:32, John Millikin wrote:
I am also not convinced that it is possible to correctly implement either of these functions if their behavior is dependent on the user's locale.
FWIW it's only dependent on the user's locale because whether glibc iconv detects errors in the *from* sequence depends on what the *to* locale is. Clearly an invalid *from* sequence should be reported as invalid regardless of *to*. I know this isn't much comfort to you, though, since you do have to worry about broken behaviour in 7.2, and possible future breakage with changes in iconv. I understand your point that it would be better from a complexity point of view to just roundtrip the bytes as *bytes* without relying on all this escaping/unescaping code.
Please understand, I am not arguing against the existence of this encoding layer in general. It's a fine idea for a simplistic high-level filesystem interaction library. But it should be *optional*, not part of the compiler or "base".
The problem is that I *really really want* getArgs to decode the command line arguments. That's almost the whole point of this change, and it is what most users seem to expect. Given this constraint, the code has to be part of "base", and if getArgs has this behaviour then any file system function we ship that takes a FilePath (i.e. all the functions in base, directory, win32 and unix) must be prepared to handle these escape characters for consistency.

I *would* be happy to expose an alternative file system API from the posix package that operates with ByteString paths. This package could provide a function :: FilePath -> ByteString that encodes the string with the fileSystemEncoding (removing escapes in the process) for interoperability with file names arriving via getArgs, and at that point the decision about whether to use the escaping/unescaping code would be (mostly) in the hands of the user.

We could even have posix expose APIs to get command line arguments/environment variables as ByteStrings, and then you could avoid escape/unescape entirely.

Which of these solutions (if any) would satisfy you?

 1. The current situation, plus an alternative API exposed from "posix" along the lines described above

 2. The current situation but with the escape/unescape modified so it allows true roundtripping (at the cost of weird "surrogate" Char values popping up now and again). If you have this you can reliably implement the alternative API on top of the String based one, assuming we got our escape/unescape code right

I hope we can work together to find a solution here.

Cheers,
Max
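
To make "true roundtripping" in option 2 concrete, here is a rough sketch of the idea. It is not the code in base: the names decodeEscaping and encodeUnescaping are hypothetical, a toy ASCII codec stands in for fileSystemEncoding, and the choice of escaping each undecodable byte to the lone surrogate 0xDC00 + byte is an assumption made for illustration.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: decode bytes as ASCII. Every byte >= 0x80 is
-- "escaped" to a lone low surrogate (0xDC00 + byte), a code point that
-- well-formed text never contains, so the original byte is not lost.
decodeEscaping :: [Word8] -> String
decodeEscaping = map step
  where
    step b
      | b < 0x80  = chr (fromIntegral b)
      | otherwise = chr (0xDC00 + fromIntegral b)

-- Hypothetical inverse: turn escapes back into the original bytes.
-- Returns Nothing for a character this toy ASCII codec cannot encode.
encodeUnescaping :: String -> Maybe [Word8]
encodeUnescaping = mapM step
  where
    step c
      | n < 0x80                   = Just (fromIntegral n)
      | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
      | otherwise                  = Nothing
      where n = ord c

main :: IO ()
main = do
  -- "foo", an invalid byte 0xFF, then ".txt"
  let raw = [0x66, 0x6F, 0x6F, 0xFF, 0x2E, 0x74, 0x78, 0x74]
  -- Check that decoding and then encoding gives back the same bytes.
  print (encodeUnescaping (decodeEscaping raw) == Just raw)
```

The surrogate Char sitting in the middle of the decoded String is the "weird" value mentioned in option 2. In exchange, a FilePath -> ByteString function built this way would recover the exact bytes the OS handed over.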