
On 2 November 2011 09:37, Max Bolingbroke
On 1 November 2011 20:13, John Millikin wrote:
$ ghci-7.2.1
GHC> import System.Directory
GHC> getDirectoryContents "path-test"
["\161\165","\61345\61349","..","."]
GHC> readFile "path-test/\161\165"
"world\n"
GHC> readFile "path-test/\61345\61349"
*** Exception: path-test/: openFile: does not exist (No such file or directory)
Thanks for the example! I can reproduce this on Linux (haven't tried OS X or Windows) and AFAICT this behaviour is just a straight-up bug and is *not* intended behaviour. I'm not sure why the tests aren't catching it.
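For reference, the two names in the listing differ by a fixed offset: 61345 - 161 = 61184 = 0xEF00, and likewise 61349 - 165. The following is only a minimal sketch of that kind of private-use escaping, not the actual GHC implementation. The names escapeBase, escapeByte and unescapeChar are hypothetical, and the base value is inferred from the listing above.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Assumed base of the private-use escape range, inferred from the
-- listing: 61345 - 161 = 61184 = 0xEF00. Not taken from the GHC source.
escapeBase :: Int
escapeBase = 0xEF00

-- Map an undecodable byte to a private-use escape character.
escapeByte :: Word8 -> Char
escapeByte b = chr (escapeBase + fromIntegral b)

-- Recover the original byte if the character lies in the assumed
-- escape range; any other character is not an escape.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | n >= escapeBase && n <= escapeBase + 0xFF = Just (fromIntegral (n - escapeBase))
  | otherwise = Nothing
  where
    n = ord c
```

Under these assumptions, map escapeByte [161,165] gives "\61345\61349", the second name in the listing, and unescapeChar maps it back to the bytes 161 and 165. An encoder that does not undo the escape has no way to find the file on disk.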
I've tracked it down and this bug arises in the following situation:

1. You are not running on Windows
2. You are attempting to encode a string containing the private-use escape codepoints
3. You are using an iconv (such as the one in GNU libc) that, in contravention of the Unicode standard, does not signal EILSEQ if surrogate codepoints are encountered in a non-UTF16 input

I've got a patch that will work around the issue in most situations by avoiding the iconv code path. With the patch everything will work OK as long as the system locale is one that we have a native-Haskell decoder for (i.e. basically UTF-8). So you will still be able to get the broken behaviour if the above 3 conditions are met AND your system locale is not UTF-8.

I think the only way to fix this last case in general is to fix iconv itself, so I'm going to see if I can get a patch upstream. Fixing it for people with UTF-8 locales should be enough for 99% of users, though.

Max