
John Millikin wrote:
> In GHC 7.2 and later, file path handling in the platform libraries was changed to treat all paths as text (encoded according to locale). This does not work well on POSIX systems, because POSIX paths are byte sequences. There is no guarantee that any particular path will be valid in the user's locale encoding.
I've been dealing with this change too, but my current understanding is that GHC's handling of encoding for FilePath is documented to allow "arbitrary undecodable bytes to be round-tripped through it". As long as FilePaths are read using this file system encoding, any FilePath should be usable even if it does not match the user's encoding.

For FFI, anything that deals with a FilePath should use this withFilePath, which GHC contains but doesn't export(?), rather than the old withCString or withCAString:

    import Foreign.C.String (CString)
    import GHC.IO.Encoding (getFileSystemEncoding)
    import qualified GHC.Foreign as GHC

    -- Marshal a FilePath to a C string using the file system encoding.
    withFilePath :: FilePath -> (CString -> IO a) -> IO a
    withFilePath fp f = getFileSystemEncoding >>= \enc -> GHC.withCString enc fp f

Code that reads or writes a FilePath to a Handle (including even to stdout!) must take care to set the right encoding too:

    import System.IO (Handle, hSetEncoding)

    -- Switch a Handle to the file system encoding before doing FilePath I/O on it.
    fileEncoding :: Handle -> IO ()
    fileEncoding h = hSetEncoding h =<< getFileSystemEncoding
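To make the round-tripping claim easy to check, here is a minimal sketch, assuming GHC 7.4 or later on a POSIX system. The names decodePath, encodePath and roundTrips are made up for illustration and are not part of any library; only getFileSystemEncoding and the GHC.Foreign marshalling functions are real. It decodes raw bytes to a FilePath with the file system encoding, encodes the result back, and compares:

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (getFileSystemEncoding)

-- Hypothetical helper: decode raw path bytes into a FilePath
-- using GHC's file system encoding.
decodePath :: [Word8] -> IO FilePath
decodePath bytes = do
    enc <- getFileSystemEncoding
    withArrayLen bytes $ \len ptr ->
        GHC.peekCStringLen enc (castPtr ptr, len)

-- Hypothetical helper: encode a FilePath back into raw bytes
-- using the same encoding.
encodePath :: FilePath -> IO [Word8]
encodePath fp = do
    enc <- getFileSystemEncoding
    GHC.withCStringLen enc fp $ \(ptr, len) ->
        peekArray len (castPtr ptr)

-- True when decoding and then re-encoding gives back the original bytes.
roundTrips :: [Word8] -> IO Bool
roundTrips bytes = do
    bytes' <- decodePath bytes >>= encodePath
    return (bytes' == bytes)
```

Something like roundTrips [0x66, 0x6f, 0x6f, 0xff] feeds it "foo" followed by a byte that is not valid UTF-8. If my understanding of the file system encoding is right, the result should be True regardless of the locale, but it is worth running under the locales you care about.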
> * system-filepath has been converted from GHC's escaping rules to its own, more compatible rules. This lets it support file paths that cannot be represented in GHC 7.2's escape format.
I'm doubtful about adding yet another encoding to the mix. Things are complicated enough already! And in my tests, GHC 7.4's FilePath encoding does allow arbitrary bytes in FilePaths.

BTW, GHC now also has RawFilePath. Parts of System.Directory could be usefully written to support that data type too. For example, the parent directory can be determined. Other things are more difficult to do with RawFilePath.

-- 
see shy jo
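
P.S. To illustrate the RawFilePath point, here is a rough sketch of finding the parent directory purely at the byte level, so no decoding is involved at all. parentDir is a hypothetical helper, not an existing System.Directory function, and the alias below just mirrors how the unix package defines RawFilePath (a strict ByteString). It only approximates what dirname does and never touches the file system:

```haskell
import qualified Data.ByteString.Char8 as B

-- Same definition as RawFilePath in the unix package; repeated here
-- so the sketch depends only on bytestring.
type RawFilePath = B.ByteString

-- Hypothetical helper: parent directory of a POSIX path, computed on bytes.
parentDir :: RawFilePath -> RawFilePath
parentDir path
    | B.null trimmed =
        -- Either the empty path, or a path made only of slashes (the root).
        if B.null path then B.pack "." else B.pack "/"
    | B.null dirPart = B.pack "."   -- no slash left: relative to current dir
    | B.null dir     = B.pack "/"   -- only leading slashes: parent is the root
    | otherwise      = dir
  where
    stripSlashes = fst . B.spanEnd (== '/')
    trimmed      = stripSlashes path                -- ignore trailing slashes
    dirPart      = fst (B.breakEnd (== '/') trimmed) -- up to the last slash
    dir          = stripSlashes dirPart
```

Because the path is only ever split on the '/' byte, a component that is not valid in the user's locale passes through unchanged.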