
John Millikin wrote:
That was my understanding also, then QuickCheck found a counter-example. It turns out that there are cases where a valid path cannot be roundtripped in the GHC 7.2 encoding.
The issue is that [238,189,178] decodes to 0xEF72, which is within the 0xEF00-0xEFFF range that GHC uses to represent un-decodable bytes.
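The arithmetic behind that counter-example can be checked by hand. The following is only a minimal sketch: `decode3` and `inEscapeRange` are hypothetical names made up for illustration and are not part of GHC or system-filepath, and the range bounds are simply the ones quoted above. It decodes one three-byte UTF-8 sequence and tests whether the result lands in the range described as reserved for un-decodable bytes.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Word (Word8)

-- Decode a single three-byte UTF-8 sequence (lead byte 0xE0-0xEF) to its
-- code point. Returns Nothing if the bytes do not have that shape.
-- Sketch only: overlong forms and surrogates are not rejected.
decode3 :: [Word8] -> Maybe Int
decode3 [b0, b1, b2]
  | b0 .&. 0xF0 == 0xE0, isCont b1, isCont b2 =
      Just (   (fromIntegral (b0 .&. 0x0F) `shiftL` 12)
           .|. (fromIntegral (b1 .&. 0x3F) `shiftL` 6)
           .|.  fromIntegral (b2 .&. 0x3F) )
  where
    -- Continuation bytes have the bit pattern 10xxxxxx.
    isCont b = b .&. 0xC0 == 0x80
decode3 _ = Nothing

-- The range this thread says GHC 7.2 uses for un-decodable bytes
-- (an assumption taken from the message above, not checked against GHC).
inEscapeRange :: Int -> Bool
inEscapeRange c = c >= 0xEF00 && c <= 0xEFFF
```

In GHCi, `decode3 [238,189,178]` gives `Just 61298`, which is 0xEF72, and `inEscapeRange 61298` is `True`. So a perfectly valid UTF-8 file name collides with the escape range.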
How did you deal with this in system-filepath? While no code points in the Supplementary Special-purpose Plane are currently assigned (http://www.unicode.org/roadmaps/ssp/), it is worrying that it's used, especially if filenames in a non-unicode encoding could be interpreted as containing characters really within this plane. I wonder why maxBound :: Char was not increased, and the additional space after `\1114111' used for the un-decodable bytes?
For FFI, anything that deals with a FilePath should use this withFilePath, which GHC contains but doesn't export(?), rather than the old withCString or withCAString:
    import Foreign.C.String (CString)
    import GHC.IO.Encoding (getFileSystemEncoding)
    import GHC.Foreign as GHC

    withFilePath :: FilePath -> (CString -> IO a) -> IO a
    withFilePath fp f = getFileSystemEncoding >>= \enc -> GHC.withCString enc fp f
If code uses either withFilePort [withFilePath?] or withCString, then the filenames written will depend on the user's locale. This is wrong. Filenames are either non-encoded text strings (Windows), UTF8 (OSX), or arbitrary bytes (non-OSX POSIX). They must not change depending on the locale.
This is exactly how GHC 7.4 handles them. For example:

    openDirStream :: FilePath -> IO DirStream
    openDirStream name =
      withFilePath name $ \s -> do
        dirp <- throwErrnoPathIfNullRetry "openDirStream" name $ c_opendir s
        return (DirStream dirp)

    removeLink :: FilePath -> IO ()
    removeLink name =
      withFilePath name $ \s ->
      throwErrnoPathIfMinus1_ "removeLink" name (c_unlink s)

I do not see any locale-dependent behavior in the filename bytes read/written.
Code that reads or writes a FilePath to a Handle (including even to stdout!) must take care to set the right encoding too:
    fileEncoding :: Handle -> IO ()
    fileEncoding h = hSetEncoding h =<< getFileSystemEncoding
This is also wrong. A "file path" cannot be written to a handle with any hope of correct behavior. If it's to be displayed to the user, a path should be converted to text first, then displayed.
Sure it can. See find(1). Its output can be read as FilePaths once the Handle is set up as above. If you prefer your program not crash with an encoding error when an arbitrary FilePath is putStr, but instead perhaps output bytes that are not valid in the current encoding, that's also a valid choice. You might be writing a program, like find, that again needs to output any possible FilePath including badly encoded ones. Filesystem.Path.CurrentOS.toText is a nice option if you want validly encoded output though. Thanks for that!
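To make the find(1) case concrete, here is a rough sketch of a filter that reads newline-separated paths from one Handle and writes them to another, with both set to the filesystem encoding as described above. `copyPaths` is a hypothetical helper invented for this illustration. The sketch assumes a GHC whose GHC.IO.Encoding exports getFileSystemEncoding, and, like find without -print0, it cannot cope with file names that contain newlines.

```haskell
import GHC.IO.Encoding (getFileSystemEncoding)
import System.IO (Handle, hGetContents, hPutStr, hSetEncoding, stdin, stdout)

-- Same helper as quoted earlier in this thread.
fileEncoding :: Handle -> IO ()
fileEncoding h = hSetEncoding h =<< getFileSystemEncoding

-- Hypothetical helper: copy newline-separated FilePaths from inp to out,
-- using the filesystem encoding on both ends so that badly encoded
-- names are passed through instead of raising an encoding error.
copyPaths :: Handle -> Handle -> IO ()
copyPaths inp out = do
  fileEncoding inp
  fileEncoding out
  paths <- fmap lines (hGetContents inp)
  mapM_ (\p -> hPutStr out (p ++ "\n")) paths
```

Something like `find . | ./copypaths`, with a main that calls `copyPaths stdin stdout`, is the intended use. Each element of `paths` is then a FilePath that can be handed to the usual file operations.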
This is new in 7.4, and won't be backported, right? I tried compiling the new "unix" package in 7.2 to get proper file path support, but it failed with an error about some new language extension.
The RawFilePath is just a ByteString, so your existing converters for that in system-filepath might work.

--
see shy jo