
On Wed, 2009-08-26 at 16:14 +0300, Yitzchak Gale wrote:
> Johan Tibell wrote:
>> Perhaps the only solution is to have System.FilePath.Posix.toString and System.FilePath.Windows.toString with different type signatures.
>
> I'm not sure there's any point. As Duncan pointed out, we are not just talking about the file system; we are talking about the interaction between the file system and a user interface, i.e. how file paths should appear to users. So it also depends on which UI you are using.
Mmm, this stuff is complex :-( In general I like the idea of the proposal that we have functions for converting between String and FilePath. As it says in the proposal, it gets us closer to being able to treat FilePath as abstract. Of course the devil is in the detail: getting it right, and making it portable and usable, is hard.
> I am now beginning to lean towards Ketil's suggestion that on POSIX platforms we should always use UTF-8. We then need a prominent warning in the documentation that if you need something else, like the current locale, you should decode it yourself.
>
> That's nice in that it makes the function pure, or equivalently, it means the function does not need a locale parameter.
>
> Note that it is becoming increasingly rare for people to use non-UTF-8 locales anywhere in the world, and even then the locale is likely ignored by many UIs. So I'm inclined against cluttering the API with convenience functions for other encodings, as Johan is suggesting.
>
> As a way forward, I propose:
>
> 1. Accept Judah's patch, modified always to use UTF-8.
If we don't have the locale stuff then doesn't the API become a lot simpler? Instead of:

    filePathToString :: FilePath -> IO String
    getFilePathToStringFunc :: IO (FilePath -> String)

We'd have:

    filePathToString :: FilePath -> String

Presumably on POSIX we will follow the glib approach of using '?' replacement chars, since the conversion to string is aimed at human consumption. Doing this makes the function total but lossy.

And I didn't notice anything in the proposal about the other direction, converting String to FilePath. Surely we need both.

    stringToFilePath :: String -> FilePath

A nice thing about using UTF-8 on POSIX is that we know this function cannot fail, unlike conversions into a locale encoding. Presumably on POSIX this does not do any kind of Unicode canonicalisation, while on OSX and Windows it would do the appropriate kind.

At this point I expect Johan to jump up and down and say these should be:

    import qualified System.FilePath as FilePath
    FilePath.toString :: FilePath -> String
    FilePath.fromString :: String -> FilePath

In principle I guess it'd be ok to add versions in the System.FilePath.Posix module that take an extra encoding parameter, but it can't be the portable version since the encoding is fixed for OSX and Windows. It's also jolly inconvenient and, as you've pointed out, of diminishing importance.
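To make the trade-off concrete, here is a minimal sketch of a pure, always-UTF-8 pair of conversions, using only base. It assumes a hypothetical `RawPath = [Word8]` type standing in for the raw bytes of a POSIX path (in 2009 `FilePath` was still plain `String`). The names `toString` and `fromString` mirror the qualified naming above but are illustrative, not an existing library API. Decoding replaces each invalid byte with '?', glib-style, so it is total but lossy; encoding cannot fail.

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical: a POSIX path as the raw bytes the kernel sees.
type RawPath = [Word8]

-- Encode a String as UTF-8. Total: every Char has an encoding.
-- (Lone surrogates are encoded as-is; toString below rejects them.)
fromString :: String -> RawPath
fromString = concatMap (encodeChar . ord)
  where
    encodeChar c
      | c < 0x80    = [fromIntegral c]
      | c < 0x800   = [0xC0 .|. hi 6, cont 0]
      | c < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
      | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
      where
        hi n   = fromIntegral (c `shiftR` n)                     -- lead-byte payload
        cont n = fromIntegral (0x80 .|. ((c `shiftR` n) .&. 0x3F)) -- 10xxxxxx

-- Decode UTF-8 for display. Each byte that cannot start a valid sequence
-- becomes '?', and decoding resumes at the next byte: total but lossy.
toString :: RawPath -> String
toString [] = []
toString (b : bs)
  | b < 0x80            = chr (fromIntegral b) : toString bs
  | b >= 0xC2, b < 0xE0 = multi 1 0x80 (fromIntegral b .&. 0x1F)
  | b >= 0xE0, b < 0xF0 = multi 2 0x800 (fromIntegral b .&. 0x0F)
  | b >= 0xF0, b < 0xF5 = multi 3 0x10000 (fromIntegral b .&. 0x07)
  | otherwise           = '?' : toString bs
  where
    -- n continuation bytes; minCp rejects overlong encodings.
    multi n minCp lead =
      let (conts, rest) = splitAt n bs
      in if length conts == n && all isCont conts
           then
             let cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) lead conts
             in if cp >= minCp && cp <= 0x10FFFF && not (isSurrogate cp)
                  then chr cp : toString rest
                  else '?' : toString bs
           else '?' : toString bs
    isCont c = c .&. 0xC0 == 0x80
    isSurrogate cp = cp >= 0xD800 && cp <= 0xDFFF
```

For example, the Latin-1 bytes of "café" (`[0x63, 0x61, 0x66, 0xE9]`) decode to "caf?", so `fromString . toString` is not the identity on such a path, while `toString . fromString` round-trips any String without surrogates.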
> 2. Add strident warnings in the documentation that:
>
>    o If you need a different encoding on POSIX, do it yourself.
>    o If FilePath does not come from the file system, it may not match the actual file path used in the file system due to Unicode canonicalization.
Similar points apply to trying to round-trip via

    toString . fromString :: String -> String
    fromString . toString :: FilePath -> FilePath

The String -> String transform would do some Unicode canonicalisation on Windows and OSX. The FilePath -> FilePath transform would be the identity on Windows and OSX for strings coming from the file system. On POSIX, however, we can get UTF-8 decoding errors, which will give us replacement chars.

So the advice in this section of the documentation should probably be similar to the glib docs, where it says that in some circumstances you should keep both forms: one so you can present the file name to the user through a graphical or command-line UI, and the other so you can still access the same file later (e.g. to save it). Especially in document-oriented GUI apps, it's very annoying if you open, edit and save, but saving either fails because it cannot re-encode, or ends up writing a different file (different in Unicode canonicalisation, or containing replacement chars).

Duncan
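The "keep both forms" advice might look roughly like this in application code. This is a sketch under assumptions: `RawPath`, `DocPath`, `openDoc`, `saveTarget`, and the deliberately crude ASCII-only `displayDecode`/`displayEncode` are hypothetical names for illustration, not part of any proposed System.FilePath API. A real program would plug in a proper lossy UTF-8 decoder.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical: a POSIX path as the raw bytes stored in the file system.
type RawPath = [Word8]

-- Keep both forms of an opened document's path, as the glib docs advise.
data DocPath = DocPath
  { rawPath     :: RawPath -- exact bytes, used to reopen or save the file
  , displayName :: String  -- possibly lossy, shown in the UI only
  } deriving Show

-- Build both forms once, when the file is opened. The decoder is a
-- parameter so any lossy decoder (e.g. UTF-8 with '?' replacement) fits.
openDoc :: (RawPath -> String) -> RawPath -> DocPath
openDoc decode raw = DocPath { rawPath = raw, displayName = decode raw }

-- Save to the bytes we opened, never to a re-encoding of displayName.
saveTarget :: DocPath -> RawPath
saveTarget = rawPath

-- Hypothetical stand-in decoder: ASCII bytes pass through, others become '?'.
displayDecode :: RawPath -> String
displayDecode = map (\b -> if b < 0x80 then chr (fromIntegral b) else '?')

-- Hypothetical stand-in encoder: ASCII chars pass through, others become '?'.
displayEncode :: String -> RawPath
displayEncode = map (\c -> if ord c < 0x80 then fromIntegral (ord c) else 0x3F)
```

With the Latin-1 bytes of "café", `displayName` is "caf?" while `saveTarget` still returns the original four bytes; re-encoding the display name would instead give `[0x63, 0x61, 0x66, 0x3F]`, i.e. a different file.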