
Johan Tibell wrote:
Perhaps the only solution is to have System.FilePath.Posix.toString and System.FilePath.Windows.toString with different type signatures.
I'm not sure there's any point. As Duncan pointed out, we are not just talking about the file system; we are talking about the interaction between the file system and a user interface, that is, how file paths should appear to users. So it also depends on what UI you are using.

For example, GTK2 on Unix always uses UTF-8 to display file paths no matter what the current locale is, unless you've set a certain environment variable. Most X terminals display file paths using the current locale. I'm not sure what the current situation is in Qt.

On Mac OS X, HFS+ stores file names as UTF-16, and file paths in POSIX calls are interpreted as UTF-8. But canonical Unicode is used, so the actual file path might not be the same as what you provided if it includes combining characters. I think that Windows also converts the file path to (some kind of) canonical Unicode in the presence of combining characters.

So we should probably add stringToFilePath as well: encode on vanilla POSIX, canonicalize and encode on Mac OS X, canonicalize on Windows. We need to research exactly which canonical form is used on each platform. Unfortunately, that may depend upon the file system. Also, based on past experience, I fear that on Windows "canonical" may mean something different from anything published.

I am now beginning to lean towards Ketil's suggestion that on POSIX platforms we should always use UTF-8. We then need a prominent warning in the documentation that if you need something else, like the current locale, you must decode it yourself. Note that it is becoming increasingly rare for people to use non-UTF-8 locales anywhere in the world, and even then the locale is likely ignored by many UIs. So I'm inclined against cluttering the API with convenience functions for other encodings, as Johan is suggesting.

As a way forward, I propose:

1. Accept Judah's patch, modified always to use UTF-8.

2. Add strident warnings in the documentation that:

   o If you need a different encoding on POSIX, do it yourself.

   o If FilePath does not come from the file system, it may not match the actual file path used in the file system due to Unicode canonicalization.

3. Open a feature request for stringToFilePath as described above.

Regards,
Yitz
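
To make items 1 and 3 a little more concrete, here is a minimal sketch of the vanilla POSIX case only. It is not Judah's patch and not an existing System.FilePath API: the RawPath type, the toStringUtf8 name, and both signatures are assumptions made up for illustration, and it relies on the bytestring and text packages. It does no canonicalization at all, so it does not cover Mac OS X or Windows.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical type: the raw bytes of a POSIX path as the kernel sees them.
type RawPath = B.ByteString

-- Hypothetical: decode path bytes for display, always as UTF-8,
-- regardless of the current locale. Nothing means "not valid UTF-8";
-- callers who need another encoding must decode the bytes themselves.
toStringUtf8 :: RawPath -> Maybe String
toStringUtf8 raw =
  case TE.decodeUtf8' raw of
    Left _  -> Nothing
    Right t -> Just (T.unpack t)

-- Hypothetical signature for stringToFilePath on vanilla POSIX:
-- encode as UTF-8 and nothing else (no Unicode canonicalization).
stringToFilePath :: String -> RawPath
stringToFilePath = TE.encodeUtf8 . T.pack
```

Note that stringToFilePath "\233" (precomposed e-acute) and stringToFilePath "e\769" (e followed by a combining acute accent) produce different bytes even though they display identically, which is exactly the mismatch that the second warning in item 2 is about.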