
On Wed, Mar 30, 2011 at 11:01, Alistair Bayley wrote:
On 30 March 2011 20:53, Max Bolingbroke wrote:
On 30 March 2011 07:52, Michael Snoyman wrote:
I could manually do something like (utf8Decode . S8.pack), but that presumes that the character encoding on the system in question is UTF-8. So two questions:
Funnily enough, I have been thinking about this quite hard recently. The situation is kind of a mess, and short of implementing PEP 383 (http://www.python.org/dev/peps/pep-0383/) in GHC, I can't see how to make it easier on the programmer. As Jason points out, the best you can really do is probably:
1. Treat Strings that represent filenames as raw byte sequences, even though they claim to be strings
2. When presenting such Strings to the user, re-decode them using the current locale encoding (which will typically be UTF-8). You probably want some means of avoiding decoding errors here too -- ignoring or replacing undecodable bytes -- but presently this is not so straightforward. If you happen to be on a system with GNU iconv, however, you can use its "C//TRANSLIT//IGNORE" encoding to achieve this.
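The two steps above could be sketched roughly as follows. This is only an illustration under some assumptions: it uses the text and bytestring packages, it hard-codes UTF-8 instead of consulting the current locale, and the names RawFileName and displayName are made up for this example.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- | A filename kept exactly as the OS returned it (hypothetical type).
-- Step 1: never decode the bytes that are passed back to the OS.
newtype RawFileName = RawFileName B.ByteString
  deriving (Eq, Show)

-- | Step 2: decode for display only.
-- Assumption: UTF-8 is used rather than the actual locale encoding.
-- Undecodable input is replaced by U+FFFD instead of raising an error.
displayName :: RawFileName -> String
displayName (RawFileName bytes) =
  T.unpack (decodeUtf8With lenientDecode bytes)
```

The result of displayName is meant for showing to the user only; opening the file should still go through the untouched bytes.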
http://www.haskell.org/pipermail/libraries/2009-August/012493.html
I took from this discussion that FilePath really should be a pair of the actual filename ByteString, and the printable String (decoded from the ByteString, with encoding specified by the user's locale). The conversion from ByteString to String (and vice versa) is not guaranteed to be lossless, so you need to remember both.
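A minimal sketch of such a pair, to make the idea concrete. The type FileName and the functions fromRaw, rawBytes and shown are hypothetical names, and the decoding assumes UTF-8 via the text package rather than the user's locale.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- | Hypothetical pair: the bytes identify the file, the String is for people.
data FileName = FileName
  { rawBytes :: B.ByteString  -- passed back to the OS unchanged
  , shown    :: String        -- lossy, for display only
  }

-- | Identity is decided by the raw bytes alone.
instance Eq FileName where
  a == b = rawBytes a == rawBytes b

-- | Build the pair from the bytes the OS returned.
-- Assumption: UTF-8 decoding with replacement, not the locale encoding.
fromRaw :: B.ByteString -> FileName
fromRaw bytes = FileName bytes (T.unpack (decodeUtf8With lenientDecode bytes))
```

With this decoding, the byte sequences [0x66, 0xff] and [0x66, 0xfe] are distinct names with the same printable form, which is why the String alone cannot be converted back to the original bytes.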
I'm not sure that I agree with that. Why does it have to be lossless? The problem, more likely, is the fact that FilePath is just a simple string. Maybe we should go the way of Java, where cross-platform file access is based upon a File (or the new Path) type? That way the internal representation could use whatever is necessary to ensure a unique reference to a file or directory, while at the same time providing a way to get a human-readable representation. Going from strings to file/path types would need the correct encodings to work.
Cheers,
-Tako
PS: Just lurking here most of the time because I'm still a total Haskell noob, you can ignore me without risk.