On 30 March 2011 20:53, Max Bolingbroke <batterseapower@hotmail.com> wrote:
On 30 March 2011 07:52, Michael Snoyman <michael@snoyman.com> wrote:
> I could
> manually do something like (utf8Decode . S8.pack), but that presumes
> that the character encoding on the system in question is UTF8. So two
> questions:

Funnily enough I have been thinking about this quite hard recently,
and the situation is kind of a mess and short of implementing PEP383
(http://www.python.org/dev/peps/pep-0383/) in GHC I can't see how to
make it easier on the programmer. As Jason points out the best you can
really do is probably:

 1. Treat Strings that represent filenames as raw byte sequences, even
though they claim to be strings

 2. When presenting such Strings to the user, re-decode them by using
the current locale encoding (which will typically be UTF-8). You
probably want to have some means of avoiding decoding errors here too
-- ignoring or replacing undecodable bytes -- but presently this is
not so straightforward. If you happen to be on a system with GNU Iconv
you can use its "C//TRANSLIT//IGNORE" encoding to achieve this,
however.


http://www.haskell.org/pipermail/libraries/2009-August/012493.html

I took from this discussion that FilePath really should be a pair: the
actual filename as a ByteString, plus a printable String decoded from
that ByteString using the encoding given by the user's locale. The
conversion from ByteString to String (and back) is not guaranteed to be
lossless, so you need to keep both.
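
To make that concrete, here is a rough sketch of what such a pair might
look like. None of these names exist in any library; FileName,
mkFileName, utf8Lenient and roundTrips are all made up for illustration.
The decoder is passed in as a parameter because the right one depends on
the locale; utf8Lenient is only a stand-in that assumes UTF-8 and
replaces undecodable bytes with U+FFFD, using the bytestring and text
packages.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With, encodeUtf8)
import Data.Text.Encoding.Error (lenientDecode)

-- | Hypothetical type: the raw bytes are authoritative and are what gets
-- passed back to the OS; the String is only a rendering for the user.
data FileName = FileName
  { rawName     :: B.ByteString  -- exact bytes as returned by the OS
  , displayName :: String        -- possibly lossy, human-readable form
  } deriving (Eq, Show)

-- | Build a FileName from raw bytes, given a decoder for the user's locale.
mkFileName :: (B.ByteString -> String) -> B.ByteString -> FileName
mkFileName decode bytes = FileName bytes (decode bytes)

-- | Stand-in decoder (an assumption: the locale is UTF-8).
-- Bytes that are not valid UTF-8 are replaced with U+FFFD.
utf8Lenient :: B.ByteString -> String
utf8Lenient = T.unpack . decodeUtf8With lenientDecode

-- | True when re-encoding the display name as UTF-8 reproduces the
-- original bytes, i.e. nothing was lost in decoding.
roundTrips :: FileName -> Bool
roundTrips fn = encodeUtf8 (T.pack (displayName fn)) == rawName fn
```

With this, mkFileName utf8Lenient applied to bytes that are not valid
UTF-8 gives a displayName that no longer round-trips, which is exactly
why rawName has to be kept alongside it.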

Alistair