
On 31 March 2011 09:13, Ketil Malde
-- | Try to decode a FilePath to Text, using the current locale encoding. If -- the filepath is invalid in the current locale, it is decoded as ASCII and -- any non-ASCII bytes are replaced with a placeholder.
Why not map them to individual placeholders, i.e. in a private Unicode area?
This way, the conversion could be invertible, and possibly even sensibly displayable given a guesstimated alternative locale (e.g. as Latin 1 if the locale is a West European one).
This is what Python's PEP 383 proposes, and it is implemented in Python 3. This means that you can treat file names as strings uniformly (which is really nice), but does have disadvantages: for example, printing a string to a UTF-8 console can throw an exception if the string contains one of these "surrogate characters". Cheers, Max