
On Fri, 2007-03-16 at 15:42 +0100, Sven Panne wrote:
> The main point here is that UTF-8 => Unicode => UTF-8 is lossless, and the
> same holds for my proposed change for *nices, too, as long as the local
> encoding is invertible in this sense. I am not sure if there are encodings
> in use out there which do not have this property, but even if there are:
> All e.g. Qt-based programs would share the same problems.

Gtk+/GNOME programs are fairly careful in this regard. When loading a file they keep *both* the original sequence of bytes that is the file name and a Unicode version for display in the GUI, obtained by interpreting those bytes in a particular locale encoding. If that conversion fails they do a best-effort conversion using replacement characters, or just display "unknown file name" or something similar. However, when saving the file again they always use the original file name, which is just the raw sequence of bytes.

When saving a new file and taking a Unicode string from the user, they try to convert it to the locale encoding, and if that conversion fails they ask the user to choose a different name.

See:
http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.ht...
the section near the top on "File Name Encodings"

So a hypothetical FilePath ADT might keep both the raw and displayable Unicode versions of a file name.

Duncan
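
P.S. A minimal sketch of what such an ADT could look like, in case it helps the discussion. Every name here (FileName, Codec, fromDisk, fromUser) is hypothetical and not taken from GLib or from any existing Haskell library. The two codecs only stand in for "the locale encoding"; a real version would look the encoding up from the environment.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Maybe (fromMaybe)
import Data.Word (Word8)

-- Hypothetical file name type: the raw bytes are authoritative and are
-- what gets passed back to the OS; the Text is only for display.
data FileName = FileName
  { rawBytes    :: B.ByteString
  , displayName :: T.Text
  } deriving (Eq, Show)

-- Hypothetical stand-in for a locale encoding. Either direction may fail.
data Codec = Codec
  { decode :: B.ByteString -> Maybe T.Text
  , encode :: T.Text -> Maybe B.ByteString
  }

-- UTF-8: decoding fails on invalid byte sequences, encoding always works.
utf8 :: Codec
utf8 = Codec
  { decode = either (const Nothing) Just . TE.decodeUtf8'
  , encode = Just . TE.encodeUtf8
  }

-- Latin-1: decoding always works, encoding fails above U+00FF.
latin1 :: Codec
latin1 = Codec
  { decode = Just . TE.decodeLatin1
  , encode = \t ->
      if T.all (<= '\255') t
        then Just (B.pack (map (fromIntegral . fromEnum) (T.unpack t)))
        else Nothing
  }

-- Name read from disk: always keep the bytes. If decoding fails, fall
-- back to a best-effort display form with replacement characters.
fromDisk :: Codec -> B.ByteString -> FileName
fromDisk c bs = FileName bs (fromMaybe bestEffort (decode c bs))
  where
    bestEffort = T.pack (map toChar (B.unpack bs))

    toChar :: Word8 -> Char
    toChar w
      | w < 0x80  = toEnum (fromIntegral w)  -- plain ASCII byte
      | otherwise = '\xFFFD'                 -- Unicode replacement character

-- Name typed by the user: Nothing means the name cannot be represented
-- in the locale encoding, so the caller should ask for a different one.
fromUser :: Codec -> T.Text -> Maybe FileName
fromUser c t = fmap (\bs -> FileName bs t) (encode c t)
```

With this, a name loaded through fromDisk can always be saved again via rawBytes, whatever the decoder made of it, and fromUser is the only place where the program has to refuse a name.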