
Stefan Monnier wrote:
The various UTF encodings do not have this particular problem: if a UTF string is valid, then it is a unique representation of a Unicode string. However, decoding is still a partial function and can fail.
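Both properties can be checked with a small Python sketch. Python and its strict `bytes.decode("utf-8")` are chosen here only for illustration, and `decode_utf8` is a hypothetical helper. Valid UTF-8 round-trips to exactly the same bytes. Ill-formed input, including the overlong encoding `b"\xc0\xaf"` that would be a second spelling of "/", is rejected, so decoding is partial.

```python
from typing import Optional


def decode_utf8(data: bytes) -> Optional[str]:
    """Hypothetical helper: the decoded string, or None if data is not valid UTF-8."""
    try:
        return data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return None


valid = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'
truncated = b"caf\xe9"                # \xe9 starts a 3-byte sequence that never finishes
overlong = b"\xc0\xaf"                # overlong (forbidden) encoding of "/"

s = decode_utf8(valid)
print(s)                              # café
print(s.encode("utf-8") == valid)     # True: a valid encoding round-trips exactly
print(decode_utf8(truncated))         # None: decoding is a partial function
print(decode_utf8(overlong))          # None: "/" has only one valid UTF-8 form
```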
And while that is partly true, it is qualified by the problems related to canonicalization: an "é" in Unicode can be represented either as the single character "é" or as two characters (an "e" followed by a combining accent), and the two should (ideally) compare equal.
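The two representations of "é" can be shown concretely in Python. This is an illustrative sketch, not either poster's code. Plain string comparison works on code points and treats the two forms as different. Canonical equivalence has to be requested explicitly through normalization. `canonically_equal` is a hypothetical helper, and normalizing to NFC (rather than NFD) is an arbitrary choice; either form gives the same answer for canonical equivalence.

```python
import unicodedata

precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT (two code points)

print(precomposed == decomposed)           # False: different code point sequences
print(len(precomposed), len(decomposed))   # 1 2


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(canonically_equal(precomposed, decomposed))               # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```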
In what sense "equal"? They are supposed to be equivalent as far as the semantics of the text is concerned, but the representations are clearly different and most programs distinguish them. In particular, they are different filenames on both Unix and Windows. AFAIK Mac OS normalizes filenames, but with a slightly different algorithm from Unicode's (perhaps just an older version). IMHO it makes no sense to pretend that they are exactly the same when strings consist of code points or lower-level units (and I don't believe another choice for the default string type would be practical).

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/