Re: [Haskell-cafe] Re: File path programme

30 Jan 2005

      Marcin 'Qrczak' Kowalczyk wrote:
...
...
...
The various UTF encodings do not have this particular problem; if a UTF
string is valid, then it is a unique representation of a unicode string.
However, decoding is still a partial function and can fail.
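
As a minimal sketch of "decoding is a partial function", assuming the text and bytestring packages are available (the helper name decode is made up for this illustration): valid UTF-8 decodes to a string, while an overlong or truncated sequence is rejected rather than mapped to some code point.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')
import Data.Word (Word8)

-- Hypothetical helper: decode UTF-8 bytes, collapsing any
-- decoding error into Nothing.
decode :: [Word8] -> Maybe String
decode = either (const Nothing) (Just . T.unpack) . decodeUtf8' . B.pack

main :: IO ()
main = do
  print (decode [0xC3, 0xA9])  -- well-formed encoding of U+00E9
  print (decode [0xC0, 0xAF])  -- overlong encoding of '/', not valid UTF-8
  print (decode [0xC3])        -- truncated two-byte sequence
```

Rejecting the overlong form is what keeps the representation unique: only the shortest encoding of each code point counts as valid.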
And while it is partly true, it is qualified by the problems relative to
canonicalization (an "é" in Unicode can both be represented as "é" or as two
chars (an e and an accent) and they should (ideally) compare equal).
In what sense "equal"? They are supposed to be equivalent as far
as the semantics of the text is concerned, but representations are
clearly different and most programs distinguish them. In particular
they are different filenames on both Unix and Windows. AFAIK MacOS
normalizes filenames, but using a slightly different algorithm than
Unicode (perhaps just an older version).
IMHO it makes no sense to pretend that they are exactly the same when
strings consist of code points or lower level units (and I don't
believe another choice for the default string type would be practical).
Well, at least you and I agree on that.
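
To make the code-point view concrete, here is a small sketch using only the Prelude. It assumes the accent under discussion is the acute one (U+00E9 versus "e" followed by U+0301); the names precomposed and decomposed are just for illustration.

```haskell
-- Two canonically equivalent spellings of the same letter.
precomposed, decomposed :: String
precomposed = "\x00E9"   -- U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed  = "e\x0301"  -- 'e' followed by U+0301 COMBINING ACUTE ACCENT

main :: IO ()
main = do
  -- String equality compares code points, so these differ.
  print (precomposed == decomposed)
  -- The underlying code points, for inspection.
  print (map fromEnum precomposed, map fromEnum decomposed)
```

With String being a list of Char, (==) sees one code point on one side and two on the other, which is the same distinction a Unix or Windows filesystem makes between the two filenames.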

Once you start down the "semantic equivalence" route, you will quickly
run into issues like "ß" == "ss", and it only gets worse from there
on.
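
A rough sketch of that slippery slope, assuming the text package and taking case folding as just one possible notion of "semantic equivalence" (the names eszett and doubleS are illustrative):

```haskell
import qualified Data.Text as T

eszett, doubleS :: T.Text
eszett  = T.pack "\x00DF"  -- U+00DF LATIN SMALL LETTER SHARP S
doubleS = T.pack "ss"

main :: IO ()
main = do
  -- Plain equality compares code points.
  print (eszett == doubleS)
  -- Equality after Unicode case folding.
  print (T.toCaseFold eszett == T.toCaseFold doubleS)
  -- Lengths before and after folding.
  print (T.length eszett, T.length (T.toCaseFold eszett))
```

As far as I know, full case folding maps U+00DF to "ss", so the second comparison should succeed where the first fails, and the folded string is longer than the original. Any equality of this kind stops being a per-character affair.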

-- 
Glynn Clements