New subject: Unicode in source

21 Aug 2002


At 2002-08-21 00:17, Ketil Z. Malde wrote:
> ...
> ...
> \#00E1 [LATIN SMALL LETTER A WITH ACUTE]
> ...
> or
> ...
> \#0061 [LATIN SMALL LETTER A] + \#0301 [COMBINING ACUTE ACCENT]
> I guess they must be treated the same, too?  That is, the length of
> the strings should be the same, they should compare equal, etc etc.

In my opinion, no. Since String is simply [Char], it should be treated
as a plain list of codepoints without further interpretation. So
'length' and the Eq instance should behave exactly as they do for any
other list.
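
A small sketch of what this means in practice, assuming the two spellings from the quoted message are written with numeric escapes. The names `precomposed`, `decomposed`, `lengths` and `sameList` are only illustrative:

```haskell
-- String is just [Char], so these are ordinary lists of codepoints.
precomposed :: String
precomposed = "\x00E1"          -- LATIN SMALL LETTER A WITH ACUTE

decomposed :: String
decomposed = "\x0061\x0301"     -- LATIN SMALL LETTER A + COMBINING ACUTE ACCENT

-- List functions count and compare codepoints, nothing more.
lengths :: (Int, Int)
lengths = (length precomposed, length decomposed)    -- (1, 2)

sameList :: Bool
sameList = precomposed == decomposed                 -- False
```

Under this reading the two strings have different lengths and do not compare equal, even though they may render identically.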

> ...
> Or is it an alternative to just ignore the issue, and simply think of
> the latter as two characters?

Treat the latter as two codepoints, and don't worry about characters.
There should be separate functions for such things as decomposition
and equivalence.
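
As a rough sketch of what such separate functions might look like: the names `decomposeChar`, `decompose` and `equivalent` are hypothetical, and the decomposition table is a toy that covers only U+00E1. A real version would need the full Unicode decomposition data and canonical reordering of combining marks, both of which are skipped here.

```haskell
-- Hypothetical helper: a toy decomposition covering only U+00E1.
-- Every other codepoint is passed through unchanged.
decomposeChar :: Char -> String
decomposeChar '\x00E1' = "\x0061\x0301"
decomposeChar c        = [c]

-- Decompose a whole string, one codepoint at a time.
decompose :: String -> String
decompose = concatMap decomposeChar

-- Equivalence is an explicit, separate operation;
-- (==) on String is left as plain list equality.
equivalent :: String -> String -> Bool
equivalent s t = decompose s == decompose t
```

With these definitions, `equivalent "\x00E1" "\x0061\x0301"` is True, while `"\x00E1" == "\x0061\x0301"` remains False.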

-- 
Ashley Yakeley, Seattle WA

Re: [Haskell-i18n] Surrogate pairs?
