30 Sep
2001
10:36 p.m.
At 2001-09-30 07:29, Marcin 'Qrczak' Kowalczyk wrote:
Some time ago the Unicode Consortium slowly began switching to the point of view that abstract characters are denoted by numbers in the range U+0000..10FFFF.
It's worth mentioning that these are 'codepoints', not 'characters'. Sometimes a character will be made up of two codepoints, for instance an 'a' with a dot above is a single character that can be made from the codepoints LATIN SMALL LETTER A and COMBINING DOT ABOVE. Perhaps this makes the UTF-16 'surrogate' problem a bit less serious, since there never was a one-to-one correspondence between any kind of n-bit unit and displayed characters.

-- 
Ashley Yakeley, Seattle WA
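
A minimal sketch of the point above, in Python rather than anything used in the thread: the two codepoints named in the message form one displayed character, and a single codepoint outside the 16-bit range takes two UTF-16 code units. The helper `utf16_units` and the choice of U+1D11E as the sample codepoint are assumptions made for illustration only.

```python
import unicodedata

# LATIN SMALL LETTER A followed by COMBINING DOT ABOVE:
# two codepoints that are displayed as one character.
decomposed = "\u0061\u0307"
names = [unicodedata.name(c) for c in decomposed]

# NFC normalization composes the pair into one precomposed codepoint
# when such a codepoint exists, as it does for this pair.
precomposed = unicodedata.normalize("NFC", decomposed)


def utf16_units(s: str) -> int:
    """Count 16-bit code units in the UTF-16 encoding of s (hypothetical helper)."""
    return len(s.encode("utf-16-be")) // 2


# A codepoint above U+FFFF (chosen for illustration) needs a surrogate pair.
g_clef = "\U0001D11E"

print(len(decomposed), names)
print(len(precomposed), unicodedata.name(precomposed))
print(len(g_clef), utf16_units(g_clef))
```

In neither case does the count of storage units match the count of displayed characters: the combining sequence is two codepoints for one character, and the surrogate pair is two 16-bit units for one codepoint.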