
On October 19, 2010 19:35:33 Duncan Coutts wrote:
Right, that's a very common misunderstanding of Unicode. A Unicode code point (type Char) does not correspond 1:1 with the human notion of a character. It would be nice if it did, but unfortunately it is not something we can ignore. Because of this, it is better not to think of operations on individual Chars but on short sequences of Chars. In any case, when processing text (even ASCII, where Chars do match characters), many of the most common operations that you want are substring-based, not element-based.
I read the Wikipedia article on code points, but I still do not feel I have a firm grasp of what exactly you are referring to. If you have a few minutes, would you mind providing a short, specific example to clarify this (e.g., a specific code point that causes problems under a 1:1 model, and what those problems are)? Thanks! -Tyson
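
One concrete case, offered as a minimal sketch rather than as the example Duncan had in mind: the letter "é" can be encoded either as the single code point U+00E9 or as "e" followed by U+0301 (COMBINING ACUTE ACCENT). Both render as one character to a human reader, but as a String they differ in length and are not equal, and reversing the second form detaches the accent from its base letter. The names precomposed and decomposed are made up for illustration.

```haskell
module Main where

-- "é" as one code point: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
precomposed :: String
precomposed = "\x00E9"

-- "é" as two code points: U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed :: String
decomposed = "e\x0301"

main :: IO ()
main = do
  -- Same visible character, different number of Chars.
  print (length precomposed)          -- prints 1
  print (length decomposed)           -- prints 2

  -- Char-by-char equality treats the two spellings as different text.
  print (precomposed == decomposed)   -- prints False

  -- Reversing Char by Char moves the combining accent to the front,
  -- so it no longer follows the 'e' it was modifying.
  print (reverse ("caf" ++ decomposed) == "\x0301" ++ "efac")  -- prints True
```

This is why element-wise operations such as length, reverse, or taking the first n Chars can give surprising results on text that contains combining sequences, while operations on substrings that keep such sequences together do not.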