
Dylan Thurston wrote:
> Right. In Unicode, the concept of a "character" is not really so useful;
After reading a bit about it, I'm certainly confused. Unicode/ISO-10646 contains a lot of things that aren't really one character, e.g. ligatures.
> most functions that traditionally operate on characters (e.g., uppercase or display-width) fundamentally need to operate on strings. (This is due to properties of particular languages, not any design flaw of Unicode.)
I think an argument could be put forward that Unicode is trying to be more than just a character set. At least at first glance, it seems to try to be both a character set and a glyph map, and to incorporate things like transliteration between character sets (or subsets, now that Unicode contains them all), directionality of script, and so on.
> toUpper, toLower - Not OK. There are cases where upper casing a character yields two characters.
I thought title case was supposed to handle this. I'm probably confused, though.
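A small Haskell sketch of the toUpper point, and of what title case does and does not cover, using only Data.Char from base. `upperString` and its two-entry table are hypothetical illustrations, not a library function. The sketch suggests that title case exists for digraph characters such as U+01C5, while a one-to-many expansion like ß to "SS" needs a String -> String function.

```haskell
import Data.Char (toTitle, toUpper)

-- toUpper has type Char -> Char, so it can only apply Unicode's one-to-one
-- ("simple") case mappings. U+00DF (sharp s) has no simple uppercase
-- mapping, so toUpper returns it unchanged.
sharpS :: Char
sharpS = '\x00DF'

-- Hypothetical string-level uppercasing: handles two one-to-many mappings
-- from Unicode's SpecialCasing.txt and falls back to toUpper otherwise.
-- A real implementation needs the whole table plus locale rules
-- (e.g. Turkish dotted/dotless i).
upperString :: String -> String
upperString = concatMap upperFull
  where
    upperFull '\x00DF' = "SS"  -- sharp s
    upperFull '\xFB01' = "FI"  -- "fi" ligature
    upperFull c        = [toUpper c]

main :: IO ()
main = do
  print (toUpper sharpS == sharpS)         -- True
  putStrLn (upperString "stra\x00DF\&e")   -- STRASSE
  -- Title case solves a different problem: digraph code points such as
  -- U+01C6 (dz with caron) have separate uppercase (U+01C4) and
  -- titlecase (U+01C5) forms, but each still maps one Char to one Char.
  print (toUpper '\x01C6' == '\x01C4', toTitle '\x01C6' == '\x01C5')  -- (True,True)
```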
> etc. Any program using this library is bound to get confused on Unicode strings. Even before Unicode, there is much functionality missing; for instance, I don't see any way to compare strings using a localized order.
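To make the ordering point concrete: Haskell's derived Ord on String compares code points. The sketch below contrasts that with a hypothetical `crudeKey` that only folds case and strips a handful of accents. It is an assumption-laden stand-in, not collation; real locale-sensitive ordering would follow the Unicode Collation Algorithm, for example through an ICU binding.

```haskell
import Data.Char (toLower)
import Data.List (sort, sortOn)

-- The derived Ord instance for String compares Chars by code point, so
-- uppercase ASCII letters sort before all lowercase ones, and accented
-- letters land after 'z'.
codePointOrder :: [String] -> [String]
codePointOrder = sort

-- Hypothetical, deliberately crude "dictionary" key: fold case and map a few
-- accented letters to their base letter. This is NOT locale-aware collation,
-- which needs multi-level weights, contractions and per-language tailoring.
crudeKey :: String -> String
crudeKey = map (baseLetter . toLower)
  where
    baseLetter c = case c of
      '\x00E9' -> 'e'  -- e with acute
      '\x00E8' -> 'e'  -- e with grave
      '\x00E0' -> 'a'  -- a with grave
      '\x00F6' -> 'o'  -- o with diaeresis
      _        -> c

sample :: [String]
sample = ["apple", "Zebra", "\x00E9migr\x00E9", "zoo"]

main :: IO ()
main = do
  print (codePointOrder sample == ["Zebra", "apple", "zoo", "\x00E9migr\x00E9"])  -- True
  print (sortOn crudeKey sample == ["apple", "\x00E9migr\x00E9", "Zebra", "zoo"]) -- True
```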
And you can't really use list functions like "length" on strings, since one item can be two characters (Lj, ij, fi) and several items can compose one character (combining characters). Nor can "(==)" reliably compare two Strings, since it compares them Char by Char, and with combining characters the same text can be encoded in more than one way. How are other systems handling this? It may be that Unicode isn't flawed, but it's certainly extremely complex. I guess I'll have to delve a bit deeper into it.

-kzm
--
If I haven't seen further, it is by standing in the footprints of giants
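A sketch of the length and equality problems using `Data.Char.generalCategory` from base. `approxGraphemes` and `toyCompose` are hypothetical helpers: the first merely ignores combining marks rather than implementing the grapheme-cluster rules of Unicode Standard Annex #29, and the second composes a single pair instead of performing NFC normalization (Unicode Standard Annex #15).

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- U+00E9 (precomposed e-acute) and "e" followed by U+0301 (combining acute
-- accent) are canonically equivalent in Unicode, but as Haskell Strings they
-- differ in both length and (==).
precomposed, decomposed :: String
precomposed = "\x00E9"
decomposed  = "e\x0301"

-- Rough "user-perceived character" count: skip combining marks. Only an
-- approximation; it ignores Hangul jamo, emoji sequences, etc.
isCombining :: Char -> Bool
isCombining c =
  generalCategory c `elem` [NonSpacingMark, SpacingCombiningMark, EnclosingMark]

approxGraphemes :: String -> Int
approxGraphemes = length . filter (not . isCombining)

-- Hypothetical toy normalizer that composes exactly one pair; real code
-- needs the full canonical decomposition/composition data.
toyCompose :: String -> String
toyCompose ('e' : '\x0301' : rest) = '\x00E9' : toyCompose rest
toyCompose (c : rest)              = c : toyCompose rest
toyCompose []                      = []

main :: IO ()
main = do
  print (length precomposed, length decomposed)                     -- (1,2)
  print (precomposed == decomposed)                                  -- False
  print (approxGraphemes precomposed == approxGraphemes decomposed)  -- True
  print (toyCompose decomposed == precomposed)                       -- True
  -- The reverse problem: U+FB01 (the "fi" ligature) is one Char that
  -- displays as two letters.
  print (length "\xFB01")                                            -- 1
```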