Fri, 5 Oct 2001 23:23:50 +1000, Andrew J Bromage wrote:
There is a set of one million (more correctly, 1M) Unicode characters which are only accessible using surrogate pairs (i.e. two UTF-16 codes). There are currently none of these codes assigned,
This information is out of date. AFAIR about 40000 of them are assigned, most of them for Chinese (current, not historic).
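For reference, here is a minimal sketch of how a character outside the 16-bit range turns into a surrogate pair. The name encodeUtf16 is only illustrative, not taken from any library. The 1024 high and 1024 low surrogates give 1024 * 1024 = 2^20 combinations, which is where the "1M" comes from.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- | Encode one Char as UTF-16 code units (hypothetical helper).
-- Code points up to U+FFFF become a single unit; anything above
-- becomes a high surrogate followed by a low surrogate.
-- Note: a Char that is itself a surrogate code point (U+D800..U+DFFF)
-- is passed through unchanged here; a real encoder would reject it.
encodeUtf16 :: Char -> [Word16]
encodeUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))  -- top 10 bits
                  , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]  -- low 10 bits
  where
    n = ord c
    m = n - 0x10000  -- 20-bit offset into the supplementary range
```

For example, encodeUtf16 '\x20000' gives [0xD840, 0xDC00], while encodeUtf16 'A' gives the single unit [0x41].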
So rare, in fact, that the cost of strings taking up twice the space that they currently do simply isn't worth the cost.
In Haskell, strings already have high overhead. In GHC a Char# value (inside a Char object) always takes the same size as a pointer (32 or 64 bits), no matter how much of it is used.
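To put a rough number on that overhead, here is a back-of-the-envelope sketch. It assumes a fully evaluated String where each (:) cell takes 3 words (header, head pointer, tail pointer) and each boxed Char takes 2 words (header plus the Char# payload). Both figures are assumptions about the heap layout, not measurements, and the estimate ignores any sharing of Char objects. The names wordsPerChar and stringBytes are made up for the illustration.

```haskell
-- | Assumed heap words per character of a fully evaluated String:
-- 3 for the (:) cell plus 2 for the boxed Char (assumption, see above).
wordsPerChar :: Int
wordsPerChar = 3 + 2

-- | Estimated bytes used by a String of n characters,
-- given the machine word size in bytes (4 or 8).
stringBytes :: Int -> Int -> Int
stringBytes wordSize n = wordSize * wordsPerChar * n
```

Under these assumptions stringBytes 4 1000 is 20000 bytes on a 32-bit machine and stringBytes 8 1000 is 40000 bytes on a 64-bit one, against about 2000 bytes for a flat array of 16-bit codes. Next to that, the width of the character code itself hardly matters.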
It just goes to show that strings are not merely arrays of characters like some languages would have you believe.
In Haskell String = [Char]. It's true that Char values don't necessarily correspond to glyphs, but Strings are composed of Chars.

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK