
On 2007-09-27, Duncan Coutts wrote:
In message wnoise@ofb.net writes:
On 2007-09-27, Deborah Goldsmith wrote:
On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote:
UTF-16 has no advantage over UTF-8 in this respect, because of surrogate pairs and combining characters.
Good point.
Well, not so much. As Duncan mentioned, it's a matter of what the most common case is. UTF-16 is effectively fixed-width for the majority of text in the majority of languages. Combining sequences and surrogate pairs are relatively infrequent.
Infrequent, but they exist, which means you can't seek 2x bytes ahead to seek x characters ahead. All such seeking must be linear for both UTF-16 *and* UTF-8.
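
To make the linear-seek point concrete, here is a minimal sketch (not taken from any existing library; the names utf16Offset and utf8Offset are made up for illustration). It assumes well-formed input and a non-negative count, and it counts code points only, so combining sequences would still need a further pass on top of this. In both encodings the offset of the n-th code point can only be found by walking the data from the start:

```haskell
import Data.Word (Word8, Word16)

-- | Number of UTF-16 code units occupied by the first n code points.
-- A high surrogate (0xD800..0xDBFF) starts a two-unit pair.
-- Hypothetical helper; assumes well-formed input and n >= 0.
utf16Offset :: Int -> [Word16] -> Int
utf16Offset n = go n 0
  where
    go 0 acc _  = acc
    go _ acc [] = acc
    go k acc (u:us)
      | isHigh u  = go (k - 1) (acc + 2) (drop 1 us)  -- skip the low surrogate too
      | otherwise = go (k - 1) (acc + 1) us
    isHigh u = u >= 0xD800 && u <= 0xDBFF

-- | Number of UTF-8 bytes occupied by the first n code points.
-- The lead byte determines the width of each sequence.
-- Hypothetical helper; assumes well-formed input and n >= 0.
utf8Offset :: Int -> [Word8] -> Int
utf8Offset n = go n 0
  where
    go 0 acc _  = acc
    go _ acc [] = acc
    go k acc bs@(b:_) =
      let w = width b
      in go (k - 1) (acc + w) (drop w bs)
    width b
      | b < 0x80  = 1
      | b < 0xE0  = 2
      | b < 0xF0  = 3
      | otherwise = 4
```

For the three code points 'a', U+1D11E, 'b', skipping two code points costs 3 code units in UTF-16 and 5 bytes in UTF-8. Neither figure can be computed from the count alone, which is why the two encodings differ only in constant factors here.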
And seeking has been linear in [Char] for all these years, yet I don't hear people complaining. Most string processing is linear and does not need random access to characters.
Yeah. I'm saying the differences between them are going to be in the constant factors, and that these constant factors will differ between workloads.

--
Aaron Denney
-><-