[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

28 Sep 2007


On 2007-09-27, Duncan Coutts wrote:
...
In message  wnoise@ofb.net writes:
...
On 2007-09-27, Deborah Goldsmith wrote:
...
On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote:
...
...
UTF-16 has no advantage over UTF-8 in this respect, because of surrogate
pairs and combining characters.
Good point.
Well, not so much. As Duncan mentioned, it's a matter of what the most  
common case is. UTF-16 is effectively fixed-width for the majority of  
text in the majority of languages. Combining sequences and surrogate  
pairs are relatively infrequent.
Infrequent, but they exist, which means you can't seek x/2 bytes ahead
to seek x characters ahead.  All such seeking must be linear for both
UTF-16 *and* UTF-8.
And seeking has been linear in [Char] for all these years, yet I don't hear
people complaining. Most string processing is linear and does not need random
access to characters.
Yeah.  I'm saying the differences between them are going to be in the
constant factors, and that these constant factors will differ between 
workloads.  
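
To make the linear-seeking point concrete, here is a minimal sketch using only
base. It is not taken from any proposed library: the names encodeChar,
encodeUtf16, isHighSurrogate and unitOffset are made up for illustration, and
it only accounts for surrogate pairs, not combining sequences. It finds the
code-unit offset of the k-th code point by walking from the start, which is
the only general way to do it.

```haskell
import Data.Char (ord)
import Data.Word (Word16)

-- | Encode one character as UTF-16 code units: a single unit for the
-- BMP, a surrogate pair for anything above U+FFFF.
encodeChar :: Char -> [Word16]
encodeChar c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + m `div` 0x400)
                  , fromIntegral (0xDC00 + m `mod` 0x400) ]
  where
    n = ord c
    m = n - 0x10000

-- | Encode a whole string as a list of UTF-16 code units.
encodeUtf16 :: String -> [Word16]
encodeUtf16 = concatMap encodeChar

-- | Is this code unit a high (leading) surrogate?
isHighSurrogate :: Word16 -> Bool
isHighSurrogate w = w >= 0xD800 && w <= 0xDBFF

-- | Code-unit offset of the k-th code point (0-based), found by a
-- linear walk. Nothing if the input has fewer than k code points.
unitOffset :: Int -> [Word16] -> Maybe Int
unitOffset = go 0
  where
    go off 0 _  = Just off
    go _   _ [] = Nothing
    go off k (w:ws)
      | isHighSurrogate w = go (off + 2) (k - 1) (drop 1 ws)  -- skip the pair
      | otherwise         = go (off + 1) (k - 1) ws
```

For pure BMP text such as "abc", unitOffset 2 gives Just 2, which matches the
naive guess. For ['a', '\x1D11E', 'b'] (U+1D11E needs a surrogate pair) it
gives Just 3, so jumping straight to unit 2 would land in the middle of the
pair.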

-- 
Aaron Denney
-><-