[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

26 Sep 2007

On 2007-09-26, Deborah Goldsmith wrote:
> ...
> From an implementation point of view, UTF-16 is the most efficient
> representation for processing Unicode.
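This depends on the characteristics of the text being processed.
Space-wise, English stays at 1 byte/char in UTF-8, against 2 in UTF-16.
Most European languages go up to at most 2, and on average only a bit
above 1.  Greek, Cyrillic, and Arabic are 2 bytes/char, the same as in
UTF-16.  It's really only most Asian scripts (CJK, Indic, Thai, etc.)
and some African ones (such as Ethiopic) that lose space-wise: 3
bytes/char in UTF-8 against 2 in UTF-16.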
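To make the per-script figures concrete, here is a small Haskell sketch that computes how many bytes a string occupies in UTF-8 and in UTF-16, using the standard code-point ranges of each encoding. The function names are hypothetical (not part of the proposed library), and it assumes the input holds valid Unicode scalar values (no lone surrogates).

```haskell
import Data.Char (ord)

-- | Bytes one code point occupies in UTF-8.
-- Assumes a valid Unicode scalar value (no lone surrogates).
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1  -- ASCII: English letters, digits, punctuation
  | n < 0x800   = 2  -- Latin-1/Latin Extended, Greek, Cyrillic, Hebrew, Arabic
  | n < 0x10000 = 3  -- rest of the BMP: CJK, Indic, Thai, Ethiopic, ...
  | otherwise   = 4  -- supplementary planes
  where
    n = ord c

-- | Bytes one code point occupies in UTF-16.
utf16Width :: Char -> Int
utf16Width c
  | ord c < 0x10000 = 2  -- a single 16-bit code unit
  | otherwise       = 4  -- a surrogate pair

-- | Total encoded size of a string, in bytes.
utf8Size, utf16Size :: String -> Int
utf8Size  = sum . map utf8Width
utf16Size = sum . map utf16Width
```

For instance, `utf8Size "hello"` is 5 against 10 for `utf16Size`; German `"Gr\252\223e"` ("Grüße") is 7 against 10; three CJK characters such as `"\x65e5\x672c\x8a9e"` come to 9 in UTF-8 against 6 in UTF-16.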

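It's true that, time-wise, there are definite costs in finding character
boundaries: a UTF-8 code point is 1 to 4 bytes long, so locating the
nth character means scanning from the start.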
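A minimal sketch of what finding boundaries involves, over a plain list of bytes (`[Word8]`) and assuming well-formed UTF-8. UTF-8 continuation bytes always have the bit pattern `10xxxxxx`, so whether a given byte starts a character can be decided locally, but the byte offset of the nth character still needs a linear scan. The function names are hypothetical, chosen only for illustration.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- | A UTF-8 continuation byte has the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- | Count code points by counting the bytes that start a character.
-- O(n) in the number of bytes; assumes well-formed UTF-8.
utf8CharCount :: [Word8] -> Int
utf8CharCount = length . filter (not . isContinuation)

-- | Byte offset at which the i-th code point (0-based) starts, if any.
-- Requires walking the bytes; in a fixed-width encoding it would be i * width.
utf8Offset :: Int -> [Word8] -> Maybe Int
utf8Offset i bytes =
  case drop i starts of
    (off : _) | i >= 0 -> Just off
    _                  -> Nothing
  where
    -- Offsets of every byte that begins a character.
    starts = [off | (off, b) <- zip [0 ..] bytes, not (isContinuation b)]
```

For the bytes of "héllo" (`[0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F]`), `utf8CharCount` gives 5 and `utf8Offset 2` gives `Just 3`, because the two-byte "é" shifts every later character by one.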

-- 
Aaron Denney
-><-