
26 Sep
2007
3:10 a.m.
On 2007-09-26, Deborah Goldsmith wrote:
> From an implementation point of view, UTF-16 is the most efficient representation for processing Unicode.
This depends on the characteristics of the text being processed. Space-wise, English stays at 1 byte/char in UTF-8. Most European languages go up to at most 2, and on average only a bit above 1. Greek and Cyrillic are 2 bytes/char, and so is Arabic, so those break even with UTF-16. It's really only the scripts above U+07FF (most Asian scripts such as CJK and the Indic scripts, some African scripts such as Ethiopic, etc.) that lose space-wise, at 3 bytes/char against 2 in UTF-16. It's true that time-wise there are definite issues in finding character boundaries.

-- 
Aaron Denney
-><-
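
For anyone who wants to check the space figures, here is a rough Python sketch. The helper names (bytes_per_char, char_start) and the sample strings are made up for illustration and come from nobody's actual implementation. It measures bytes per code point in both encodings, and shows one simple way to locate a character boundary in UTF-8 by skipping backwards over continuation bytes (the ones of the form 10xxxxxx).

```python
def bytes_per_char(text):
    """Return (utf8, utf16) average bytes per code point for text."""
    n = len(text)  # len() on a Python 3 str counts code points
    utf8 = len(text.encode("utf-8")) / n
    # "utf-16-le" avoids counting the 2-byte byte-order mark
    utf16 = len(text.encode("utf-16-le")) / n
    return utf8, utf16


def char_start(data, i):
    """Back up from byte offset i to the first byte of its UTF-8 sequence.

    Assumes data is valid UTF-8. Continuation bytes match 10xxxxxx.
    """
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


# Illustrative samples only; real corpora will give different averages.
SAMPLES = {
    "English": "The quick brown fox",
    "German": "Grüße aus München",
    "Greek": "Ελληνικά",
    "Russian": "Привет",
    "Arabic": "مرحبا",
    "Japanese": "こんにちは",
}

for name, text in SAMPLES.items():
    u8, u16 = bytes_per_char(text)
    print(f"{name:9} UTF-8 {u8:.2f}  UTF-16 {u16:.2f}")
```

Note that stepping back to a boundary costs at most three iterations on valid UTF-8. What is expensive is indexing to the n-th character, which needs a scan from the start. UTF-16 has the same problem in principle once surrogate pairs are involved.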