
26 Sep 2007, 7:05 a.m.
I'll look over the proposal more carefully when I get time, but the most important issue is to not let the storage type leak into the interface.
Agreed.
From an implementation point of view, UTF-16 is the most efficient representation for processing Unicode. It's the native Unicode representation for Windows, Mac OS X, and the ICU open source i18n library. UTF-8 is not very efficient for anything except English. Its most valuable property is compatibility with software that thinks of character strings as byte arrays, and in fact that's why it was invented.
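To make the size trade-off concrete, here is a rough Python sketch that compares encoded byte counts. The sample strings and the helper name `encoded_sizes` are made up for illustration, and it only measures storage size, not processing speed.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no BOM counted."""
    # "utf-16-le" is used so that Python does not prepend a 2-byte BOM.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Hypothetical sample strings, chosen only to cover a few scripts.
samples = {
    "English": "Hello, world",
    "Greek": "Γειά σου κόσμε",
    "Japanese": "こんにちは世界",
}

for name, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{name}: {len(text)} code points, "
          f"UTF-8 {utf8_bytes} bytes, UTF-16 {utf16_bytes} bytes")
```

ASCII text takes one byte per character in UTF-8 and two in UTF-16, while most CJK characters take three bytes in UTF-8 and two in UTF-16, so which encoding is smaller depends on the text.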
If UTF-16 is what everyone else uses (how about Java? Python?), I think that's a strong reason to use it. I don't know Unicode well enough to say otherwise.