
On Wed, 2007-09-26 at 09:05 +0200, Johan Tibell wrote:
I'll look over the proposal more carefully when I get time, but the most important issue is to not let the storage type leak into the interface.
Agreed,
From an implementation point of view, UTF-16 is the most efficient representation for processing Unicode. It's the native Unicode representation for Windows, Mac OS X, and the ICU open source i18n library. UTF-8 is not very efficient for anything except English. Its most valuable property is compatibility with software that thinks of character strings as byte arrays, and in fact that's why it was invented.
If UTF-16 is what's used by everyone else (how about Java? Python?) I think that's a strong reason to use it. I don't know Unicode well enough to say otherwise.
I disagree. I realize I'm a dissenter in this regard, but my position is: excellent Unix support first, portability second, excellent support for Win32/MacOS a distant third. That seems to be the opposite of every language's position. Unix absolutely needs UTF-8 for backward compatibility. jcc