Re: [Haskell-cafe] PROPOSAL: New efficient Unicode string library.

26 Sep 2007


...
I'll look over the proposal more carefully when I get time, but the
most important issue is to not let the storage type leak into the
interface.
Agreed.
...
From an implementation point of view, UTF-16 is the most efficient
representation for processing Unicode. It's the native Unicode
representation for Windows, Mac OS X, and the ICU open source i18n
library. UTF-8 is not very efficient for anything except English. Its
most valuable property is compatibility with software that thinks of
character strings as byte arrays, and in fact that's why it was
invented.
If UTF-16 is what everyone else uses (how about Java? Python?), I
think that's a strong reason to use it. I don't know Unicode well
enough to say otherwise.
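
For anyone who wants to check the space side of the UTF-8 vs. UTF-16 claim, here is a minimal sketch that counts how many bytes a String would take in each encoding. The names utf8Bytes, utf16Bytes and encodedSizes are made up for illustration and are not part of the proposal. It only measures size, not processing speed, and it assumes the input contains no lone surrogate code points.

```haskell
import Data.Char (ord)

-- Bytes needed to encode one code point in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1   -- ASCII
  | n < 0x800   = 2   -- e.g. Latin supplements, Greek, Cyrillic
  | n < 0x10000 = 3   -- rest of the Basic Multilingual Plane
  | otherwise   = 4   -- supplementary planes
  where n = ord c

-- Bytes needed to encode one code point in UTF-16:
-- one 16-bit unit inside the BMP, a surrogate pair outside it.
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- (UTF-8 size, UTF-16 size) in bytes for a whole string.
encodedSizes :: String -> (Int, Int)
encodedSizes s = (sum (map utf8Bytes s), sum (map utf16Bytes s))

main :: IO ()
main = mapM_ (print . encodedSizes) ["hello", "привет", "こんにちは"]
-- prints:
-- (5,10)
-- (12,12)
-- (15,10)
```

So for these samples UTF-8 is half the size for ASCII text, the same size for Cyrillic, and 50% larger for Japanese. Whether that translates into a processing advantage for either encoding is a separate question that this sketch does not answer.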

Johan Tibell