But UTF-16 (apart from being an abomination for creating a hole in the
codepoint space and making it impossible to ever extend it) is slow to
process compared with UTF-32 - you can't get the nth character in
constant time, so it seems an odd choice to me.
Aside: Getting the nth character isn't very useful when working with Unicode text:
* Most text processing is linear.
* What we consider a character and what Unicode considers a character differ a bit, e.g. because Unicode uses combining characters.
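
To make the second point concrete, here is a minimal sketch in Python (chosen only for illustration; it is not tied to any library discussed in this thread). The helper name `utf16_code_units` is made up for this example. It shows that one user-perceived character can be two code points, and that one code point can be two UTF-16 code units, so indexing by code point does not line up with "characters" in either encoding:

```python
import unicodedata


def utf16_code_units(s: str) -> int:
    """Return the number of 16-bit code units needed to encode s as UTF-16."""
    # utf-16-le writes no byte order mark, so every 2 bytes is one code unit.
    return len(s.encode("utf-16-le")) // 2


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: renders as one "character".
decomposed = "e\u0301"
# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
clef = "\U0001D11E"

print(len(decomposed))         # code points in the decomposed form
print(len(composed))           # code points in the composed form
print(len(clef))               # code points in the clef string
print(utf16_code_units(clef))  # UTF-16 code units (a surrogate pair)
```

Both `decomposed` and `composed` display as the same accented letter, yet they have different lengths in code points, so even with UTF-32 the nth code point is not necessarily the nth character a reader sees.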
Cheers,
Johan