
On 18 August 2010 15:04, Michael Snoyman wrote:
For me, the whole point of this discussion was to determine whether we should attempt porting to UTF-8, which as I understand it would be a rather large undertaking.
And the answer to that is: yes, but only if we have good reason to believe it will actually be faster. That is where we are most interested in benchmarks rather than hand-waving.

As Johan and others have said, the original choice to use UTF-16 was based on benchmarks showing it was faster (than UTF-8 or UTF-32). So if we want to counter that, we need to argue either that those were the wrong choice of benchmarks, ones that do not reflect real usage, or that with better implementations the balance would shift.

Now there is an interesting argument that we spend more time shovelling strings about than we do actually processing them in any interesting way, and that we should therefore pick benchmarks that reflect that. This would then shift the balance to favour an internal representation identical to some particular popular external representation, even if that internal representation is slower for many processing tasks.

Duncan