On Tue, Aug 17, 2010 at 6:19 PM, John Millikin <jmillikin@gmail.com> wrote:
> Ruby, which has an enormous Japanese userbase, solved the problem by
> essentially defining Text = (Encoding, ByteString), and then
> re-implementing text logic for each encoding. This allows very
> efficient operation with every possible encoding, at the cost of
> increased complexity (caching decoded characters, multi-byte handling,
> etc).

This approach introduces overhead, as each function call needs to dispatch on the encoding, which is unlikely to be known statically. I don't know whether this matters (yet another thing that needs to be measured).
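To make the dispatch cost concrete, here is a minimal sketch of a Ruby-style (Encoding, ByteString) text type. It assumes the bytestring package and well-formed input. The names `Encoding`, `Text`, `textLength` and `Utf8Text` are hypothetical, chosen for illustration, and are not Ruby's or Data.Text's actual API. The `case` on the encoding tag is the per-call branch in question. The `Utf8Text` variant fixes the encoding in the type, so it needs no branch.

```haskell
import qualified Data.ByteString as B
import Data.Bits ((.&.))

-- Hypothetical encoding tag; a real library would support many more.
data Encoding = Latin1 | UTF8 | UTF16LE
  deriving (Show, Eq)

-- Text = (Encoding, ByteString): the encoding is a run-time value
-- carried alongside the bytes, not part of the type.
data Text = Text !Encoding !B.ByteString

-- Every operation must branch on the tag at run time.
textLength :: Text -> Int
textLength (Text enc bs) = case enc of
  Latin1  -> B.length bs      -- one byte per character
  UTF8    -> countUtf8 bs
  UTF16LE -> countUtf16LE bs

-- Count UTF-8 code points by skipping continuation bytes (10xxxxxx).
countUtf8 :: B.ByteString -> Int
countUtf8 = B.foldl' step 0
  where step n b = if b .&. 0xC0 /= 0x80 then n + 1 else n

-- Count UTF-16LE code points: one per code unit, except low surrogates
-- (high byte 0xDC..0xDF), which complete a pair started earlier.
countUtf16LE :: B.ByteString -> Int
countUtf16LE bs =
  length [ () | i <- [1, 3 .. B.length bs - 1]
              , let hi = B.index bs i
              , hi < 0xDC || hi > 0xDF ]

-- If the encoding is known statically (here, fixed by a newtype),
-- no dispatch is needed and the compiler sees a single code path.
newtype Utf8Text = Utf8Text B.ByteString

utf8Length :: Utf8Text -> Int
utf8Length (Utf8Text bs) = countUtf8 bs
```

Whether the branch in `textLength` costs anything measurable once GHC inlines and specializes is exactly the open question above. The sketch only shows where the branch sits, not how expensive it is.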

-- Johan