On Tue, Aug 17, 2010 at 9:30 PM, Donn Cave <donn@avvanta.com> wrote:
Quoth John Millikin <jmillikin@gmail.com>,
> Ruby, which has an enormous Japanese userbase, solved the problem by
> essentially defining Text = (Encoding, ByteString), and then
> re-implementing text logic for each encoding. This allows very
> efficient operation with every possible encoding, at the cost of
> increased complexity (caching decoded characters, multi-byte handling,
> etc).
Ruby actually comes from the CJK world in a way, doesn't it?
Even if efficient per-encoding manipulation is a tough nut to crack,
it at least avoids the fixed cost of bulk decoding. An application
designer then doesn't need to weigh the payoff of a correct text
approach against `binary'/ASCII, and the language/library designer
doesn't need to decide whether genome data is a representative case,
etc.
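
To make the Text = (Encoding, ByteString) idea concrete, here is a rough
sketch in Python, not Ruby's actual implementation. The names
EncodedText and char_length are hypothetical, only three encodings are
covered, and the input is assumed to be well-formed. Each encoding gets
its own length routine that walks the raw bytes, so nothing is decoded
in bulk:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedText:
    """Hypothetical Text = (Encoding, ByteString) pair."""
    encoding: str  # assumed tags: "latin-1", "utf-8", "utf-16-le"
    data: bytes


def _len_latin1(data: bytes) -> int:
    # One byte per character, so the byte count is the answer.
    return len(data)


def _len_utf8(data: bytes) -> int:
    # Count every byte that is not a continuation byte (10xxxxxx).
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def _len_utf16le(data: bytes) -> int:
    # Inspect the high byte of each 2-byte code unit and skip low
    # surrogates (0xDC00-0xDFFF), so a surrogate pair counts once.
    return sum(1 for i in range(1, len(data), 2)
               if (data[i] & 0xFC) != 0xDC)


# Per-encoding dispatch table: the "re-implemented text logic".
_LENGTH = {
    "latin-1": _len_latin1,
    "utf-8": _len_utf8,
    "utf-16-le": _len_utf16le,
}


def char_length(text: EncodedText) -> int:
    """Number of code points, computed without decoding the bytes."""
    return _LENGTH[text.encoding](text.data)
```

Every further operation (indexing, slicing, searching) would need the
same per-encoding treatment, which is where the extra complexity
mentioned above comes from.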