
20 Apr 2006, 4:51 a.m.
On Wed, Apr 19, 2006 at 02:52:53PM -0700, John Meacham wrote:
> I'd recommend just always using UTF-8 under the hood (since it shouldn't
> matter what encoding is used internally) and storing two integers with the
> pointer: the number of bytes and the number of characters. When these are
> the same you know you have straight ASCII, plus it gives you O(1) length
> for free. I have very optimized UTF-8 fold operators in the jhc version of
> PackedString that you can steal. They get their speed by assuming the data
> is always valid UTF-8, which the constructors always enforce, so they don't
> do any error checking. (Yay for ADTs.)
Nice trick, but you lose constant-time indexing operations, don't you?

Best regards
Tomasz
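
To make the trade-off in this exchange concrete, here is a minimal Haskell sketch of the layout being discussed. It is not jhc's PackedString: the type PStr and the functions pack, lengthP, isAsciiP and indexP are hypothetical names made up for illustration, and a strict ByteString stands in for the raw pointer plus byte count. Under those assumptions, length is O(1) and so is indexing in the all-ASCII case, but indexing a string with multi-byte characters has to walk the bytes from the start.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical type (not jhc's PackedString): UTF-8 bytes plus a cached
-- character count. The byte count is B.length, which is O(1).
data PStr = PStr
  { psBytes :: !B.ByteString
  , psChars :: !Int
  }

-- The only way to build a PStr in this sketch, so the bytes are always
-- valid UTF-8 and the readers below can skip error checking.
pack :: String -> PStr
pack s = PStr (B.pack (concatMap encodeChar s)) (length s)

encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80                   = [fromIntegral n]
  | n < 0x800                  = [0xC0 .|. hi 6, cont 0]
  | n >= 0xD800 && n <= 0xDFFF = encodeChar '\xFFFD' -- surrogates are not encodable
  | n < 0x10000                = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise                  = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    hi k   = fromIntegral (n `shiftR` k)
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

-- O(1): the character count is stored alongside the bytes.
lengthP :: PStr -> Int
lengthP = psChars

-- O(1): equal byte and character counts mean every character is one byte.
isAsciiP :: PStr -> Bool
isAsciiP p = B.length (psBytes p) == psChars p

-- Number of bytes in the sequence that starts with this lead byte
-- (valid UTF-8 is assumed, so continuation bytes never show up here).
seqLen :: Word8 -> Int
seqLen b
  | b < 0x80  = 1
  | b < 0xE0  = 2
  | b < 0xF0  = 3
  | otherwise = 4

-- Decode the character whose first byte sits at the given byte offset.
decodeAt :: B.ByteString -> Int -> Char
decodeAt bs off = chr $ case seqLen b0 of
  1 -> fromIntegral b0
  2 -> ((fromIntegral b0 .&. 0x1F) `shiftL` 6) .|. c 1
  3 -> ((fromIntegral b0 .&. 0x0F) `shiftL` 12) .|. (c 1 `shiftL` 6) .|. c 2
  _ -> ((fromIntegral b0 .&. 0x07) `shiftL` 18) .|. (c 1 `shiftL` 12)
         .|. (c 2 `shiftL` 6) .|. c 3
  where
    b0  = B.index bs off
    c k = fromIntegral (B.index bs (off + k)) .&. 0x3F :: Int

-- O(1) for all-ASCII strings, O(i) otherwise: with variable-width
-- characters the byte offset of character i is only found by scanning.
indexP :: PStr -> Int -> Maybe Char
indexP p i
  | i < 0 || i >= psChars p = Nothing
  | isAsciiP p              = Just (chr (fromIntegral (B.index bs i)))
  | otherwise               = Just (decodeAt bs (skip i 0))
  where
    bs         = psBytes p
    skip 0 off = off
    skip k off = skip (k - 1) (off + seqLen (B.index bs off))
```

For example, pack "h\233llo" stores six bytes for five characters, so isAsciiP is False and indexP falls back to the linear scan, whereas pack "abc" takes the constant-time branch. Folds and other sequential traversals are unaffected, since they never need to jump to an arbitrary character offset.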