
On Wed, Apr 19, 2006 at 06:04:58PM +0400, Bulat Ziganshin wrote:
> 1. it doesn't support Unicode. there are at least two libs (Simon's and
> the one from JHC) that use UTF-8 to do this. of course, they will not be
> as efficient on some operations. i think that this is essential for a
> general-purpose library and Data.PackedString replacement. Simon's lib
> already implements the UTF-8, Latin-1, UCS-2 and UCS-4 encodings. maybe
> it's possible to join them all together in one lib that uses
> preprocessing or some other technique to implement the differences
> between UTF-8 and the fixed-width encodings

Indeed. I was excited about the prospect of using FastPackedString until
I saw it didn't support the full character range. That is too bad.

I'd recommend just always using UTF-8 under the hood (since it shouldn't
matter what encoding is used internally) and having two integers stored
with the pointer: the number of bytes and the number of characters. When
these are the same you know you have straight ASCII, plus it gives you
O(1) length for free.

I have very optimized UTF-8 fold operators in the jhc version of
PackedString you can steal. They get their speed by assuming the data is
always valid UTF-8, which the constructors always enforce, so they don't
do error checking. (yay for ADTs)

        John

--
John Meacham - ⑆repetae.net⑆john⑈
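
For concreteness, the layout suggested above (UTF-8 bytes stored together with a byte count and a character count) could look roughly like the sketch below. This is not the jhc PackedString code. The names PackedUtf8, pack, lengthChars, lengthBytes and isPlainAscii are made up for illustration, and Data.ByteString from the bytestring package stands in for the raw pointer (it already tracks its own byte length).

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical type: UTF-8 bytes plus a cached character count.
-- If the constructor is kept abstract, 'pack' is the only way in,
-- so consumers may assume the bytes are well-formed UTF-8.
data PackedUtf8 = PackedUtf8
  { puBytes :: !B.ByteString  -- UTF-8 encoded payload
  , puChars :: !Int           -- number of code points
  }

-- Encode one Char as 1 to 4 UTF-8 bytes.
-- Caveat: surrogate code points (U+D800..U+DFFF) are not rejected here;
-- a real constructor would have to deal with them.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. top 6, cont 0]
  | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading payload bits for the first byte.
    top :: Int -> Word8
    top s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont :: Int -> Word8
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Build a packed string, recording the character count once.
pack :: String -> PackedUtf8
pack s = PackedUtf8 (B.pack (concatMap encodeChar s)) (length s)

-- O(1): the count was cached at construction time.
lengthChars :: PackedUtf8 -> Int
lengthChars = puChars

-- O(1): ByteString stores its own length.
lengthBytes :: PackedUtf8 -> Int
lengthBytes = B.length . puBytes

-- Equal counts mean every character took exactly one byte, i.e. ASCII.
isPlainAscii :: PackedUtf8 -> Bool
isPlainAscii p = lengthBytes p == lengthChars p
```

With this, pack "hello" has equal byte and character counts, while pack "na\239ve" (with U+00EF in the middle) has six bytes for five characters, so the ASCII fast path can be chosen with a single comparison.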