
Hello John,

Thursday, April 20, 2006, 3:03:23 AM, you wrote:
> On Thu, Apr 20, 2006 at 12:47:49AM +0200, Marcin 'Qrczak' Kowalczyk wrote:
>>> I'd recommend just always using utf8 under the hood
>> Or have two cases of the representation: an array of bytes if every character is U+00FF or below, or an array of 32-bit words otherwise.
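Marcin's two-representation scheme can be sketched in Haskell as follows. This is a minimal illustration, not FastPackedString code. The type `PackedString` and the functions `pack`, `index`, `len`, `unpack` and `isNarrow` are hypothetical names. Unboxed arrays from the `array` package stand in for whatever buffer a real library would use.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!), bounds)
import Data.Char (chr, ord)
import Data.Word (Word8, Word32)

-- Hypothetical packed-string type (illustration only, not a library API):
-- Narrow when every character is U+00FF or below (one byte each),
-- Wide otherwise (one 32-bit word per character).
data PackedString
  = Narrow (UArray Int Word8)
  | Wide   (UArray Int Word32)

-- Choose the representation by scanning the input once.
pack :: String -> PackedString
pack s
  | all (\c -> ord c <= 0xFF) s = Narrow (listArray (0, n - 1) (map (fromIntegral . ord) s))
  | otherwise                   = Wide   (listArray (0, n - 1) (map (fromIntegral . ord) s))
  where
    n = length s

-- Each operation has to branch on the representation.
index :: PackedString -> Int -> Char
index (Narrow a) i = chr (fromIntegral (a ! i))
index (Wide a)   i = chr (fromIntegral (a ! i))

len :: PackedString -> Int
len (Narrow a) = let (lo, hi) = bounds a in hi - lo + 1
len (Wide a)   = let (lo, hi) = bounds a in hi - lo + 1

unpack :: PackedString -> String
unpack p = [index p i | i <- [0 .. len p - 1]]

isNarrow :: PackedString -> Bool
isNarrow (Narrow _) = True
isNarrow _          = False

main :: IO ()
main = do
  print (isNarrow (pack "caf\233"))       -- True: U+00E9 fits in a byte
  print (isNarrow (pack "\955x"))         -- False: U+03BB needs the wide case
  print (unpack (pack "\955x") == "\955x") -- True: round trip
```

Both `index` and `len` pattern-match on the constructor. That per-operation branch is exactly the kind of switch John objects to in his reply.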
> The complexity of multiple cases and encodings never seemed worth it to me. The code gets bigger, and you need switches on the representation, which slow things down. Plain old utf8 everywhere seems best, for a FastPackedString library at least, but others' opinions differ on the matter.
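Keeping plain UTF-8 under the hood means characters take 1 to 4 bytes. Finding the i-th character therefore takes a scan from the start, rather than a direct array lookup. The following is a minimal sketch that assumes well-formed input: it does no validation and does not reject surrogates or overlong forms. It uses plain `[Word8]` lists rather than a packed buffer. `encodeChar`, `seqLen`, `decodeChar`, `decode` and `indexUtf8` are hypothetical helpers, not FastPackedString functions.

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helpers for illustration; assumes well-formed UTF-8.

-- Encode one code point as 1 to 4 UTF-8 bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    hi k = fromIntegral (n `shiftR` k)                      -- lead-byte payload
    cont k = fromIntegral (0x80 .|. ((n `shiftR` k) .&. 0x3F)) -- continuation byte

-- Sequence length implied by a lead byte.
seqLen :: Word8 -> Int
seqLen b
  | b < 0x80  = 1
  | b < 0xE0  = 2
  | b < 0xF0  = 3
  | otherwise = 4

-- Decode one character, returning it with the remaining bytes.
decodeChar :: [Word8] -> (Char, [Word8])
decodeChar [] = error "decodeChar: empty input"
decodeChar (b : bs) =
  let k = seqLen b
      (conts, rest) = splitAt (k - 1) bs
      lead = fromIntegral b .&. ([0x7F, 0x1F, 0x0F, 0x07] !! (k - 1))
      cp = foldl (\acc x -> (acc `shiftL` 6) .|. (fromIntegral x .&. 0x3F)) lead conts
  in (chr cp, rest)

decode :: [Word8] -> String
decode [] = []
decode bs = let (c, rest) = decodeChar bs in c : decode rest

-- Character i: must skip i variable-length sequences first, so O(i).
indexUtf8 :: [Word8] -> Int -> Char
indexUtf8 = go
  where
    go bs 0       = fst (decodeChar bs)
    go (b : bs) k = go (drop (seqLen b - 1) bs) (k - 1)
    go [] _       = error "indexUtf8: index out of range"

main :: IO ()
main = do
  let bytes = concatMap encodeChar "a\955\8364" -- 'a', U+03BB, U+20AC
  print (length bytes)                -- 6 (1 + 2 + 3 bytes)
  print (indexUtf8 bytes 2 == '\8364') -- True
  print (decode bytes == "a\955\8364") -- True
```

With a fixed-width array, character i is a single lookup. Here `indexUtf8` must walk past every earlier sequence. That extra decoding logic and linear scan is the complexity-and-speed trade-off under discussion.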
I'm not sure. UTF-8 encoding has its own drawbacks (I mean code complexity and slowness). Moreover, if Donald uses Simon's technique for implementing ucs1..ucs4 packed strings, this module will just call existing functions. One problem, though, is that we would lose UTF-8 support for memory-mapped files and any other C strings.

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com