
On Tue, Aug 17, 2010 at 10:34, Bulat Ziganshin
Hello Johan,
Tuesday, August 17, 2010, 12:20:37 PM, you wrote:
I agree, Data.Text is great. Unfortunately, its internal use of UTF-16 makes it inefficient for many purposes.
It's not clear to me that using UTF-16 internally does make Data.Text noticeably slower.
Not slower, but it requires 2x more memory. Speed is the same, since Unicode contains roughly 2^20 codepoints, so neither UTF-8 nor UTF-16 can be a fixed-width encoding anyway.
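To illustrate why UTF-16 is variable-width just like UTF-8: the Unicode code space runs from U+0000 to U+10FFFF (1,114,112 values, a little over 2^20), so anything above U+FFFF does not fit in one 16-bit unit and is encoded as a surrogate pair. The following is a minimal Python sketch of that rule; the function name `utf16_code_units` is hypothetical and only meant for illustration.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Encode one Unicode code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate values are not encodable scalar values")
    if cp <= 0xFFFF:
        # Basic Multilingual Plane: a single code unit.
        return [cp]
    # Supplementary planes: split the remaining 20 bits into two halves.
    v = cp - 0x10000
    high = 0xD800 | (v >> 10)   # high (lead) surrogate carries the top 10 bits
    low = 0xDC00 | (v & 0x3FF)  # low (trail) surrogate carries the bottom 10 bits
    return [high, low]


print([hex(u) for u in utf16_code_units(0x41)])     # ['0x41']
print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
```

So code that walks UTF-16 text has to check for surrogates, much as UTF-8 code has to check for multi-byte sequences.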
This is not entirely correct, because it all depends on your data. For Western languages it normally holds true that UTF-16 occupies twice the memory of UTF-8, but for other languages a code point might take 3 bytes in UTF-8 versus 2 in UTF-16 (I thought even 4, and indeed characters outside the Basic Multilingual Plane take 4 bytes in UTF-8, though UTF-16 needs 4 bytes for those as well: http://en.wikipedia.org/wiki/UTF-8). That Wikipedia page is a nice read anyway; it mentions some of the advantages and disadvantages of the different encodings (for example, the complexity of the code that determines the length of a UTF string depends on the encoding).

Cheers,
-Tako
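A small Python sketch makes the "it depends on your data" point concrete. It computes the encoded size of a string in UTF-8 and UTF-16 from the standard byte-length ranges, and counts code points by scanning the encoded bytes, since neither encoding lets you get the length in code points without a scan. All function names here are hypothetical, chosen for illustration; UTF-16 is assumed to be big-endian without a BOM.

```python
def utf8_len(cp: int) -> int:
    """Bytes needed to encode one code point in UTF-8."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4  # supplementary planes, up to U+10FFFF


def utf16_len(cp: int) -> int:
    """Bytes needed in UTF-16: one 2-byte unit, or a 4-byte surrogate pair."""
    return 2 if cp < 0x10000 else 4


def encoded_sizes(s: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for a string."""
    return (sum(utf8_len(ord(c)) for c in s), sum(utf16_len(ord(c)) for c in s))


def count_code_points_utf8(data: bytes) -> int:
    """Count code points by skipping continuation bytes (bit pattern 10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def count_code_points_utf16(data: bytes) -> int:
    """Count code points in big-endian UTF-16 by skipping low (trail) surrogates."""
    units = (int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


print(encoded_sizes("hello"))          # (5, 10): UTF-16 is twice as large
print(encoded_sizes("\u65e5\u672c\u8a9e"))  # (9, 6): UTF-8 is 1.5x as large
print(encoded_sizes("\U0001F600"))     # (4, 4): equal for supplementary characters
```

For ASCII-heavy text UTF-8 halves the memory, while for CJK text UTF-16 is the more compact of the two.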