
3 Oct
2007
3 Oct
'07
6:30 a.m.
On Tue, 2007-10-02 at 21:45 -0400, Brandon S. Allbery KF8NH wrote:
Due to the additional complexity of handling UTF-8 -- EVEN IF the actual text processed happens all to be US-ASCII -- will UTF-8 perhaps be less efficient than UTF-16, or only as fast?
UTF8 will be very slightly faster in the all-ASCII case, but quickly blows chunks if you have *any* characters that require multibyte.
What benchmarks are you basing this on? Doubling your data size is going to cost you if you are doing simple operations (searching, say), but I don't see UTF-8 being particularly expensive - somebody (forget who) implemented UTF-8 on top of ByteString, and IIRC, the benchmarks numbers didn't change all that much from the regular Char8. -k