
13 Aug 2010, 8:51 p.m.
On Fri, Aug 13, 2010 at 6:41 PM, Brandon S Allbery KF8NH
On 8/13/10 16:37, Kevin Jardine wrote:
Surely efficient Unicode text should always be the default? And if the
Efficient for what? The most efficient Unicode representation for Latin-derived strings is UTF-8, but the most efficient for CJK is UTF-16.
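To make the size trade-off concrete, here is a minimal sketch that counts encoded bytes using `encodeUtf8` and `encodeUtf16LE` from the text package's `Data.Text.Encoding`. The helpers `utf8Bytes` and `utf16Bytes` are illustrative names, not part of any library. ASCII characters take one byte each in UTF-8 and two in UTF-16, while CJK characters in the Basic Multilingual Plane take three bytes in UTF-8 and two in UTF-16.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: bytes needed to store the text as UTF-8.
utf8Bytes :: T.Text -> Int
utf8Bytes = BS.length . TE.encodeUtf8

-- Hypothetical helper: bytes needed as UTF-16 (little-endian, no BOM).
utf16Bytes :: T.Text -> Int
utf16Bytes = BS.length . TE.encodeUtf16LE

main :: IO ()
main = do
  let latin = T.pack "Unicode text"        -- 12 ASCII characters
      cjk   = T.pack "\x7D71\x4E00\x78BC"  -- 3 BMP CJK characters
  print (utf8Bytes latin, utf16Bytes latin) -- prints (12,24)
  print (utf8Bytes cjk, utf16Bytes cjk)     -- prints (9,6)
```

Characters outside the BMP take four bytes in either encoding, so the comparison only matters for BMP text. For context, `Data.Text` stored text as UTF-16 internally at the time of this thread; text-2.0 later switched its internal representation to UTF-8.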
I have an app that uses Data.Text, but I'm thinking of switching to UTF-8 ByteStrings. The reason is that I do two main things with text: pass it to a C API for display, and parse it. The C API expects UTF-8, and the parser libraries with a reputation for being fast all seem to take ByteString input rather than Data.Text (currently I unpack to String and feed that to Parsec, which is not optimal).
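A minimal sketch of both operations while keeping `Data.Text`, assuming the text, bytestring and parsec packages (a recent parsec 3.x, which ships `Text.Parsec.Text`). The display API isn't named, so C's `strlen` stands in for it: like a UTF-8 display function, it reads a NUL-terminated byte buffer. `withUtf8` and `numbers` are hypothetical helpers. The idea is to encode to UTF-8 once at the FFI boundary, and to let Parsec 3 consume `Text` directly so the `unpack` step disappears.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Foreign.C.String (CString)
import Foreign.C.Types (CSize (..))
import Text.Parsec (char, digit, many1, parse, sepBy)
import Text.Parsec.Text (Parser)

-- Stand-in for the (unnamed) display API: C's strlen, which reads
-- a NUL-terminated buffer of bytes.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Hypothetical helper: encode to UTF-8 once and hand C a
-- NUL-terminated copy that is valid for the duration of the callback.
withUtf8 :: T.Text -> (CString -> IO a) -> IO a
withUtf8 t = BS.useAsCString (TE.encodeUtf8 t)

-- Hypothetical parser: comma-separated integers, run directly on Text.
numbers :: Parser [Int]
numbers = map read <$> many1 digit `sepBy` char ','

main :: IO ()
main = do
  let msg = T.pack "caf\x00E9"        -- 4 characters, 5 bytes in UTF-8
  n <- withUtf8 msg c_strlen
  print n                             -- prints 5
  print (parse numbers "" (T.pack "1,22,333"))
  -- prints Right [1,22,333]
```

Whether this is fast enough, or whether UTF-8 ByteStrings plus a ByteString parser wins, depends on the workload. Note that attoparsec also offers a `Text` interface in `Data.Attoparsec.Text`.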