
On Tue, Aug 17, 2010 at 03:21:32PM +0200, Daniel Peebles wrote:
> Sounds to me like we need a lazy Data.Text variation that allows UTF-8
> and UTF-16 "segments" in its list of strict text elements :) Then big
> chunks of western text will be encoded efficiently, and same with CJK!
> Not sure what to do about strict Data.Text though :)
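
A rough sketch of what such a mixed-encoding chunk list might look like
(the names and representation here are hypothetical; nothing like this
exists in the text package):

    import qualified Data.ByteString as B

    -- Each strict chunk is tagged with its own encoding, so mostly-western
    -- text can sit in compact UTF-8 chunks while CJK-heavy runs use UTF-16.
    data Encoding = UTF8 | UTF16
      deriving (Eq, Show)

    data Chunk = Chunk !Encoding !B.ByteString  -- bytes valid in the tagged encoding

    newtype MixedText = MixedText [Chunk]

    -- Concatenation never re-encodes: it only joins the chunk lists.
    append :: MixedText -> MixedText -> MixedText
    append (MixedText xs) (MixedText ys) = MixedText (xs ++ ys)

An encoder would pick whichever encoding comes out smaller for each
chunk, at the cost of a per-chunk dispatch when traversing.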
If space is really a concern, there should be a variant that uses LZO or
some other fast compression algorithm that allows concatenation as the
back end.

<ranty thing to follow>

That said, there is never a reason to use UTF-16. It is a vestigial
remnant from the brief period when it was thought 16 bits would be
enough for the Unicode standard, and any defense of it nowadays is
after-the-fact justification for having accidentally standardized on it
back in the day. When people chose the 16-bit representation, it was
because they wanted a one-to-one mapping between codepoints and units of
computation, which has many advantages. However, that mapping no longer
holds: codepoints outside the BMP take two UTF-16 code units (a
surrogate pair). If the one-to-one mapping is important, then nowadays
you use UCS-4; otherwise, you use UTF-8. If space is very important,
then you work with compressed text. In practice a mix of the two is
fairly ideal.

        John

--
John Meacham - ⑆repetae.net⑆john⑈ - http://notanumber.net/
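
To make the surrogate-pair point concrete, here is a small,
self-contained sketch (plain Haskell, no libraries; the sample
characters are arbitrary) of how many bytes a single character costs in
each encoding:

    import Data.Char (ord)

    -- Code units per character. A UTF-16 index is not a codepoint index
    -- once you leave the BMP, while UCS-4 is one-to-one by construction.
    utf16Units :: Char -> Int
    utf16Units c = if ord c > 0xFFFF then 2 else 1  -- surrogate pair above U+FFFF

    utf8Bytes :: Char -> Int
    utf8Bytes c
      | ord c <= 0x7F   = 1
      | ord c <= 0x7FF  = 2
      | ord c <= 0xFFFF = 3
      | otherwise       = 4

    main :: IO ()
    main = mapM_ report "a\xE9\x6F22\x1D11E"  -- ASCII, Latin-1, CJK, non-BMP
      where
        report c = putStrLn $ show c ++ ": utf8=" ++ show (utf8Bytes c)
                           ++ "B utf16=" ++ show (2 * utf16Units c)
                           ++ "B ucs4=4B"

Running it shows UTF-8 winning for ASCII and Latin-1, UTF-16 winning for
the CJK character, and neither encoding being fixed-width once the
non-BMP character appears.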