Re: [Haskell-cafe] Re: String vs ByteString

18 Aug 2010

On Wed, Aug 18, 2010 at 2:39 PM, Johan Tibell wrote:
...
On Wed, Aug 18, 2010 at 2:12 AM, John Meacham wrote:
...
<ranty thing to follow>
That said, there is never a reason to use UTF-16; it is a vestigial
remnant from the brief period when it was thought 16 bits would be
enough for the Unicode standard. Any defense of it nowadays is
after-the-fact justification for having accidentally standardized on it
back in the day.
This is false. Text uses UTF-16 internally because early benchmarks
indicated that it was faster. See Tom Harper's response to the other thread
that was spawned off this thread by Ketil.
Text continues to be UTF-16 today because
    * no one has written a benchmark that shows that UTF-8 would be faster
      *for use in Data.Text*, and
    * no one has written a patch that converts Text to use UTF-8
      internally.
I'm quite frustrated by this whole discussion; there's lots of talking, no
coding, and only a little benchmarking (of web sites, not code). This will
get us nowhere.
Here's my response to the two points:
* I haven't written a patch showing that Data.Text would be faster using
UTF-8 because that would require fulfilling the second point (which I'll get
to in a second). I *have* shown where there are huge performance differences
between text and ByteString/String. Unfortunately, the response has been
"don't use bytestring, it's the wrong datatype, text will get fixed," which
is quite underwhelming.

* Since the prevailing attitude has been to disregard any facts shown
thus far, it seems that the effort required to learn the internals of the
text package and attempt a patch would be wasted. In the meantime, Jasper
has released blaze-builder which does an amazing job at producing UTF-8
encoded data, which for the moment is my main need. As much as I'll be
chastised by the community, I'll stick with this approach for the moment.

Now if you tell me that text would consider applying a UTF-8 patch, that
would be a different story. But I don't have the time to maintain a separate
UTF-8 version of text. For me, the whole point of this discussion was to
determine whether we should attempt porting to UTF-8, which as I understand
it would be a rather large undertaking.

Michael

Michael Snoyman