Re: [Haskell-cafe] Google Summer of Code: BlazeHTML RFC

27 May 2010

      On Thu, May 27, 2010 at 11:40 AM, Ivan Miljenovic 
...
wrote:
...
On 27 May 2010 18:33, Michael Snoyman  wrote:
...
I don't do any string concatenation (look closely), I was very careful to
avoid it. I tried with lazy text as well: it was slower. This isn't
surprising, since lazy text- under the surface- is just a list of strict
text. And the benchmark itself already has a lazy list of strict text.
Using
lazy text would just be adding a layer of wrapping.
I don't know what you mean by "explicitly using Text values"; you mean
calling pack manually? That's really all that OverloadedStrings does.
You can try out lots of different variants on that benchmark. I did that
already, and found this to be the fastest version.
Fair enough.  Now that I think about it, I recall once trying to have
pretty generate Text values rather than String for graphviz (by using
fullRender, so it was still using String under the hood until it came
time to render) and it too was much slower than String (unfortunately,
I didn't record a patch with these changes so I can't just go back and
play with it anymore as I reverted them all :s).
Maybe Bryan can chime in with some best-practices for using Text?
Here's my guess at an explanation for what's happening in my benchmark:
text will clearly beat String in memory usage, that's what it's designed
for. However, the compiler is still generating String values which are being
encoded to Text as runtime.

Now, this is the same process for bytestrings. However, bytestrings never
have to be decoded: the IO routines simply read the character buffer. In the
case of text, however, the encoded data must be decoded again to a
bytestring.

In other words, here's what I think the three different benchmarks are
really doing:

* String: generates a list of Strings, passes each String to a relatively
inefficient IO routine.
* ByteString: encodes Strings one by one into ByteStrings, generates a list
of these ByteStrings, and passes each ByteString to a very efficient IO
routine.
: Text: encodes Strings one by one into Texts, generates a list of these
Texts, calls a UTF-8 decoding function to decode each Text into a
ByteString, and passes each resulting ByteString to a very efficient IO
routine.

In the case of ASCII data to be output as UTF-8, uses the
Data.ByteString.Char8.pack function will most likely always be the most
efficient choice, and thus it seems like something BlazeHtml should support.
I'm considering releasing a Hamlet 0.3 based entirely on UTF-8 encoded
ByteStrings, but I'd also like to hear from Bryan about this.

Michael