Re: [Haskell-cafe] Google Summer of Code: BlazeHTML RFC

On Thu, May 27, 2010 at 11:16 AM, Ivan Miljenovic wrote:
On 27 May 2010 17:55, Michael Snoyman wrote:
Two comments:
* The exclamation point seems good enough for attributes. I copied that for Hamlet as well.
* If you're standardizing on UTF-8, why not support bytestrings? I'm aware that a user could shoot him/herself in the foot by passing in non-UTF8 data, but I would imagine the performance gains would outweigh this. My recent benchmarks on the BigTable benchmark[1] imply a huge performance gap between ByteStrings and other contenders.
Wow, I find it rather surprising that String out-performs Text; any idea why that is? I wonder if you're just using it wrong...
Could be, I'd be very happy if that were the case. All of the benchmarks are available on Github, and the bytestring[1], text[2] and string[3] versions are all rather short.
Michael
[1] http://github.com/snoyberg/benchmarks/blob/master/bigtable/cgi/bytestring.hs
[2] http://github.com/snoyberg/benchmarks/blob/master/bigtable/cgi/text.hs
[3] http://github.com/snoyberg/benchmarks/blob/master/bigtable/cgi/string.hs

On Thu, May 27, 2010 at 10:23 AM, Michael Snoyman wrote:
On Thu, May 27, 2010 at 11:16 AM, Ivan Miljenovic <ivan.miljenovic@gmail.com> wrote:
Wow, I find it rather surprising that String out-performs Text; any idea why that is? I wonder if you're just using it wrong...
Could be, I'd be very happy if that were the case. All of the benchmarks are available on Github, and the bytestring[1], text[2] and string[3] versions are all rather short.
Do you include the cost of encoding the result as e.g. UTF-8? The hope would be that the more compact Text would be faster to traverse, and thus encode, than the list based String.
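To make the two code paths in that question concrete, here is a minimal sketch of UTF-8 encoding from a String and from a Text. It is not taken from the linked benchmarks: encodeChar, encodeString and encodeText are hypothetical names, the String encoder is hand-rolled purely for illustration, and the bytestring and text packages are assumed to be installed.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hand-rolled UTF-8 encoder for one Char (illustrative, hypothetical helper).
-- Surrogate code points are not special-cased here, whereas T.pack
-- replaces them with U+FFFD.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- Leading bits of the code point, shifted down by k.
    hi k   = fromIntegral (n `shiftR` k)
    -- Continuation byte: 10xxxxxx carrying six bits starting at bit k.
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

-- String path: traverses a linked list of Chars.
encodeString :: String -> B.ByteString
encodeString = B.pack . concatMap encodeChar

-- Text path: encodes from a packed array.
encodeText :: T.Text -> B.ByteString
encodeText = TE.encodeUtf8
```

Both functions should produce the same bytes for ordinary text; a benchmark that includes the encoding step would time each of them on the rendered table rather than stopping at the unencoded value.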

On Thu, May 27, 2010 at 12:57 PM, Johan Tibell wrote:
On Thu, May 27, 2010 at 10:23 AM, Michael Snoyman wrote:
On Thu, May 27, 2010 at 11:16 AM, Ivan Miljenovic <ivan.miljenovic@gmail.com> wrote:
Wow, I find it rather surprising that String out-performs Text; any idea why that is? I wonder if you're just using it wrong...
Could be, I'd be very happy if that were the case. All of the benchmarks are available on Github, and the bytestring[1], text[2] and string[3] versions are all rather short.
Do you include the cost of encoding the result as e.g. UTF-8? The hope would be that the more compact Text would be faster to traverse, and thus encode, than the list based String.
No, but this is done on purpose. One of my goals in this benchmark was to determine whether I should consider switching Hamlet to ByteStrings. If I were to do so, then the UTF-8 encoding would be done at compile-time instead of run-time.
You're correct that a fair comparison would be to UTF-8 encode the Strings as well. However, that's not what most users are going to do most of the time: when dealing with ASCII data, a straight Char8.pack encoding will do the same as UTF-8. I'm simply pointing out that I think Blaze should support this style.
Michael
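The ASCII point can be checked with a short sketch. The helper names packChar8, packUtf8 and agree are made up for this illustration, and the bytestring and text packages are assumed to be available; this is not code from Hamlet or Blaze.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Char8.pack keeps only the low 8 bits of each Char.
packChar8 :: String -> B.ByteString
packChar8 = B8.pack

-- Proper UTF-8 encoding, going through Text.
packUtf8 :: String -> B.ByteString
packUtf8 = TE.encodeUtf8 . T.pack

-- True when both approaches yield identical bytes (hypothetical helper).
agree :: String -> Bool
agree s = packChar8 s == packUtf8 s
```

For pure ASCII input such as "<td>42</td>" the two agree. For "caf\233" they do not: Char8.pack emits the single byte 0xE9, while UTF-8 emits 0xC3 0xA9, which is the foot-shooting risk mentioned earlier in the thread.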
participants (2)
- Johan Tibell
- Michael Snoyman