Google Summer of Code: BlazeHTML RFC

Dear all,
BlazeHtml started out at ZuriHac 2010. Now, Jasper Van der Jeugt is working on it as a Google Summer of Code student for haskell.org. His mentors are Simon Meier and Johan Tibell. The goal is to create a high-performance HTML generation library.
In the past few weeks, we have been exploring the performance and design of different drafts of this library. Now, the time has come to ask some questions to the Haskell community, more specifically the future users of BlazeHtml as well as current users of other HTML generation libraries.
We have written an RFC to gather feedback from the community:
HTML version: http://jaspervdj.be/posts/2010-05-27-blazehtml-rfc.html
Plain version: http://github.com/jaspervdj/BlazeHtml/raw/develop/doc/RFC.lhs
The easiest way of sending us feedback, comments or criticism is replying to the haskell-cafe thread here. Alternatively, drop a comment at the bottom of the HTML version or at reddit.
Looking forward to your feedback,
Kind regards,
Simon Meier
Jasper Van der Jeugt

Two comments:
* The exclamation point seems good enough for attributes. I copied that for
Hamlet as well.
* If you're standardizing on UTF-8, why not support bytestrings? I'm aware
that a user could shoot him/herself in the foot by passing in non-UTF8 data,
but I would imagine the performance gains would outweigh this. My recent
benchmarks on the BigTable benchmark[1] imply a huge performance gap between
ByteStrings and other contenders.
As we've discussed before, I think combining BlazeHtml and Hamlet would be
very nice, though I'm dubious that a BlazeHtml backend for Hamlet would be
faster than a raw backend.
Looking forward to hearing more progress, good luck!
Michael
[1] http://www.snoyman.com/blog/entry/bigtable-benchmarks/
On Thu, May 27, 2010 at 10:16 AM, Jasper Van der Jeugt wrote:
> We have written an RFC to gather feedback from the community: [...]

On 27 May 2010 17:55, Michael Snoyman wrote:
> [...] My recent benchmarks on the BigTable benchmark[1] imply a huge performance gap between ByteStrings and other contenders.
Wow, I find it rather surprising that String out-performs Text; any idea why that is? I wonder if you're just using it wrong...
--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Thu, May 27, 2010 at 11:16 AM, Ivan Miljenovic wrote:
> Wow, I find it rather surprising that String out-performs Text; any idea why that is? I wonder if you're just using it wrong...
Could be, I'd be very happy if that were the case. All of the benchmarks are available on Github, and the bytestring[1], text[2] and string[3] versions are all rather short.
Michael
[1] http://github.com/snoyberg/benchmarks/blob/master/bigtable/cgi/bytestring.hs
[2] http://github.com/snoyberg/benchmarks/blob/master/bigtable/cgi/text.hs
[3] http://github.com/snoyberg/benchmarks/blob/master/bigtable/cgi/string.hs

On 27 May 2010 18:23, Michael Snoyman wrote:
> Could be, I'd be very happy if that were the case. All of the benchmarks are available on Github, and the bytestring[1], text[2] and string[3] versions are all rather short.
Does using lazy Text values improve this? I find it a little strange that you concatenate so many individual Strings. Also, how about explicitly using Text values rather than OverloadedStrings?
--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Thu, May 27, 2010 at 11:28 AM, Ivan Miljenovic wrote:
> Does using lazy Text values improve this? I find it a little strange that you concatenate so many individual Strings that much. Also, how about explicitly using Text values rather than OverloadedStrings?
I don't do any string concatenation (look closely); I was very careful to avoid it. I tried with lazy text as well: it was slower. This isn't surprising, since lazy text, under the surface, is just a list of strict text. And the benchmark itself already has a lazy list of strict text. Using lazy text would just be adding a layer of wrapping.
I don't know what you mean by "explicitly using Text values"; you mean calling pack manually? That's really all that OverloadedStrings does.
You can try out lots of different variants on that benchmark. I did that already, and found this to be the fastest version.
Michael
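
Editor's sketch of the two claims in the message above, assuming the `text` package: a literal under OverloadedStrings is the same value as a manual `pack`, and a lazy Text is a wrapper around a list of strict chunks. The names `viaLiteral`, `viaPack`, `cells` and `wrapped` are made up for illustration and are not taken from the benchmark sources.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- With OverloadedStrings the literal is elaborated to `fromString "<td>"`,
-- and the IsString instance for Text is defined in terms of pack.
viaLiteral :: T.Text
viaLiteral = "<td>"

-- The same value, packed by hand.
viaPack :: T.Text
viaPack = T.pack "<td>"

-- A list of strict pieces, standing in for what the benchmark already builds.
cells :: [T.Text]
cells = ["<tr>", "<td>", "1", "</td>", "</tr>"]

-- A lazy Text built from those pieces only wraps the existing chunks.
wrapped :: TL.Text
wrapped = TL.fromChunks cells

main :: IO ()
main = do
  print (viaLiteral == viaPack)
  print (TL.toChunks wrapped == cells)
```

Both comparisons hold here because none of the chunks is empty; `fromChunks` drops empty chunks, so the round trip is not an identity in general.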

On 27 May 2010 18:33, Michael Snoyman wrote:
> [...] You can try out lots of different variants on that benchmark. I did that already, and found this to be the fastest version.
Fair enough. Now that I think about it, I recall once trying to have pretty generate Text values rather than String for graphviz (by using fullRender, so it was still using String under the hood until it came time to render) and it too was much slower than String (unfortunately, I didn't record a patch with these changes so I can't just go back and play with it anymore as I reverted them all :s).
Maybe Bryan can chime in with some best-practices for using Text?
--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Thu, May 27, 2010 at 11:40 AM, Ivan Miljenovic wrote:
> Maybe Bryan can chime in with some best-practices for using Text?
Here's my guess at an explanation for what's happening in my benchmark: text will clearly beat String in memory usage, that's what it's designed for. However, the compiler is still generating String values which are being encoded to Text at runtime.
Now, this is the same process for bytestrings. However, bytestrings never have to be converted again: the IO routines simply read the byte buffer. In the case of text, however, the data must be encoded again (as UTF-8) to a bytestring.
In other words, here's what I think the three different benchmarks are really doing:
* String: generates a list of Strings, passes each String to a relatively inefficient IO routine.
* ByteString: encodes Strings one by one into ByteStrings, generates a list of these ByteStrings, and passes each ByteString to a very efficient IO routine.
* Text: encodes Strings one by one into Texts, generates a list of these Texts, calls a UTF-8 encoding function to encode each Text into a ByteString, and passes each resulting ByteString to a very efficient IO routine.
In the case of ASCII data to be output as UTF-8, using the Data.ByteString.Char8.pack function will most likely always be the most efficient choice, and thus it seems like something BlazeHtml should support.
I'm considering releasing a Hamlet 0.3 based entirely on UTF-8 encoded ByteStrings, but I'd also like to hear from Bryan about this.
Michael
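
Editor's sketch of the three pipelines guessed at above, assuming the `bytestring` and `text` packages. It is not the benchmark code from the linked repository; `pieces` and the function names are hypothetical stand-ins chosen for illustration.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Stand-in for the string literals the compiler emits as String values.
pieces :: [String]
pieces = ["<tr>", "<td>", "42", "</td>", "</tr>"]

-- String: each String goes straight to a String-based IO routine.
writeStrings :: [String] -> IO ()
writeStrings = mapM_ putStr

-- ByteString: pack once; the result is already a byte buffer.
toByteStrings :: [String] -> [B.ByteString]
toByteStrings = map B8.pack

-- Text: pack into Text first, then UTF-8 encode each Text into a ByteString.
toByteStringsViaText :: [String] -> [B.ByteString]
toByteStringsViaText = map (TE.encodeUtf8 . T.pack)

main :: IO ()
main = do
  writeStrings pieces
  mapM_ B.putStr (toByteStrings pieces)
  mapM_ B.putStr (toByteStringsViaText pieces)
```

For ASCII input the last two routes produce the same bytes; the Text route simply performs one extra conversion per piece before the output call.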

On Thu, May 27, 2010 at 10:53 AM, Michael Snoyman wrote:
> * Text: encodes Strings one by one into Texts, generates a list of these Texts, calls a UTF-8 encoding function to encode each Text into a ByteString, and passes each resulting ByteString to a very efficient IO routine.
If Text used UTF-8 internally rather than UTF-16 we could create Texts from string literals much more efficiently, in the same manner as done in Char8.pack for bytestrings:
    {-# RULES
    "FPS pack/packAddress" forall s .
        pack (unpackCString# s) = inlinePerformIO (B.unsafePackAddress s)
     #-}
This rule skips the creation of an intermediate String when packing a string literal by having the created ByteString point directly to the memory GHC allocates (outside the heap) for the string literal. This rule could be added directly to a builder monoid for lazy Texts so that no copying is done at all. In addition, if Text was internally represented using UTF-8, encodeUtf8 would be free.
Johan
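
Editor's sketch of what the right-hand side of that rule does, assuming GHC with the MagicHash extension and the `bytestring` package. `fromLiteral` and `fromPackedString` are hypothetical names; the sketch calls `unsafePackAddress` by hand instead of relying on the rewrite rule firing.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.ByteString.Unsafe (unsafePackAddress)

-- "<td>"# is an unboxed, NUL-terminated literal of type Addr#.
-- unsafePackAddress makes the ByteString point at it without copying.
fromLiteral :: IO B.ByteString
fromLiteral = unsafePackAddress "<td>"#

-- The ordinary route: build a String, then copy it into a ByteString.
fromPackedString :: B.ByteString
fromPackedString = B8.pack "<td>"

main :: IO ()
main = do
  direct <- fromLiteral
  print (direct == fromPackedString)
```

The resulting ByteString must never be mutated, since it aliases the literal's storage; that is why the function lives in the Unsafe module.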

On Thu, May 27, 2010 at 10:23 AM, Michael Snoyman wrote:
> Could be, I'd be very happy if that were the case. All of the benchmarks are available on Github, and the bytestring[1], text[2] and string[3] versions are all rather short.
Do you include the cost of encoding the result as e.g. UTF-8? The hope would be that the more compact Text would be faster to traverse, and thus encode, than the list based String.

On Thu, May 27, 2010 at 12:57 PM, Johan Tibell wrote:
> Do you include the cost of encoding the result as e.g. UTF-8? The hope would be that the more compact Text would be faster to traverse, and thus encode, than the list based String.
No, but this is done on purpose. One of my goals in this benchmark was to determine whether I should consider switching Hamlet to ByteStrings. If I were to do so, then the UTF-8 encoding would be done at compile-time instead of run-time.
You're correct that a fair comparison would be to UTF-8 encode the Strings as well. However, that's not what most users are going to do most of the time: when dealing with ASCII data, a straight Char8.pack encoding will do the same as UTF-8. I'm simply pointing out that I think Blaze should support this style.
Michael
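
Editor's sketch of the ASCII claim above, assuming the `bytestring` and `text` packages: `Char8.pack` and UTF-8 encoding agree on ASCII input and diverge as soon as a character above U+007F appears. `char8Bytes` and `utf8Bytes` are hypothetical helper names.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- Bytes produced by truncating every Char to 8 bits.
char8Bytes :: String -> [Word8]
char8Bytes = B.unpack . B8.pack

-- Bytes produced by a real UTF-8 encoder.
utf8Bytes :: String -> [Word8]
utf8Bytes = B.unpack . TE.encodeUtf8 . T.pack

main :: IO ()
main = do
  print (char8Bytes "<td>" == utf8Bytes "<td>")  -- True: pure ASCII
  print (char8Bytes "caf\233")                   -- [99,97,102,233]
  print (utf8Bytes "caf\233")                    -- [99,97,102,195,169]
```

The last two lines show the foot-gun mentioned earlier in the thread: the Char8 route emits a single byte 233 for U+00E9, which is not valid UTF-8 on its own.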

** Advertisement **
Have you tried the library I have written, Data.Rope?
** End of advertisement **
The algorithmic complexity of most operations on ropes is way better than on bytestrings: log n for all operations, except traversals, of course.
Cheers,
PE
On 27/05/2010, at 06:01, Michael Snoyman wrote:
> [...] when dealing with ASCII data, a straight Char8.pack encoding will do the same as UTF-8. I'm simply pointing out that I think Blaze should support this style.

On Thu, May 27, 2010 at 2:44 PM, Pierre-Etienne Meunier wrote:
> Have you tried the library I have written, Data.Rope?
> The algorithmic complexity of most operations on ropes is way better than on bytestrings: log n for all operations, except traversals, of course.
How is a Data.Rope.Rope different from a Data.Sequence.Seq Char?
--Max

About as different as a Data.ByteString is from an Array Int Char: there are several optimizations over a Data.Sequence.Seq that are specific to characters, for instance file IO using mmap, and the use of blocks (which would have been possible with about any constant size "Storable" type, of course).
Moreover, there are tricks in a Data.Rope to hide the mutability of the blocks, which I do not believe appear in Data.Sequence: you can modify a rope once without copying anything (in incomplete blocks for instance). Of course referential transparency is preserved: if you modify it again afterwards, something will have to get copied, but I found it most useful when building large strings before writing them to files.
And last but not least, a Data.Sequence.Seq Char is called a rope in the literature ;-)
Anyway, good point, because now I want to benchmark it against Data.Sequence.
PE
On 28/05/2010, at 14:22, Max Rabkin wrote:
> How is a Data.Rope.Rope different from a Data.Sequence.Seq Char?

Michael Snoyman wrote:
> * If you're standardizing on UTF-8, why not support bytestrings?
+1
> I'm aware that a user could shoot him/herself in the foot by passing in non-UTF8 data, but I would imagine the performance gains would outweigh this.
Wrap them in a (new)type?
-k
--
If I haven't seen further, it is by standing in the footprints of giants
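
Editor's sketch of one way to read the newtype suggestion, assuming the `bytestring` and `text` packages. `Utf8`, `mkUtf8` and `unUtf8` are hypothetical names, not part of BlazeHtml or Hamlet, and validating on construction is only one possible design; it costs a pass over the input.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Text.Encoding (decodeUtf8')

-- Hypothetical wrapper: a ByteString that has been checked to be valid UTF-8.
-- In a real module the constructor would not be exported.
newtype Utf8 = Utf8 B.ByteString
  deriving (Eq, Show)

-- Smart constructor: validate once, then trust the wrapper everywhere else.
mkUtf8 :: B.ByteString -> Maybe Utf8
mkUtf8 bs = case decodeUtf8' bs of
  Left _  -> Nothing
  Right _ -> Just (Utf8 bs)

-- Unwrap for output; no further checks are needed at this point.
unUtf8 :: Utf8 -> B.ByteString
unUtf8 (Utf8 bs) = bs

main :: IO ()
main = do
  print (mkUtf8 (B8.pack "<td>"))       -- accepted: ASCII is valid UTF-8
  print (mkUtf8 (B.pack [0xC3, 0x28]))  -- rejected: 0x28 is not a continuation byte
```

A library that wants to keep the performance gain could also offer an unchecked constructor under an explicitly "unsafe" name, leaving the validated one as the default.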

> Q14: Do you see any problems with respect to integrating BlazeHtml in your favourite web-framework/server?
How about also providing an enumerator back-end?
http://hackage.haskell.org/packages/archive/iteratee/0.3.5/doc/html/Data-Ite...
Then your library can integrate more easily with the snap framework: http://snapframework.com
Regards,
Bas

Hey Bas,
On Thu, May 27, 2010 at 10:38 AM, Bas van Dijk wrote:
> How about also providing an enumerator back-end? http://hackage.haskell.org/packages/archive/iteratee/0.3.5/doc/html/Data-Ite...
> Then your library can integrate more easily with the snap framework: http://snapframework.com
Sure, I can do that. But I already tested integration with the snap framework; the best path here seems to be to call the `writeLBS` function from the snap framework on the `L.ByteString` that BlazeHtml produces (`writeLBS` internally uses an enumerator).
Kind regards,
Jasper Van der Jeugt

On Thu, May 27, 2010 at 10:48 AM, Jasper Van der Jeugt wrote:
> Sure, I can do that. But I already tested integration with the snap framework, the best path here seems to call the `writeLBS` function from the snap framework on the `L.ByteString` that BlazeHtml produces (`writeLBS` internally uses an enumerator).
I think it's worth analyzing whether using enumerators directly gives a significant performance improvement over converting from lazy ByteStrings. Looking at the conversion functions in the snap framework, it looks like we can avoid some intermediate lists for one:
    writeLBS :: L.ByteString -> Snap ()
    writeLBS s = addToOutput $ enumLBS s
    addToOutput :: (forall a . Enumerator a) -> Snap ()
    addToOutput enum = modifyResponse $ modifyResponseBody (>. enum)
    enumLBS :: (Monad m) => L.ByteString -> Enumerator m a
    enumLBS lbs iter = foldM k iter enums
      where
        enums = map (enumPure1Chunk . WrapBS) $ L.toChunks lbs
        k i e = e i
    -- from iteratee:
    enumPure1Chunk :: (SC.StreamChunk s el, Monad m) => s el -> EnumeratorGM s el m a
    enumPure1Chunk str iter = runIter iter (Chunk str) >>= checkIfDone return
Regards,
Bas

As a user, I have too many HTML generators, a few of them with Ajax and none with server-side event handling (like ASPX or JSPX). Ajax is complicated, but server-side event handling is what I really miss because it is simple from the user's point of view: my events could be handled in Haskell code rather than in JavaScript, and I could implicitly use the advantages of dynamic HTML and Ajax without needing to know them at all.
Imagine a dynamic Web application with 100% Haskell code made with dynamic widgets created by third-party developers.
So, does anyone want to create an HTML templating system with server-side event handling? It is not terribly hard to do. (I refer to the ASP.NET documentation or the JavaServer Faces framework.)
By the way, I vote for XML templating, or else combinator templating that produces XHTML templates, because it can be handled by a future graphical IDE.
2010/5/27 Jasper Van der Jeugt:
> Sure, I can do that. But I already tested integration with the snap framework, the best path here seems to call the `writeLBS` function from the snap framework on the `L.ByteString` that BlazeHtml produces (`writeLBS` internally uses an enumerator).
participants (9)
- Alberto G. Corona
- Bas van Dijk
- Ivan Miljenovic
- Jasper Van der Jeugt
- Johan Tibell
- Ketil Malde
- Max Rabkin
- Michael Snoyman
- Pierre-Etienne Meunier