Re: [Haskell-cafe] blaze-builder and FlexibleInstances in code that aims to become part of the Haskell platform

20 May 2011

2011/5/19 Antoine Latter:
...
On Thu, May 19, 2011 at 3:06 PM, Simon Meier wrote:
...
The core problem that drove me towards this solution is the abundance
of different IntX and WordX types. Each of them requires a separate
Write for big-endian, little-endian, host-endian, lower-case-hex, and
upper-case-hex encodings. Currently, there are
int8BE   :: Write Int8
int16BE :: Write Int16
int32BE :: Write Int32
...
hexLowerInt8 :: Write Int8
...
and so on. As you can see
(http://hackage.haskell.org/packages/archive/blaze-builder/0.3.0.1/doc/html/B...),
this approach clutters the public API quite a bit. Hence, I'm thinking
of using a separate type class for each encoding; i.e.,
If Johan's work on Data.Binary and rewrite rules works out, then it
would cut the exposed API in half, which helps.
We could then use the module and package system to keep the API
clean: builders that output a specific encoding could live in
separate modules. This would also keep the function names short.
That would require coming up with logical divisions for the functions
you're creating, and I don't understand the big picture enough to help
with that.
...
 class BigEndian a where
   bigEndian :: Write a
This collapses the big-endian encodings of all 10 bounded-size (signed
and unsigned) integer types under a single name with well-defined
semantics. Moreover, it's standard Haskell 98. For the hex encodings,
I'm thinking about providing type-classes
 class HexLower a where
   hexLower :: Write a
 class HexLowerNoLead a where
   hexLowerNoLead :: Write a
 ...
for ASCII encoding and each of the standard Unicode encodings in a
separate module. The user can then select the right ones using
qualified imports. In most cases, he won't even need qualification, as
different character encodings are seldom mixed.
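The one-class-per-encoding idea can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not blaze-builder's API: the `Write` newtype below is a hypothetical stand-in that just maps a value to a list of bytes (blaze-builder's real `Write` writes into an output buffer), `beBytes` and `hexBytes` are hypothetical helpers, only a few of the ten integer types get instances, and "hexLower keeps leading zeros" is an assumed reading of the name.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (intToDigit)
import Data.Int (Int16, Int32)
import Data.Word (Word8, Word16, Word32, Word64)

-- Hypothetical, simplified stand-in for blaze-builder's Write: here it
-- only maps a value to the list of bytes that should be emitted.
newtype Write a = Write { runWrite :: a -> [Word8] }

-- One type class per encoding, as proposed (plain Haskell 98).
class BigEndian a where
  bigEndian :: Write a

class HexLower a where
  hexLower :: Write a

-- The n least-significant bytes of x, most significant first.
-- Converting through Word64 yields two's complement bytes for negatives.
beBytes :: Integral a => Int -> a -> [Word8]
beBytes n x = [fromIntegral (w `shiftR` (8 * i)) | i <- [n - 1, n - 2 .. 0]]
  where
    w = fromIntegral x :: Word64

-- Lower-case hex with leading zeros: two ASCII digits per byte.
-- (Keeping leading zeros is an assumption about what hexLower means.)
hexBytes :: Integral a => Int -> a -> [Word8]
hexBytes n x = concatMap digits (beBytes n x)
  where
    digits b = map asciiDigit [b `shiftR` 4, b .&. 0x0f]
    asciiDigit d = fromIntegral (fromEnum (intToDigit (fromIntegral d)))

instance BigEndian Word8  where bigEndian = Write (beBytes 1)
instance BigEndian Word16 where bigEndian = Write (beBytes 2)
instance BigEndian Word32 where bigEndian = Write (beBytes 4)
instance BigEndian Int16  where bigEndian = Write (beBytes 2)
instance BigEndian Int32  where bigEndian = Write (beBytes 4)

instance HexLower Word16 where hexLower = Write (hexBytes 2)
instance HexLower Word32 where hexLower = Write (hexBytes 4)

main :: IO ()
main = do
  print (runWrite bigEndian (0x0102 :: Word16))  -- [1,2]
  print (runWrite bigEndian (-2 :: Int16))       -- [255,254]
  print (runWrite hexLower (0xbeef :: Word16))   -- [98,101,101,102], i.e. "beef"
```

The type of the argument alone selects the instance, so a single name such as `bigEndian` replaces the whole `int8BE`, `int16BE`, ... family, and none of the instances needs FlexibleInstances.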
I think we may be at cross-purposes here, and might not even be
discussing the same thing: I would imagine that any sort of 'Builder'
type included in the bytestring package would only provide the core
combinators for packing data into low-level binary formats, so
discussions about text encoding issues, converting to hexadecimal, and
HTML escaping are going over my head.
This seems like what the 'text' package was written for - to separate
out the construction of textual data from choosing its encoding.
Are there use-cases where the 'text' package is too slow for this sort
of approach?
Take care,
Antoine
...
What do you think about such an interface? Is there another hidden
catch that I'm not seeing? BTW, note that Writes are a purely compile-time
abstraction and are meant to be completely inlined. In typical use
cases, there's no efficiency overhead stemming from these type classes.
best regards,
Simon
Yes. For example, using the current 'text' package is suboptimal for
dynamically generating UTF-8 encoded HTML pages. The job is simple: the
data, which is originally held in standard Haskell types (e.g., String),
needs to be HTML-escaped, UTF-8 encoded, and sprinkled with tags in
between.

For blaze-html using blaze-builder, the cost of a tag is a memcpy of
the corresponding tag, and the cost of a single character is one call
to the nested case statement that determines whether the char needs to be
escaped (one memcpy of its escaped version) or which bytes need to be
written to UTF-8 encode the char. This solution works with a single
output buffer.
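The per-character work described above can be modelled roughly as follows. This is a sketch, not blaze-html's code: `escapeUtf8`, `Fragment`, and `render` are hypothetical names, output is a plain list of bytes instead of a buffer, tags are stored as pre-encoded bytes to stand in for the memcpy, and only the five usual HTML-special characters are escaped.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Bytes of an ASCII-only string (tags and escape sequences).
ascii :: String -> [Word8]
ascii = map (fromIntegral . ord)

-- One case analysis per character: either a fixed escape sequence
-- or the UTF-8 bytes of the character itself.
escapeUtf8 :: Char -> [Word8]
escapeUtf8 c = case c of
  '<'  -> ascii "&lt;"
  '>'  -> ascii "&gt;"
  '&'  -> ascii "&amp;"
  '"'  -> ascii "&quot;"
  '\'' -> ascii "&#39;"
  _    -> utf8 (ord c)

-- Standard UTF-8 encoding of a code point (1 to 4 bytes).
utf8 :: Int -> [Word8]
utf8 n
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    hi k   = fromIntegral (n `shiftR` k)
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

-- Tags are already bytes (copied as-is); content is handled per character.
data Fragment = Tag [Word8] | Content String

render :: [Fragment] -> [Word8]
render = concatMap go
  where
    go (Tag bs)    = bs
    go (Content s) = concatMap escapeUtf8 s

main :: IO ()
main = print (render [Tag (ascii "<p>"), Content "a<b \233", Tag (ascii "</p>")])
-- [60,112,62,97,38,108,116,59,98,32,195,169,60,47,112,62]
```

Each content character is examined exactly once, and the tag bytes are never looked at individually.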

For a solution using the text library, the cost of creating the
underlying UTF-16 array is similar to the cost for blaze-builder.
However, you now also need to UTF-8 encode the UTF-16 array. This
more than doubles the cost, as you now also have to inspect every
character of every tag. For ~50% of your data, you suddenly have to
spend a lot more effort!
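For comparison, a two-pass pipeline can be sketched with base only. This is an illustrative model, not the text library's code: a `String` stands in for the intermediate Unicode text value, and `buildText` and `encodeText` are hypothetical names. Pass 1 builds the escaped page as characters; pass 2 encodes every character to UTF-8, tags included.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Standard UTF-8 encoding of a code point (1 to 4 bytes).
utf8 :: Int -> [Word8]
utf8 n
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    hi k   = fromIntegral (n `shiftR` k)
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

-- HTML escaping at the character level (produces characters, not bytes).
escape :: Char -> String
escape '<'  = "&lt;"
escape '>'  = "&gt;"
escape '&'  = "&amp;"
escape '"'  = "&quot;"
escape '\'' = "&#39;"
escape c    = [c]

data Fragment = Tag String | Content String

-- Pass 1: build the whole page as Unicode text, tags included.
buildText :: [Fragment] -> String
buildText = concatMap go
  where
    go (Tag t)     = t
    go (Content s) = concatMap escape s

-- Pass 2: encode every character of that text, tag characters included.
encodeText :: String -> [Word8]
encodeText = concatMap (utf8 . ord)

main :: IO ()
main = do
  let txt = buildText [Tag "<p>", Content "a<b \233", Tag "</p>"]
  print (encodeText txt)  -- [60,112,62,97,38,108,116,59,98,32,195,169,60,47,112,62]
  print (length txt)      -- 15 characters revisited by pass 2
```

The bytes come out the same as in the single-pass approach, but the second pass revisits all 15 characters, 7 of which belong to tags that could otherwise be copied as pre-encoded bytes.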

I agree that the text library is a good choice for representing
Unicode data of an application. However, for high-performance
applications it pays off to think of its output in binary form and
exploit the offered shortcuts. That's where blaze-builder and the like
come in.

thanks for your input,
Simon