Re: [Haskell-cafe] blaze-builder and FlexibleInstances in code that aims to become part of the Haskell platform

19 May 2011

      Hi Antoine, thanks for your feedback.

2011/5/18 Antoine Latter:
...
On Wed, May 18, 2011 at 12:32 PM, Simon Meier wrote:
...
Hello Haskell-Cafe,
...
There are many providers of Writes. Each bounded-length-encoding of a
standard Haskell value is likely to have a corresponding Write. For
example, encoding an Int32 as a big-endian, little-endian, and
host-endian byte-sequence is currently achieved with the following
three functions.
 writeInt32BE :: Write Int32
 writeInt32LE :: Write Int32
 writeInt32HE :: Write Int32
I would like to avoid naming all these encodings individually,
especially as the situation becomes worse for more elaborate
encodings like hexadecimal encodings. There, we encounter encodings
like the UTF-8 encoding of the lower-case hexadecimal encoding of an
Int32.
 writeInt32HexLowerUtf8 :: Write Int32
I really don't like that. Therefore, I'm thinking about the following
solution based on type-classes. We introduce a single typeclass
 class Writable a where
     write :: Write a
and use a bunch of newtypes to denote our encodings.
 newtype Ascii7   a = Ascii7   { unAscii7   :: a }
 newtype Utf8     a = Utf8     { unUtf8     :: a }
 newtype HexUpper a = HexUpper { unHexUpper :: a }
 newtype HexLower a = HexLower { unHexLower :: a }
 ...
Assuming FlexibleInstances, we can write encodings like the above
hex-encoding as instances
 instance Writable (Utf8 (HexLower Int32)) where
   write = ...
This composes rather nicely and allows the implementations to exploit
special properties of the involved data. For example, if we also had a
HTML escaping marker
 newtype Html     a = Html     { unHtml     :: a }
Then, the instance
 instance Writable (Utf8 (Html (HexLower Int32))) where
   write (Utf8 (Html (HexLower i))) = write (Utf8 (HexLower i))
If I were authoring the above code, I don't see why that code is any
easier to write or easier to read than:
...
utf8HtmlHexLower i = utf8HexLower i
And if I were using the encoding functions, I would much prefer to see:
...
utf8HtmlHexLower magicNumber
In my code, instead of:
...
write $ Utf8 $ HTML $ HexLower magicNumber
In addition, this would be difficult for me as a developer using the
proposed library, because I would have no way to know which
combinations of newtypes are valid from reading the haddocks.
Maybe I'm missing something fundamental, but this approach seems more
cumbersome to me as a library author (more boilerplate) and as the
user of the library (less clarity in the docs and in the resultant
code).
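
The two styles compared above can be put side by side in a minimal,
self-contained sketch. It is not blaze-builder's code: the Write type
below is a toy stand-in (a size bound plus a function to a byte list),
and runWrite, hexLower32 and the fixed eight-digit width are assumptions
made only for illustration.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.Char (ord)
import Data.Int (Int32)
import Data.Word (Word8, Word32)
import Numeric (showHex)

-- Toy stand-in for a Write (assumption): an upper bound on the number of
-- bytes written, plus a function producing those bytes.
data Write a = Write Int (a -> [Word8])

-- Hypothetical helper: run a toy Write on a value.
runWrite :: Write a -> a -> [Word8]
runWrite (Write _ f) = f

-- Marker newtypes as in the proposal.
newtype Utf8     a = Utf8     { unUtf8     :: a }
newtype HexLower a = HexLower { unHexLower :: a }
newtype Html     a = Html     { unHtml     :: a }

class Writable a where
  write :: Write a

-- Hypothetical helper: eight lower-case hex digits, zero-padded.
hexLower32 :: Int32 -> String
hexLower32 i = replicate (8 - length digits) '0' ++ digits
  where digits = showHex (fromIntegral i :: Word32) ""

-- Needs FlexibleInstances because the instance head is a nested type.
instance Writable (Utf8 (HexLower Int32)) where
  write = Write 8 $ \(Utf8 (HexLower i)) ->
    map (fromIntegral . ord) (hexLower32 i)

-- Hex digits need no HTML escaping, so this instance just drops the marker.
instance Writable (Utf8 (Html (HexLower Int32))) where
  write = Write 8 $ \(Utf8 (Html h)) -> runWrite write (Utf8 h)

-- The same two encodings in the named style (here as Write values).
utf8HexLower :: Write Int32
utf8HexLower = Write 8 $ \i -> runWrite write (Utf8 (HexLower i))

utf8HtmlHexLower :: Write Int32
utf8HtmlHexLower = utf8HexLower
```

Loaded in GHCi, `runWrite write (Utf8 (Html (HexLower (255 :: Int32))))`
and `runWrite utf8HtmlHexLower 255` yield the same bytes. Only the second
name shows up as a single searchable entry in the haddocks, which is the
documentation concern raised above.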
Hmm, that's a valid point you raise here. The documentation issue
especially bothers me.

The core problem that drove me towards this solution is the abundance
of different IntX and WordX types. Each of them requires a separate
Write for big-endian, little-endian, host-endian, lower-case-hex, and
upper-case-hex encodings; i.e., currently, there are

int8BE  :: Write Int8
int16BE :: Write Int16
int32BE :: Write Int32
...
hexLowerInt8 :: Write Int8
...

and so on. As you can see
(http://hackage.haskell.org/packages/archive/blaze-builder/0.3.0.1/doc/html/B...)
this approach clutters the public API quite a bit. Hence, I'm thinking
of using a separate type-class for each encoding; i.e.,

  class BigEndian a where
    bigEndian :: Write a

This collapses the big-endian encodings of all 10 bounded-size (signed
and unsigned) integer types under a single name with well-defined
semantics. Moreover, it's standard Haskell 98. For the hex-encodings,
I'm thinking about providing type-classes

  class HexLower a where
    hexLower :: Write a

  class HexLowerNoLead a where
    hexLowerNoLead :: Write a

  ...

for ASCII encoding and each of the standard Unicode encodings, each in
a separate module. The user can then select the right ones using
qualified imports. In most cases, he won't even need qualification, as
mixing different character encodings is rarely needed.
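
As a rough illustration of the class-per-encoding idea, here is a
minimal sketch for BigEndian. The Write type is again a toy stand-in
(a size bound plus a function to a byte list) rather than blaze-builder's
actual type, runWrite is a hypothetical helper, and only three of the ten
integer types are covered.

```haskell
import Data.Bits (shiftR)
import Data.Int (Int32)
import Data.Word (Word8, Word16, Word32)

-- Toy stand-in for a Write (assumption): an upper bound on the number of
-- bytes written, plus a function producing those bytes.
data Write a = Write Int (a -> [Word8])

-- Hypothetical helper: run a toy Write on a value.
runWrite :: Write a -> a -> [Word8]
runWrite (Write _ f) = f

-- One class per encoding; one instance per integer type.
class BigEndian a where
  bigEndian :: Write a

instance BigEndian Word16 where
  bigEndian = Write 2 $ \w ->
    [fromIntegral (w `shiftR` 8), fromIntegral w]

instance BigEndian Word32 where
  bigEndian = Write 4 $ \w ->
    [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

-- Signed types reuse the unsigned encoding of the same width.
instance BigEndian Int32 where
  bigEndian = Write 4 $ \i -> runWrite bigEndian (fromIntegral i :: Word32)
```

In GHCi, `runWrite bigEndian (0x01020304 :: Word32)` selects the
Word32 instance from the argument type alone. No language extension is
needed, since every instance head is a plain type constructor.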

What do you think about such an interface? Is there another hidden
catch that I'm not seeing? BTW, note that Writes are a pure compile-time
abstraction and are meant to be completely inlined. In typical use
cases, there's no efficiency overhead stemming from these typeclasses.

best regards,
Simon