Adding a builder to the "bytestring" package?

Most people who work with binary data have had to construct bytestrings at some point. The most common solution is to use a "Builder", a monoid representing how to construct a bytestring. There are currently three packages (that I know of) which include builder implementations: binary, cereal, and blaze-builder.
However, all of these libraries have additional dependencies beyond just "bytestring". All three depend on "array" and "containers", and blaze-builder additionally depends on "text" (and thus "deepseq"). Since the current implementation of GHC uses static linking, every additional dependency adds to the final size of a binary.
Obviously the "Builder" concept is very useful, as it has been implemented at least three times. How about adding it to the "bytestring" package itself? We could have a module Data.ByteString.Builder, with functions (at minimum):
    toByteString :: Builder -> Data.ByteString.ByteString
    toLazyByteString :: Builder -> Data.ByteString.Lazy.ByteString
    fromByteString :: Data.ByteString.ByteString -> Builder
    fromLazyByteString :: Data.ByteString.Lazy.ByteString -> Builder
    empty :: Builder
    append :: Builder -> Builder -> Builder
Plus whatever implementation details might be useful to expose.
Existing libraries could then add their extra features (word -> builder for binary and cereal, UTF/HTTP for blaze-builder) on top of the existing types.
Is this something the community is interested in? Is there any work currently aimed at this goal?
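For readers who want to see the shape of the proposed interface, here is a minimal sketch that depends only on "bytestring". It is an illustration, not the implementation used by binary, cereal, or blaze-builder: the representation (a difference list of strict chunks) is an assumption chosen for brevity, and it assumes GHC 8.4 or later so that Semigroup is in the Prelude. The six function names are the ones listed in the proposal.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L

-- Hypothetical representation: a function that prepends this builder's
-- chunks to whatever chunks come after it (a difference list).
newtype Builder = Builder ([S.ByteString] -> [S.ByteString])

-- The builder that produces no bytes.
empty :: Builder
empty = Builder id

-- Sequencing is function composition, so append is O(1).
append :: Builder -> Builder -> Builder
append (Builder f) (Builder g) = Builder (f . g)

instance Semigroup Builder where
  (<>) = append

instance Monoid Builder where
  mempty = empty

-- Wrap a strict bytestring as a single chunk (skipping empty ones).
fromByteString :: S.ByteString -> Builder
fromByteString bs
  | S.null bs = empty
  | otherwise = Builder (bs :)

-- Reuse the chunks of a lazy bytestring without copying them.
fromLazyByteString :: L.ByteString -> Builder
fromLazyByteString lbs = Builder (L.toChunks lbs ++)

-- Run the builder, producing one lazy chunk per input chunk.
toLazyByteString :: Builder -> L.ByteString
toLazyByteString (Builder f) = L.fromChunks (f [])

-- Run the builder and copy everything into one strict bytestring.
toByteString :: Builder -> S.ByteString
toByteString = S.concat . L.toChunks . toLazyByteString
```

A sketch like this never merges small chunks into larger buffers, which is the kind of work an optimised builder would be expected to do, so it says nothing about performance.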

On Wed, Jan 19, 2011 at 6:51 PM, John Millikin
Is this something the community is interested in? Is there any work currently aimed at this goal?
I think both Duncan and I agree that we should move Data.Binary.Builder (which doesn't have any extra dependencies) to bytestring. I've already added Data.Text.Lazy.Builder to text.
Johan

On Wed, Jan 19, 2011 at 9:32 PM, Johan Tibell
I think both Duncan and I agree that we should move Data.Binary.Builder (which doesn't have any extra dependencies) to bytestring. I've already added Data.Text.Lazy.Builder to text.
Johan
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Isn't Simon Meier working on migrating his code from blaze-builder into binary?
I agree with John that it would make more sense to go in bytestring. Assuming that happens, would the builder from text end up being based on it?
Michael

On Wed, Jan 19, 2011 at 8:37 PM, Michael Snoyman
Isn't Simon Meier working on migrating his code from blaze-builder into binary?
So I heard (although not directly from Simon). I think it would be nice to port the blaze-builder implementation to binary, but to keep binary's current interface (for now).
I agree with John that it would make more sense to go in bytestring. Assuming that happens, would the builder from text end up being based on it?
ByteString and Text don't share an underlying data structure at the moment (one uses pinned ForeignPtrs and one unpinned ByteArray#s) so they can't use the same builder efficiently. Some day perhaps.
Johan

On Wed, Jan 19, 2011 at 12:04, Johan Tibell
On Wed, Jan 19, 2011 at 8:37 PM, Michael Snoyman wrote:
Isn't Simon Meier working on migrating his code from blaze-builder into binary?
So I heard (although not directly from Simon). I think it would be nice to port the blaze-builder implementation to binary, but to keep binary's current interface (for now).
From the perspective of a library user, if there's to be code shuffling, I'd much rather have it be one-time (blaze-builder -> bytestring) than having multiple merges/ports going on. Especially since building bytestrings is a much more generic operation than binary serialisation.
Regarding the interface, I think that as long as the same *basic* operations are available, it's fine to have extra operations as well. Several of blaze-builder's special-case functions (toByteStringIO, fromWriteList) allow more efficient operation than the generic interface.
I agree with John that it would make more sense to go in bytestring. Assuming that happens, would the builder from text end up being based on it?
ByteString and Text don't share an underlying data structure at the moment (one uses pinned ForeignPtrs and one unpinned ByteArray#s) so they can't use the same builder efficiently. Some day perhaps.
Can any of the blaze-builder optimizations be translated to the Text builder? When I benchmark it against binary and cereal, blaze-builder is approximately 2-3 times faster for most use cases.

On Wed, Jan 19, 2011 at 9:51 PM, John Millikin
Can any of the blaze-builder optimizations be translated to the Text builder? When I benchmark it against binary and cereal, blaze-builder is approximately 2-3 times faster for most use cases.
Yes, but I haven't had time to do so yet. I even wrote a blog post about not having time to do it. ;) http://blog.johantibell.com/2011/01/haskell-library-improvements-id-like-to....

On Wed, Jan 19, 2011 at 10:04 PM, Johan Tibell
On Wed, Jan 19, 2011 at 8:37 PM, Michael Snoyman wrote:
Isn't Simon Meier working on migrating his code from blaze-builder into binary?
So I heard (although not directly from Simon). I think it would be nice to port the blaze-builder implementation to binary, but to keep binary's current interface (for now).
What's the advantage to moving it into binary as opposed to bytestring?
I agree with John that it would make more sense to go in bytestring. Assuming that happens, would the builder from text end up being based on it?
ByteString and Text don't share an underlying data structure at the moment (one uses pinned ForeignPtrs and one unpinned ByteArray#s) so they can't use the same builder efficiently. Some day perhaps.
I hadn't realized; I had just assumed it was using ByteString underneath. I'll pay more attention next time, thanks for informing me :).
Michael

On Wed, Jan 19, 2011 at 14:06, Johan Tibell
On Wed, Jan 19, 2011 at 10:30 PM, Michael Snoyman wrote:
What's the advantage to moving it into binary as opposed to bytestring?
To test that the implementation can indeed be ported to that interface. We could of course skip that step if we want to.
blaze-builder already implements the "binary" builder interface, minus the putWord* functions. I think those would be trivial to reimplement on top of Write.
Since it sounds like everyone agrees with / has already thought of moving Builder into bytestring, I'll start poking at a patch. Who is the current patch-reviewer for binary and bytestring?
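As a rough illustration of how putWord*-style helpers can sit on top of a small builder core, here is a sketch. It does not use blaze-builder's Write type; as an assumption, it builds on the Data.ByteString.Builder module that later releases of "bytestring" provide, using only byteString and toLazyByteString from it. The helper names mirror the ones in binary's builder, but the definitions below are illustrative only.

```haskell
import Data.Bits (shiftR)
import qualified Data.ByteString as S
import Data.ByteString.Builder (Builder, byteString, toLazyByteString)
import qualified Data.ByteString.Lazy as L
import Data.Word (Word16, Word32)

-- Big-endian 16-bit word: most significant byte first.
-- fromIntegral truncates to the low 8 bits when converting to Word8.
putWord16be :: Word16 -> Builder
putWord16be w =
  byteString (S.pack [fromIntegral (w `shiftR` 8), fromIntegral w])

-- Little-endian 32-bit word: least significant byte first.
putWord32le :: Word32 -> Builder
putWord32le w =
  byteString (S.pack [fromIntegral (w `shiftR` s) | s <- [0, 8, 16, 24]])

-- Helpers compose with the Monoid operations like any other builder.
header :: Builder
header = putWord16be 0x0102 <> putWord32le 0x0A0B0C0D

-- Run the builder and expose the raw bytes for inspection.
headerBytes :: [Integer]
headerBytes = map fromIntegral (L.unpack (toLazyByteString header))
```

Packing a tiny strict bytestring per word is simple but allocates more than writing straight into an output buffer, which is the kind of cost a Write-style primitive is meant to avoid.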

On Thu, Jan 20, 2011 at 12:16 AM, John Millikin
blaze-builder already implements the "binary" builder interface, minus the putWord* functions. I think those would be trivial to reimplement on top of Write.
Since it sounds like everyone agrees with / has already thought of moving Builder into bytestring, I'll start poking at a patch. Who is the current patch-reviewer for binary and bytestring?
I'd suggest addressing the patch to Don Stewart, Duncan Coutts, and Lennart Kolmodin.
Johan

Patch done and sent to the bytestring maintainers. For the interested,
here's the benchmark chart for binary, cereal, and
blaze-builder/bytestring:
http://i.imgur.com/xw3TL.png

On 24 January 2011 07:29, John Millikin
Patch done and sent to the bytestring maintainers. For the interested, here's the benchmark chart for binary, cereal, and blaze-builder/bytestring:
Can has units?
Conrad.

No units -- I generated the chart with Progression, which by default
normalises the data so the first library (here, "binary") in each
benchmark is 1.0. It can also generate absolute-time charts:
Runtime in seconds, grouped by benchmark: http://i.imgur.com/f0EOa.png
Runtime in seconds, grouped by library: http://i.imgur.com/PXW97.png
Benchmark source files attached, if you'd like to poke at them.
participants (4)
- Conrad Parker
- Johan Tibell
- John Millikin
- Michael Snoyman