The state of binary (de)serialization


Nicolas Trangez

25 Feb 2013

12:30 p.m.

All,

In order to implement some network protocol clients recently, I needed binary serialization of commands and deserialization of responses ('Command -> ByteString' and 'ByteString -> Response' functions, preferably for both strict as well as lazy ByteStrings).

My go-to packages have always been 'binary' and 'cereal', but I was wondering about the current (and future) state/goals:

- cereal supports chunk-based 'partial' parsing (runGetPartial). It looks like support for this is introduced in recent versions of 'binary' as well (runGetIncremental)
- cereal can output a strict bytestring (runPut) or a lazy one (runPutLazy), whilst binary only outputs lazy ones (runPut)
- Next to binary and cereal, there's bytestring's Builder interface for serialization, and Simon Meier's "blaze-binary" prototype

There are some blog posts and comments out there about merging cereal and binary, is this what's the goal/going on (cfr runGetIncremental)?

In my use-case I think using Builder instead of binary/cereal's PutM monad shouldn't be a major problem. Is this advisable performance-wise?

Overall: what's the advised future-proof strategy of handling binary (de)serialization?

Thanks,

Nicolas


Johan Tibell

25 Feb

7:59 p.m.

On Mon, Feb 25, 2013 at 4:30 AM, Nicolas Trangez wrote:

...

- cereal supports chunk-based 'partial' parsing (runGetPartial). It looks like support for this is introduced in recent versions of 'binary' as well (runGetIncremental)

Yes. Binary now supports an incremental interface. We intend to make sure binary has all the same functionality as cereal. We'd like to move away from having two packages if possible and since binary has the larger installed user base we're trying to make that the go-to package.
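As a rough illustration of the incremental interface mentioned here, the sketch below feeds a decoder one strict chunk at a time using runGetIncremental, pushChunk and pushEndOfInput from Data.Binary.Get. The Header type and its wire layout (16-bit status followed by a 32-bit length, both big endian) are assumptions invented for the example and are not taken from any protocol discussed in this thread.

```haskell
import qualified Data.ByteString as BS
import Data.Binary.Get
  ( Decoder (..), Get, getWord16be, getWord32be
  , pushChunk, pushEndOfInput, runGetIncremental )
import Data.Word (Word16, Word32)

-- Hypothetical response header: status code and payload length.
data Header = Header Word16 Word32
  deriving (Eq, Show)

-- Parser for the assumed layout: 2 bytes status, 4 bytes length, big endian.
getHeader :: Get Header
getHeader = Header <$> getWord16be <*> getWord32be

-- Feed the chunks to the decoder in arrival order, then signal end of input.
decodeChunks :: [BS.ByteString] -> Either String Header
decodeChunks chunks =
  finish (foldl pushChunk (runGetIncremental getHeader) chunks)
  where
    finish decoder = case pushEndOfInput decoder of
      Done _ _ header -> Right header
      Fail _ _ msg    -> Left msg
      Partial _       -> Left "decoder still expects input"
```

In a network client the chunks would typically come from successive socket reads, so the decoder state can be kept between reads instead of buffering the whole response first.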

...

- cereal can output a strict bytestring (runPut) or a lazy one (runPutLazy), whilst binary only outputs lazy ones (runPut)

The lazy one is more general and you can use toStrict (from bytestring) to get a strict ByteString from a lazy one, without loss of performance.
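A minimal sketch of that suggestion, assuming a made-up Command type with a single Ping constructor: binary's runPut yields a lazy ByteString, and toStrict from Data.ByteString.Lazy turns it into a strict one when a caller needs that.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Put (Put, putWord32be, putWord8, runPut)
import Data.Word (Word32)

-- Hypothetical command, invented for illustration only.
data Command = Ping Word32
  deriving (Eq, Show)

-- Assumed wire format: one tag byte, then a 32-bit big-endian payload.
putCommand :: Command -> Put
putCommand (Ping n) = putWord8 0x01 >> putWord32be n

-- binary produces a lazy ByteString...
encodeLazy :: Command -> BL.ByteString
encodeLazy = runPut . putCommand

-- ...which can be flattened into a strict one on demand.
encodeStrict :: Command -> BS.ByteString
encodeStrict = BL.toStrict . encodeLazy
```

Note that toStrict has to copy the chunks into one contiguous buffer, which is the O(n) concatenation cost Johan mentions later in the thread.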

...

- Next to binary and cereal, there's bytestring's Builder interface for serialization, and Simon Meier's "blaze-binary" prototype

Simon's builder (originally developed in blaze-binary) has been merged into the bytestring package. In the future binary will just re-export that builder.

...

There are some blog posts and comments out there about merging cereal and binary, is this what's the goal/going on (cfr runGetIncremental)?

It's most definitely the goal and it's basically done. The only thing I don't think we'll adopt from cereal is the instances from container types.

...

In my use-case I think using Builder instead of binary/cereal's PutM monad shouldn't be a major problem. Is this advisable performance-wise?

You can go ahead and use the builder directly if you like.

...

Overall: what's the advised future-proof strategy of handling binary (de)serialization?

Use binary or the builder from bytestring whenever you can. Since the builder in bytestring was recently added you might have to fall back to blaze-builder if you believe your users can't rely on the latest version of bytestring.

-- Johan
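For readers who want to see what using the builder directly could look like, here is a small sketch built on Data.ByteString.Builder. The Command type, its Fetch and Store constructors, the tag bytes and the length-prefixed field helper are all hypothetical; only the Builder functions themselves come from bytestring. The sketch assumes a GHC recent enough that `<>` is exported from the Prelude.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL

-- Hypothetical key/value protocol commands.
data Command
  = Fetch BS.ByteString
  | Store BS.ByteString BS.ByteString

-- A field is a 32-bit little-endian length followed by the raw bytes.
field :: BS.ByteString -> B.Builder
field bs = B.word32LE (fromIntegral (BS.length bs)) <> B.byteString bs

-- Each command is a tag byte followed by its fields.
buildCommand :: Command -> B.Builder
buildCommand (Fetch key)       = B.word8 0x01 <> field key
buildCommand (Store key value) = B.word8 0x02 <> field key <> field value

-- Run the builder only once, at the edge where bytes are actually needed.
encodeCommand :: Command -> BL.ByteString
encodeCommand = B.toLazyByteString . buildCommand
```

Keeping everything as a Builder until the last step lets several commands be combined with `<>` and written out in one pass.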

Ozgun Ataman

8:15 p.m.

On Monday, February 25, 2013 at 2:59 PM, Johan Tibell wrote:

...

On Mon, Feb 25, 2013 at 4:30 AM, Nicolas Trangez wrote:

...
- cereal supports chunk-based 'partial' parsing (runGetPartial). It looks like support for this is introduced in recent versions of 'binary' as well (runGetIncremental)

Yes. Binary now supports an incremental interface. We intend to make sure binary has all the same functionality as cereal. We'd like to move away from having two packages if possible and since binary has the larger installed user base we're trying to make that the go-to package.

As a minor side note: Just wanted to point out that safecopy (http://hackage.haskell.org/package/safecopy) provides a nice migration framework for production use cases and is based on cereal. Migration becomes an issue in production sooner or later, and I think safecopy is a nice alternative to the approach Google's protocol buffers take (for example). For eventual unification on binary, it may be a good idea to port it (and other useful libs that build on cereal) as well.


_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org (mailto:Haskell-Cafe@haskell.org) http://www.haskell.org/mailman/listinfo/haskell-cafe

Alexander Solla

26 Feb

12:51 a.m.

On Mon, Feb 25, 2013 at 11:59 AM, Johan Tibell wrote:

...

There are some blog posts and comments out there about merging cereal and binary, is this what's the goal/going on (cfr runGetIncremental)?

It's most definitely the goal and it's basically done. The only thing I don't think we'll adopt from cereal is the instances from container types.

Why not? Those instances are useful. Without instances defined in binary/cereal, pretty much every Happstack (or, better said, every ixset/acidstate/safecopy stack) user will have to have orphan instances. Also, cereal has a generic instance. Will the new binary?

Johan Tibell

1:06 a.m.

On Mon, Feb 25, 2013 at 4:51 PM, Alexander Solla wrote:

...

On Mon, Feb 25, 2013 at 11:59 AM, Johan Tibell wrote:

...
There are some blog posts and comments out there about merging cereal and binary, is this what's the goal/going on (cfr runGetIncremental)?

It's most definitely the goal and it's basically done. The only thing I don't think we'll adopt from cereal is the instances from container types.

Why not? Those instances are useful. Without instances defined in binary/cereal, pretty much every Happstack (or, better said, every ixset/acidstate/safecopy stack) user will have to have orphan instances.

I will have to give a bit more context to answer this one. After the binary package was created we've realized that it should really have been two packages:

* One package for serialization and deserialization of basic types, that have a well-defined serialization format even outside the package e.g. little and big endian integers, IEEE floats, etc. This package would correspond to Data.Binary.Get, Data.Binary.Builder, and Data.Binary.Put.

* One package that defines a particular binary format useful for serializing arbitrary Haskell values. This package would correspond to Data.Binary.

For the latter we need to decide what guarantees we make. For example, is the format stable between releases? Is the format public (such that other libraries can parse the output of binary)? Right now these two questions are left unanswered in both binary and cereal, making those packages less useful.

Before we answer those questions we don't want to 1) add more dependencies to binary and 2) define serialization formats that we might break in the next release.

So perhaps once we've settled these issues we'll include instances for containers.

Also, cereal has a generic instance. Will the new binary?

...

That sounds reasonable. If someone sends a pull request Lennart or I will review and merge it.

-- Johan
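The distinction between the two layers described in this message can be sketched as follows. The first half fixes the wire format explicitly with Data.Binary.Get and Data.Binary.Put; the second half goes through the Binary class, where the byte layout is whatever the library's instances produce. The Point type and all function names are assumptions made for the example.

```haskell
import Data.Binary (decode, encode)
import Data.Binary.Get (Get, getWord16le, runGet)
import Data.Binary.Put (Put, putWord16le, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word16)

-- Hypothetical value to serialize.
type Point = (Word16, Word16)

-- Layer 1: the format is spelled out by the caller (two 16-bit
-- little-endian integers), so other programs can rely on it.
putPoint :: Point -> Put
putPoint (x, y) = putWord16le x >> putWord16le y

getPoint :: Get Point
getPoint = (,) <$> getWord16le <*> getWord16le

encodePoint :: Point -> BL.ByteString
encodePoint = runPut . putPoint

decodePoint :: BL.ByteString -> Point
decodePoint = runGet getPoint

-- Layer 2: the Binary class picks the format. It round-trips within one
-- version of the library, but the byte layout is not something this
-- sketch makes any promise about.
viaClass :: [Point] -> [Point]
viaClass = decode . encode
```

A network protocol client would normally live entirely in the first layer, since the peer dictates the byte layout.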

Alexander V Vershilov

5:33 a.m.

...

That sounds reasonable. If someone sends a pull request Lennart or I will review and merge it.

Doesn't binary already have it? http://hackage.haskell.org/packages/archive/binary/0.6.4.0/doc/html/Data-Bin...

-- Alexander
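To make the question about generic instances concrete, here is a sketch of what Generic-based deriving looks like, assuming a version of binary whose Binary class provides default put and get implementations via GHC.Generics. The Response type is invented for the example, and the sketch does not claim anything about which binary release first shipped this support.

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Binary (Binary, decode, encode)
import Data.Word (Word32)
import GHC.Generics (Generic)

-- Hypothetical response type.
data Response
  = Ok Word32
  | Failed String
  deriving (Eq, Show, Generic)

-- Empty instance body: put and get come from the Generic-based defaults
-- (assumed to be available in the installed binary version).
instance Binary Response

-- Round trip through the derived encoding.
roundTrip :: Response -> Response
roundTrip = decode . encode
```

The resulting byte layout is chosen by the library, so this approach suits persistence or Haskell-to-Haskell communication better than implementing an externally specified protocol.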

Lennart Kolmodin

1 Mar

12:10 p.m.

Hey guys,

I didn't see this thread at first, thanks to Johan for bringing it to my attention.

cereal is a fork of binary, and provided an incremental interface before binary did. It also has a few additional combinators like "isolate" and "label", which is the reason why safecopy uses cereal instead of binary (at least I know it uses "label").

As an experiment, I've wrapped the API of Data.ByteString.Builder and re-exported it as Data.Binary.Builder, but it turned out that performance got worse. I have yet to look into why. Once it all seems OK, binary will just wrap and re-export bytestring's builder.

Whether you use binary or builder doesn't really matter; the basic APIs are very similar. builder can offer some more options if you want to spend more time tuning for speed.

binary is also already in the HP, since it is bundled with GHC (GHC depends on binary). In other words, depending on binary should be future-proof.

On another note, binary-0.7 is out, get it while it's hot! :)

Lennart

Vincent Hanquez

27 Feb

7:17 a.m.

On Mon, Feb 25, 2013 at 11:59:42AM -0800, Johan Tibell wrote:

...

...
- cereal can output a strict bytestring (runPut) or a lazy one (runPutLazy), whilst binary only outputs lazy ones (runPut)

The lazy one is more general and you can use toStrict (from bytestring) to get a strict ByteString from a lazy one, without loss of performance.

Two major problems with lazy bytestrings are:

* you can't pass them to C bindings easily.
* doing IO with them without rewriting the chunks can sometimes (depending on how the lazy bytestring has been produced) result in a serious degradation of performance, calling syscalls on arbitrary and small chunks (e.g. socket's 'send').

Personally, I also like the (obvious) stricter behavior of strict bytestrings.

-- Vincent

Johan Tibell

5:32 p.m.

On Tue, Feb 26, 2013 at 11:17 PM, Vincent Hanquez wrote:

...

On Mon, Feb 25, 2013 at 11:59:42AM -0800, Johan Tibell wrote:

...
...
- cereal can output a strict bytestring (runPut) or a lazy one (runPutLazy), whilst binary only outputs lazy ones (runPut)

The lazy one is more general and you can use toStrict (from bytestring) to get a strict ByteString from a lazy one, without loss of performance.

Two major problems with lazy bytestrings are:

* you can't pass them to C bindings easily.
* doing IO with them without rewriting the chunks can sometimes (depending on how the lazy bytestring has been produced) result in a serious degradation of performance, calling syscalls on arbitrary and small chunks (e.g. socket's 'send').

Personally, I also like the (obvious) stricter behavior of strict bytestrings.

My point was rather that all cereal does for you is to concat the lazy chunks it already has to a strict bytestring before returning them. If you want that behavior with binary just call concat yourself. The benefit of not concatenating by default is that it costs O(n) time, which you might avoid if you can consume the lazy bytestring directly (e.g. through writev).
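The trade-off discussed in this exchange can be made visible with a few lines of code. The sketch below builds a lazy ByteString out of deliberately tiny chunks, counts them, and flattens them with concat; all names are made up for the example.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- A lazy ByteString deliberately assembled from eight one-byte chunks,
-- the kind of shape that leads to many small writes.
manySmallChunks :: BL.ByteString
manySmallChunks = BL.fromChunks [BS.pack [b] | b <- [1 .. 8]]

-- Number of strict chunks backing a lazy ByteString.
chunkCount :: BL.ByteString -> Int
chunkCount = length . BL.toChunks

-- Explicit O(n) copy into a single contiguous strict ByteString.
flatten :: BL.ByteString -> BS.ByteString
flatten = BS.concat . BL.toChunks
```

Whether to call something like flatten before sending depends on the consumer: a vectored write can take the chunks as they are, while a plain send per chunk benefits from a single buffer.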

Nicolas Trangez

28 Feb

10:49 a.m.

On Mon, 2013-02-25 at 11:59 -0800, Johan Tibell wrote:

...

On Mon, Feb 25, 2013 at 4:30 AM, Nicolas Trangez wrote:

...
- cereal supports chunk-based 'partial' parsing (runGetPartial). It looks like support for this is introduced in recent versions of 'binary' as well (runGetIncremental)

Yes. Binary now supports an incremental interface. We intend to make sure binary has all the same functionality as cereal. We'd like to move away from having two packages if possible and since binary has the larger installed user base we're trying to make that the go-to package.

This will certainly make things more obvious (and maybe ready for HP inclusion?).

...

...
- cereal can output a strict bytestring (runPut) or a lazy one (runPutLazy), whilst binary only outputs lazy ones (runPut)

The lazy one is more general and you can use toStrict (from bytestring) to get a strict ByteString from a lazy one, without loss of performance.

Sure. Turned out I was using lazy bs' anyway so switched to 'binary' for deserialization.

...

...
- Next to binary and cereal, there's bytestring's Builder interface for serialization, and Simon Meier's "blaze-binary" prototype

Simon's builder (originally developed in blaze-binary) has been merged into the bytestring package. In the future binary will just re-export that builder.

I was referring to https://github.com/meiersi/blaze-binary

...

...
Overall: what's the advised future-proof strategy of handling binary (de)serialization?

Use binary or the builder from bytestring whenever you can. Since the builder in bytestring was recently added you might have to fall back to blaze-builder if you believe your users can't rely on the latest version of bytestring.

I switched to Builder for serialization. It seems to create 'more strict' lazy bytestrings than the cereal based code (as in: cereal seems to create a new Chunk whenever appending a lazy bytestring, whilst Builder concats them into a single chunk, at least for the short strings I've been using).

The Monoidal interface feels very natural, maybe even more natural than the Monad interface of PutM in binary/cereal:

    instance Argument a => Argument [a] where
        put l = word32LE cnt <> s
          where
            (cnt, s) = foldr (\e (c, m) -> (c + 1, put e <> m)) (0, mempty) l

Thanks,

Nicolas
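The list instance quoted above is not self-contained, since the Argument class itself is not shown in the thread. The following sketch fills in one plausible definition so the snippet can be loaded and tried: the class declaration, the Word8 instance and the encodeArgument helper are assumptions, while the list instance follows the message. It assumes a GHC recent enough that `<>` is exported from the Prelude.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Assumed shape of the class; the thread only shows the list instance.
class Argument a where
  put :: a -> B.Builder

-- Hypothetical base instance so that lists of bytes can be encoded.
instance Argument Word8 where
  put = B.word8

-- List instance as in the message: a 32-bit little-endian element count
-- followed by the encoded elements, computed in a single fold.
instance Argument a => Argument [a] where
  put l = B.word32LE cnt <> s
    where
      (cnt, s) = foldr (\e (c, m) -> (c + 1, put e <> m)) (0, mempty) l

-- Convenience runner, also an assumption of this sketch.
encodeArgument :: Argument a => a -> BL.ByteString
encodeArgument = B.toLazyByteString . put
```

The fold counts and encodes in one traversal, which works because the count only needs to be known when the builder is finally run.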

Andrew Cowie

1 Mar

3:11 a.m.

On Mon, 2013-02-25 at 11:59 -0800, Johan Tibell wrote:

...

Simon's builder (originally developed in blaze-binary) has been merged into the bytestring package.

I've been meaning to ask: does this mean that ByteString's concat and append functions will now be implemented in terms of Builder internally, or will one need to use Builder exclusively until it's finally time to create a ByteString for passing to $whatever?

AfC
Sydney

Vincent Hanquez

27 Feb

6:49 a.m.

On Mon, Feb 25, 2013 at 01:30:40PM +0100, Nicolas Trangez wrote:

...


I've been looking at the same thing lately, and I've been quite surprised, to say the least, by the usual go-to packages (cereal, binary).

Performance-wise this is hard to summarize, but if you serialize something small and have an easy-to-compute size (e.g. a fixed-size structure), I would advise against using any kind of builder structure (builder, cereal, binary), and would go directly to the Storable level, if performance needs to be on par with other languages.

My initial interpretation is that the builder's initial cost is quite high, and only gets amortized if the number of operations is quite high (and with fewer bytestrings). So if you have many structures encoded in one encoding operation it's probably ok-ish.

I made the following benchmark when I was doing my experiments; it shows basic serialization of bytestring-y data structures:

* "bclass" is a simple function that uses bytestring concat or append
* "bclass+io" is a simple function that uses a mutable bytestring + poke to create the bytestring
* "cereal" is cereal's encode function
* "binary" is binary's encode function
* "builder" is bytestring's builder.

* simple bytestring of constant size: <sz>
* n bytestrings of same size: n*<sz>
* n bytestrings of different size: <sz>+<sz2>+..
* n bytestrings plus a w32 prefixed size: len+n*<sz>

Obviously, caveat emptor:

http://tab.snarc.org/others/benchmark-bytestring-serialization.html

Let me know if anyone wants the source file.

-- Vincent
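The benchmark source is not included in the thread, so the following is only a guess at what the "mutable bytestring + poke" style might look like for a small fixed-size structure. It uses unsafeCreate from Data.ByteString.Internal and pokeByteOff from Foreign.Storable; the record layout (a tag byte followed by a 32-bit big-endian value) and the function name are assumptions.

```haskell
import Data.Bits (shiftR)
import qualified Data.ByteString as BS
import Data.ByteString.Internal (unsafeCreate)
import Data.Word (Word32, Word8)
import Foreign.Storable (pokeByteOff)

-- Hypothetical fixed-size record: 1 tag byte + 4 payload bytes.
-- The size is known up front, so the buffer is allocated once and
-- filled in place, with no builder or intermediate chunks involved.
encodeFixed :: Word8 -> Word32 -> BS.ByteString
encodeFixed tag n =
  unsafeCreate 5 $ \p -> do
    pokeByteOff p 0 tag
    -- Write the payload byte by byte so the result is big endian
    -- regardless of the host's native byte order.
    pokeByteOff p 1 (byte 24)
    pokeByteOff p 2 (byte 16)
    pokeByteOff p 3 (byte 8)
    pokeByteOff p 4 (byte 0)
  where
    byte :: Int -> Word8
    byte s = fromIntegral (n `shiftR` s)
```

This trades safety for speed: every offset must stay inside the five bytes requested from unsafeCreate, and nothing checks that at compile time.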

Nicolas Trangez

28 Feb

10:54 a.m.

On Wed, 2013-02-27 at 07:49 +0100, Vincent Hanquez wrote:

...


These are some really interesting (and very consistent) results, thanks! I guess I should do some benchmarking myself and maybe change some things around (heck, now I'm using Builder to serialize constants :-P).

It might be worth sharing these benchmarks with the 'binary' and 'bytestring'/'blaze-builder' maintainers?

Nicolas


12 comments

8 participants

participants (8)

Alexander Solla
Alexander V Vershilov
Andrew Cowie
Johan Tibell
Lennart Kolmodin
Nicolas Trangez
Ozgun Ataman
Vincent Hanquez