ANNOUNCE: binary: high performance, pure binary serialisation

Binary: high performance, pure binary serialisation for Haskell
----------------------------------------------------------------------

The Binary Strike Team is pleased to announce the release of a new, pure, efficient binary serialisation library for Haskell, now available from Hackage:

    tarball:  http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2
    darcs:    darcs get http://darcs.haskell.org/binary
    haddocks: http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html

The 'binary' package provides efficient serialisation of Haskell values to and from lazy ByteStrings. ByteStrings constructed this way may then be written to disk, written to the network, or further processed (e.g. stored in memory directly, or compressed in memory with zlib or bzlib).

Encoding and decoding are achieved by the functions:

    encode :: Binary a => a -> ByteString
    decode :: Binary a => ByteString -> a

which mirror the read/show functions. Convenience functions for serialising to disk are also provided:

    encodeFile :: Binary a => FilePath -> a -> IO ()
    decodeFile :: Binary a => FilePath -> IO a

To serialise your Haskell data, all you need to do is write an instance of Binary for your type. For example, suppose in an interpreter we had the data type:

    import Data.Binary
    import Control.Monad

    data Exp = IntE Int
             | OpE  String Exp Exp

We can serialise this to bytestring form with the following instance:

    instance Binary Exp where
        put (IntE i)      = putWord8 0 >> put i
        put (OpE s e1 e2) = putWord8 1 >> put s >> put e1 >> put e2

        get = do tag <- getWord8
                 case tag of
                     0 -> liftM  IntE get
                     1 -> liftM3 OpE  get get get

The binary library has been heavily tuned for performance, particularly for writing speed. Throughput of up to 160MB/s has been achieved in practice, and in general speed is on par with or better than NewBinary, with the advantage of a pure interface. Efforts are underway to improve performance still further.

Plans are also taking shape for a parser combinator library on top of binary, for bit parsing and foreign structure parsing (e.g. network protocols).

Several projects are already using binary for serialisation:

    lambdabot  : state file serialisation
    hmp3       : mp3 file database
    hpaste.org : pastes are stored in memory as compressed bytestrings,
                 and serialised to disk on MACID checkpoints

Binary was developed by a team of 8 during the Haskell Hackathon, Hac 07, and received 200+ commits over that period. You can see the commit graph here:

    http://www.cse.unsw.edu.au/~dons/images/commits/community/binary-commits.png

The use of QuickCheck was critical to the rapid, safe development of the library. The API was developed in conjunction with the QuickCheck properties that checked it for sanity, so we were able to improve performance while maintaining stability. We feel that QuickCheck should be an integral part of the development strategy for all new Haskell libraries. Don't write code without it!

Binary is portable, using the foreign function interface and cpp, and is tested with Hugs and GHC.

Happy hacking!

The Binary Strike Team,

    Lennart Kolmodin
    Duncan Coutts
    Don Stewart
    Spencer Janssen
    David Himmelstrup
    Bjorn Bringert
    Ross Paterson
    Einar Karttunen
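A quick round trip with this API might look like the following (a sketch; it assumes Exp additionally derives Show and Eq, which the instance above does not require):

    -- In-memory and on-disk round trips of an Exp value.
    main :: IO ()
    main = do
        let e = OpE "+" (IntE 1) (IntE 2)
        print (decode (encode e) :: Exp)   -- back from a lazy ByteString
        encodeFile "exp.bin" e
        e' <- decodeFile "exp.bin"
        print (e' == (e :: Exp))           -- True: the file round-trips too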

dons:
Binary: high performance, pure binary serialisation for Haskell ----------------------------------------------------------------------
The Binary Strike Team is pleased to announce the release of a new, pure, efficient binary serialisation library for Haskell, now available from Hackage:
Ok, I forgot one point. It is possible to automatically derive instances of Binary for your custom types, if they inhabit Data and Typeable, using an SYB trick. Load tools/derive/BinaryDerive.hs into ghci, bring your type into scope, and then run:

    *Main> mapM_ putStrLn . lines $ derive (undefined :: Drinks)

to have the source for the Binary instance for the type Drinks derived for you:

    *Main> mapM_ putStrLn . lines $ derive (undefined :: Drinks)
    instance Binary Main.Drinks where
      put (Beer a)    = putWord8 0 >> put a
      put Coffee      = putWord8 1
      put Tea         = putWord8 2
      put EnergyDrink = putWord8 3
      put Water       = putWord8 4
      put Wine        = putWord8 5
      put Whisky      = putWord8 6
      get = do
        tag_ <- getWord8
        case tag_ of
          0 -> get >>= \a -> return (Beer a)
          1 -> return Coffee
          2 -> return Tea
          3 -> return EnergyDrink
          4 -> return Water
          5 -> return Wine
          6 -> return Whisky

The use of SYB techniques to provide a 'deriving' script along with a new typeclass seems to be quite handy.

-- Don

Hello,

Donald Bruce Stewart wrote:
Ok, I forgot one point. It is possible to automatically derive instances of Binary for your custom types, if they inhabit Data and Typeable, using an SYB trick. Load tools/derive/BinaryDerive.hs into ghci, and bring your type into scope, then run:
*Main> mapM_ putStrLn . lines $ derive (undefined :: Drinks)
It would seem that one needs to rerun the script every time the type is changed. That would be unfortunate. Perhaps I could have a go at writing a Template Haskell function to derive those instances?

I also fear that the existing script does not handle types with more than 256 constructors correctly. While uncommon, those are not unrealistic.

Using DrIFT would probably automate the deriving just as well, but in my particular situation TH support is easier to maintain than DrIFT support.

Greetings,
Arie

On Jan 26, 2007, at 2:40 PM, Arie Peterson wrote:
Using DrIFT would probably automate the deriving just as well, but in my particular situation TH support is easier to maintain than DrIFT support.
May I ask why TH is easier to maintain than DrIFT? I'm not familiar with DrIFT. Why would I prefer one over the other?

Thanks, Joel

--
http://wagerlabs.com/

Joel Reymont wrote:
May I ask why TH is easier to maintain than DrIFT?
I'm not familiar with DrIFT.
The reason is personal, and very silly. I only use ghc, so TH is available automatically. Like you, I have never used DrIFT, so I would have to get to know it, and install it everywhere I want to compile my program. From a very quick look at the DrIFT homepage, installation might be nontrivial on a windows machine without some cygwin-like environment. At any rate, *for me* it's more work than using TH, because I'm familiar with the latter and already depend on its presence.
Why would I prefer one over the other?
I wouldn't know. Please do not let my prejudice influence your preference :-).

Greetings,
Arie

Quoth Arie Peterson, nevermore,
I also fear that the existing script does not handle types with more than 256 constructors correctly. While uncommon, those are not unrealistic.
"256 constructors ought to be enough for anybody"? ;-)
Seriously though, the thought of a type definition that heavyweight
quite terrifies me. I would be interested to see if such a thing could
be warranted and not more sensibly broken down into smaller (sets of)
units.
I like to think of types as being a bit like functions; and there is no
way I would ever think about a function with 256+ parameters. For a
start, my screen isn't wide enough for that kind of thing...
But, well done to the people responsible for the binary stuff. It looks
fab.
D.
--
Dougal Stanton

On Fri, Jan 26, 2007 at 03:12:29PM +0000, Dougal Stanton wrote:
Quoth Arie Peterson, nevermore,
I also fear that the existing script does not handle types with more than 256 constructors correctly. While uncommon, those are not unrealistic.
"256 constructors ought to be enough for anybody"? ;-)
Seriously though, the thought of a type definition that heavyweight quite terrifies me.
Think about simple enumerations, e.g. for keywords in a programming language:

    data Keyword = IF | THEN | ELSE | BEGIN | END ...

http://www.cs.vu.nl/grammars/cobol/:

    Number of keywords: 420

Perhaps such examples could be treated differently, but I think it's better to have a more general solution and not have to assume unnecessary restrictions on users' datatypes.
I would be interested to see if such a thing could be warranted and not more sensibly broken down into smaller (sets of) units.
I think in the above example the most sensible thing is to have all the keywords in the same datatype.

Best regards
Tomasz

On Fri, Jan 26, 2007 at 03:40:42PM +0100, Arie Peterson wrote:
Using DrIFT would probably automate the deriving just as well, but in my particular situation TH support is easier to maintain than DrIFT support.
DrIFT as of 2.2.1 now supports Binary for this package. Using it is as simple as this:
    import Data.Binary

    data Foo = Foo Int Char | Bar Foo  {-!derive: Binary -}
and then compiling with the following extra options to ghc:

    ghc -pgmF drift-ghc -F ...

Now everything will be taken care of automatically.

John

--
John Meacham - ⑆repetae.net⑆john⑈

On Fri, 2007-01-26 at 15:40 +0100, Arie Peterson wrote:
Hello,
Donald Bruce Stewart wrote:
Ok, I forgot one point. It is possible to automatically derive instances of Binary for your custom types, if they inhabit Data and Typeable, using an SYB trick. Load tools/derive/BinaryDerive.hs into ghci, and bring your type into scope, then run:
*Main> mapM_ putStrLn . lines $ derive (undefined :: Drinks)
It would seem that one needs to rerun the script every time the type is changed. That would be unfortunate. Perhaps I could have a go at writing a template haskell function to derive those instances?
I also fear that the existing script does not handle types with more than 256 constructors correctly. While uncommon, those are not unrealistic.
Feel free to send in a patch. All it needs to do is check if there are more than 2^8 constructors and, if so, encode the tag in a Word16 rather than a Word8.

Duncan
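The check might look something like this (putTag/getTag are illustrative helpers for the idea above, not library functions):

    import Data.Binary.Get (Get, getWord8, getWord16be)
    import Data.Binary.Put (Put, putWord8, putWord16be)

    -- One tag byte suffices for up to 2^8 constructors;
    -- beyond that, fall back to a two-byte big-endian tag.
    putTag :: Int -> Int -> Put
    putTag nCons tag
        | nCons <= 256 = putWord8    (fromIntegral tag)
        | otherwise    = putWord16be (fromIntegral tag)

    getTag :: Int -> Get Int
    getTag nCons
        | nCons <= 256 = fmap fromIntegral getWord8
        | otherwise    = fmap fromIntegral getWord16be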

Hi
I also fear that the existing script does not handle types with more than 256 constructors correctly. While uncommon, those are not unrealistic.
Feel free to send in a patch. All it needs to do is check if there are more than 2^8 constructors and if so encode the tag in a Word16 rather than Word8.
I've already fixed this and sent some new code to you. As long as you have less than 4 billion constructors you should be fine now.

Thanks

Neil

Yay! I knew if I waited long enough someone would write this.

Is the binary format portable? I need the produced files to work on both 32 and 64 bit architectures and with big and little endian machines. And of course, between different versions of a compiler or different compilers.

John

--
John Meacham - ⑆repetae.net⑆john⑈

john:
Yay! I knew if I waited long enough someone would write this.
Is the binary format portable? I need the produced files to work on both 32 and 64 bit architectures and with big and little endian machines. And of course, between different versions of a compiler or different compilers.
We believe so, and it's a bug if this is not the case. The src documents the encoding format used for each type (we were unable to attach haddocks to instances.. grr).

All data is encoded in network order, and extended to 64 bits for word-sized values (like Int). It should be possible to encode a structure with ghc on x86, and decode it on a sparc64 running hugs.

-- Don
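Concretely, that means a fixed-size Int goes out as eight big-endian bytes - roughly this (a sketch of the wire format, not the library's internal code):

    import Data.Binary.Put (runPut, putWord64be)
    import qualified Data.ByteString.Lazy as L

    -- An Int encoded as a 64-bit network-order word, as described above.
    encodeInt :: Int -> L.ByteString
    encodeInt = runPut . putWord64be . fromIntegral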

On Fri, Jan 26, 2007 at 02:16:22PM +1100, Donald Bruce Stewart wrote:
We believe so, and its a bug if this is not the case.
The src documents the encoding format used for each type (we were unable to attach haddocks to instances.. grr.)
All data is encoded in Network order, and extended to 64 bits for word sized values (like Int). It should be possible to encode a structure with ghc on x86, and decode it on a sparc64 running hugs.
Did you consider using an encoding which uses a variable number of bytes? If yes, I would be interested to know your reason for not choosing such an encoding. Efficiency?

Best regards
Tomasz

Tomasz Zielonka wrote:
Did you consider using an encoding which uses a variable number of bytes? If yes, I would be interested to know your reason for not choosing such an encoding. Efficiency?
My Binary implementation (from 1998) used a type-specific number of bits to encode the constructor - exactly as many as needed. (If you were writing custom instances, you could even use a variable number of bits for the constructor, e.g. using Huffman encoding to make the more common constructors have the shortest representation.)

The latter certainly imposes an extra time overhead on decoding, because you cannot just take a fixed-size chunk of bits and have the value. But I would have thought that in the regular case, using a type-specific (but not constructor-specific) size for representing the constructor would be very easy and have no time overhead at all.

Regards,
    Malcolm
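In other words, the type-specific tag width is just the base-2 logarithm of the constructor count, rounded up (a minimal illustration):

    -- Bits needed for a tag distinguishing n constructors:
    -- tagBits 2 == 1, tagBits 3 == 2, tagBits 256 == 8.
    tagBits :: Int -> Int
    tagBits n = ceiling (logBase 2 (fromIntegral n) :: Double)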

On 26/01/07, Malcolm Wallace wrote:

My Binary implementation (from 1998) used a type-specific number of bits to encode the constructor - exactly as many as needed.

ASN.1 has an existing encoding system - both the BER (Basic Encoding Rules) and the PER (Packed Encoding Rules). If you are looking to target a well supported standard - this would be the one.

Neil

On Fri, 26 Jan 2007, Neil Davies wrote:
ASN.1 has an existing encoding system - both the BER (Basic Encoding Rules) and the PER (Packed Encoding Rules). If you are looking to target a well supported standard - this would be the one.
I'd say that ASN.1 encoding rules are badly, but widely supported. A
surprisingly large number of security problems have been caused by ASN.1
code, and similar bugs have turned up in independent implementations so it
isn't just one widespread shoddy implementation. OTOH ASN.1 is used by
TLS, LDAP, Kerberos, SNMP, S/MIME, H.323, etc. etc.
Tony.
--
f.a.n.finch

On Fri, Jan 26, 2007 at 04:36:50PM +0100, Tomasz Zielonka wrote:
Did you consider using an encoding which uses a variable number of bytes? If yes, I would be interested to know your reason for not choosing such an encoding. Efficiency?
I am testing/benchmarking one I wrote for 'Integer' right now; so far, I think it may be better in time _and_ space! Cache effects, no doubt. A nice thing about it is that for the common case, short ASCII strings, the serialized form takes up exactly as much space as it would in C. Very nice. :)

John

--
John Meacham - ⑆repetae.net⑆john⑈
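A common shape for such an encoding is the base-128 "varint": seven payload bits per byte, with the top bit flagging that more bytes follow. A sketch of the general technique for non-negative Integers (not John's actual code):

    import Data.Bits ((.&.), (.|.), shiftR)
    import Data.Word (Word8)

    -- Least-significant group first; a set high bit means
    -- "another byte follows". Values under 128 take one byte.
    varint :: Integer -> [Word8]
    varint n
        | n < 0x80  = [fromIntegral n]
        | otherwise = (fromIntegral (n .&. 0x7f) .|. 0x80)
                    : varint (n `shiftR` 7)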

tomasz.zielonka:
On Fri, Jan 26, 2007 at 02:16:22PM +1100, Donald Bruce Stewart wrote:
We believe so, and its a bug if this is not the case.
The src documents the encoding format used for each type (we were unable to attach haddocks to instances.. grr.)
All data is encoded in Network order, and extended to 64 bits for word sized values (like Int). It should be possible to encode a structure with ghc on x86, and decode it on a sparc64 running hugs.
Did you consider using an encoding which uses a variable number of bytes? If yes, I would be interested to know your reason for not choosing such an encoding. Efficiency?
Yes, efficiency. If you look in tests/ there's a pretty heavy-duty benchmark we use to compare against C. Sticking to word-sized writes where possible is a big one (up to 10-fold).

Interestingly, I did write an aligned-only, host-endian layer, and it was only some 10% faster on x86 over the network order code.

-- Don

On Thu, Jan 25, 2007 at 07:11:55PM -0800, John Meacham wrote:
Is the binary format portable? I need the produced files to work on both 32 and 64 bit architectures and with big and little endian machines. And of course, between different versions of a compiler or different compilers.
Sorry to reply to myself; looking at the code, I see that it is. However, Ints appear to be stored as 64 bits always, which seems like a mistake. The Haskell standard only specifies that Ints must have at least 30 bits of precision, so programs that rely on more than that are not portable anyway. Plus, it is unlikely that any compilers will ever have Ints > 32 bits; ghc does at the moment by accident of design, and it is considered a misfeature that will be fixed at some point. It would be an ugly wart to be stuck with going forward...

John

--
John Meacham - ⑆repetae.net⑆john⑈

john:
On Thu, Jan 25, 2007 at 07:11:55PM -0800, John Meacham wrote:
Is the binary format portable? I need the produced files to work on both 32 and 64 bit architectures and with big and little endian machines. And of course, between different versions of a compiler or different compilers.
Sorry to reply to myself; looking at the code, I see that it is. However, Ints appear to be stored as 64 bits always, which seems like a mistake. The Haskell standard only specifies that Ints must have at least 30 bits of precision, so programs that rely on more than that are not portable anyway. Plus, it is unlikely that any compilers will ever have Ints > 32 bits; ghc does at the moment by accident of design, and it is considered a misfeature that will be fixed at some point. It would be an ugly wart to be stuck with going forward...
This was perhaps the only issue of contention during development. It was felt that those wishing to serialise precisely would use an explicit word-sized type, such as Int64 or Word32, and that having things work correctly on ghc/amd64 right now was critical. If the Int/Int64 issue is resolved in the future, we can revisit this.

It's fairly painless to upgrade files from one version of a Binary instance to another, too: you just read in using the old 'get' method, and write back out using the new 'put' method.

-- Don
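That upgrade path might look like this in outline (hypothetical old/new types and a user-supplied conversion function; not library code):

    import Data.Binary (Binary, decode, encode)
    import qualified Data.ByteString.Lazy as L

    -- Read a file written with the old instance, convert,
    -- and write it back out with the new instance.
    upgradeFile :: (Binary old, Binary new)
                => (old -> new) -> FilePath -> FilePath -> IO ()
    upgradeFile convert inFile outFile = do
        bytes <- L.readFile inFile
        L.writeFile outFile (encode (convert (decode bytes)))

Writing to a separate output path avoids lazily reading and rewriting the same file.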

DrIFT 2.2.1 is out and now has support for the Data.Binary module. The old 'Binary' has been moved to 'BitsBinary' and 'Binary' now refers to the new 'Data.Binary' version of the library.

The homepage is at: http://repetae.net/~john/computer/haskell/DrIFT/

The current list of deriving rules it knows about is:

Binary:
    Binary            Data.Binary binary encoding of terms
    BitsBinary        efficient binary encoding of terms
    GhcBinary         byte sized binary encoding of terms

Debugging:
    Arbitrary         Derive reasonable Arbitrary for QuickCheck
    Observable        HOOD observable

General:
    NFData            provides 'rnf' to reduce to normal form (deepSeq)
    Typeable          derive Typeable for Dynamic

Generics:
    FunctorM          derive reasonable fmapM implementation
    HFoldable         Strafunski hfoldr
    Monoid            derive reasonable Data.Monoid implementation
    RMapM             derive reasonable rmapM implementation
    Term              Strafunski representation via Dynamic

Prelude:
    Bounded
    Enum
    Eq
    Ord
    Read
    Show

Representation:
    ATermConvertible  encode terms in the ATerm format
    Haskell2Xml       encode terms as XML (HaXml<=1.13)
    XmlContent        encode terms as XML (HaXml>=1.14)

Utility:
    Parse             parse values back from standard 'Show'
    Query             provide a QueryFoo class with 'is', 'has', 'from',
                      and 'get' routines
    from              provides fromFoo for each constructor
    get               for label 'foo' provide foo_g to get it
    has               hasfoo for record types
    is                provides isFoo for each constructor
    test              output raw data for testing
    un                provides unFoo for unary constructors
    update            for label 'foo' provides 'foo_u' to update it
                      and foo_s to set it

John

--
John Meacham - ⑆repetae.net⑆john⑈

Congratulations, guys! Fast serialisation is one of the things that comes up over and over again, so an easy-to-use fast solution is a great step forward. (Credit too to earlier pioneers, notably Bulat.)

Simon

On Fri, 26 Jan 2007, Donald Bruce Stewart wrote:
Binary: high performance, pure binary serialisation for Haskell ----------------------------------------------------------------------
The Binary Strike Team is pleased to announce the release of a new, pure, efficient binary serialisation library for Haskell, now available from Hackage:
tarball: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2 darcs: darcs get http://darcs.haskell.org/binary haddocks: http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html
I want to write out data in the machine's endianness, because that data will be post-processed by sox, which reads data in the machine's endianness. Is this also planned for the package?

On Fri, Jan 26, 2007 at 04:31:28PM +0100, Henning Thielemann wrote:
I want to write out data in the machine's endianness, because that data will be post-processed by sox, which reads data in the machine's endianness. Is this also planned for the package?
I also have to use a specific serialisation format. I guess we could both simply use putWord8, but then we'll probably lose most of the benefits of using the library.

Perhaps we could think about introducing some "encoding contexts", with a default encoding that can be automatically derived, but also with the ability to create one's own encodings?

Best regards
Tomasz

On Fri, Jan 26, 2007 at 04:42:48PM +0100, Tomasz Zielonka wrote:
I also have to use a specific serialisation format. I guess we could both simply use putWord8, but then we'll probably lose most of the benefits of using the library.
Perhaps we could think about introducing some "encoding contexts", with a default encoding that can be automatically derived, but also with the ability to create one's own encodings?
One can use newtypes - they would be faster in any case. I was thinking of something like:
    newtype XDRInt       = XDRInt Int
    newtype XDRString sz = ...
and so forth. Now if you build up a structure:
    data NfsSattr = NfsSattr
        { mode  :: XdrUnsigned  -- protection mode bits
        , uid   :: XdrUnsigned  -- owner user id
        , gid   :: XdrUnsigned  -- owner group id
        , size  :: XdrUnsigned  -- file size in bytes
        , atime :: XdrNfsTime   -- time of last access
        , mtime :: XdrNfsTime   -- time of last modification
        }
now you can speak NFS directly by serialising right from and to your socket! :) A whole filesystem implemented in Haskell in not so many lines. Very nice.

Actually, I probably will write Data.Binary.Protocol.Xdr (better location?). I actually do have an NFS server written in Haskell in a much more clunky way that I could revive.

Now, the only new primitives I would need are:
    alignTo :: Word8 -> Int -> Put
    alignTo _ _ = ...

    setAlignment :: Int -> Put
    setAlignment _ = ...
where alignTo would output some number of bytes in order to bring the stream to the next alignment boundary specified, and setAlignment would force the current alignment to be some value, without outputting any bytes. Would these be doable? They would open up a lot of possibilities.

John

--
John Meacham - ⑆repetae.net⑆john⑈

john:
Now, the only new primitives I would need are:
    alignTo :: Word8 -> Int -> Put
    alignTo _ _ = ...

    setAlignment :: Int -> Put
    setAlignment _ = ...
where alignTo would output some number of bytes in order to bring the stream to the next alignment boundary specified, and setAlignment would force the current alignment to be some value, without outputting any bytes. Would these be doable? They would open up a lot of possibilities.
I think a StateT over Put/Get that carries around a count of the bytes written, and the alignment, could be used. In general we envisage monad transformers over Get/Put for adding things like bitwise writes, aligned writes and so on.

-- Don
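A rough sketch of that StateT idea - a layer over Put that counts bytes written, so alignment padding can be computed (illustrative names throughout; it assumes the PutM monad is exported, as discussed later in this thread):

    import Control.Monad (replicateM_)
    import Control.Monad.State (StateT, evalStateT, get, put, lift, modify)
    import Data.Binary.Put (PutM, putWord8)
    import Data.Word (Word8)

    -- Put carrying a running count of bytes emitted so far.
    type PutA = StateT Int PutM

    putWord8A :: Word8 -> PutA ()
    putWord8A w = lift (putWord8 w) >> modify (+1)

    -- Emit padding bytes up to the next multiple of n.
    alignTo :: Word8 -> Int -> PutA ()
    alignTo pad n = do
        pos <- get
        replicateM_ ((n - pos `mod` n) `mod` n) (putWord8A pad)

    -- Declare the current alignment without emitting any bytes.
    setAlignment :: Int -> PutA ()
    setAlignment = put

    runPutA :: PutA a -> PutM a
    runPutA m = evalStateT m 0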

tomasz.zielonka:
On Fri, Jan 26, 2007 at 04:31:28PM +0100, Henning Thielemann wrote:
I want to write out data in the machine's endianness, because that data will be post-processed by sox, which reads data in the machine's endianness. Is this also planned for the package?
I also have to use a specific serialisation format. I guess we could both simply use putWord8, but then we'll probably lose most of the benefits of using the library.
Perhaps we could think about introducing some "encoding contexts", with a default encoding that can be automatically derived, but also with the ability to create one's own encodings?
Note that using Binary directly for non-Haskell structures is a bit like using Read/Show instances for parsing non-Haskell structures: possible, but not optimal. It would be better to use the underlying Get/Put monads available in binary, with the low-level support for explicit endian and word-sized writes/reads, to build a combinator library on top for these more flexible parsing/binary requirements (like layering Parsec over ReadP or HughesPJ).

-- Don

lemming:
I want to write out data in the machine's endianness, because that data will be post-processed by sox, which reads data in the machine's endianness. Is this also planned for the package?
The underlying Get and Put monads support explicit endian writes and reads, which you can add to your instances explicitly:

    http://www.cse.unsw.edu.au/~dons/binary/Data-Binary-Get.html#5
    http://www.cse.unsw.edu.au/~dons/binary/Data-Binary-Put.html#5

So you can do that now.

-- Don
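So a format with a fixed byte order can be handled in the instance itself - for example (a sketch; Sample is a made-up type):

    import Data.Binary (Binary(..))
    import Data.Binary.Get (getWord16le)
    import Data.Binary.Put (putWord16le)
    import Data.Word (Word16)

    -- A 16-bit sample always stored little-endian,
    -- whatever the host byte order is.
    newtype Sample = Sample Word16

    instance Binary Sample where
        put (Sample w) = putWord16le w
        get            = fmap Sample getWord16le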

Donald Bruce Stewart wrote:
The underlying Get and Put monads support explicit endian writes and reads, which you can add to your instances explicitly:
http://www.cse.unsw.edu.au/~dons/binary/Data-Binary-Get.html#5 http://www.cse.unsw.edu.au/~dons/binary/Data-Binary-Put.html#5
So you can do that now.
-- Don
The documentation has a small organization bug: at http://www.cse.unsw.edu.au/~dons/binary/Data-Binary-Get.html#5 the "Big-endian reads" section has both big-endian and little-endian functions, and the same goes for the "Little-endian reads" section. The page for Put is okay.

-- Chris

On Sat, 27 Jan 2007, Donald Bruce Stewart wrote:
The underlying Get and Put monads support explicit endian writes and reads, which you can add to your instances explicitly:
http://www.cse.unsw.edu.au/~dons/binary/Data-Binary-Get.html#5 http://www.cse.unsw.edu.au/~dons/binary/Data-Binary-Put.html#5
So you can do that now.
Of course, I can check for the machine's endianness, then decide whether to use putWord16be or putWord16le. But I assume that it would be more efficient if there were some function putWord16native, which writes out in the machine's native endianness.

On Sat, 2007-01-27 at 19:11 +0100, Henning Thielemann wrote:
On Sat, 27 Jan 2007, Donald Bruce Stewart wrote:
The underlying Get and Put monads support explicit endian writes and reads, which you can add to your instances explicitly:
http://www.cse.unsw.edu.au/~dons/binary/Data-Binary-Get.html#5 http://www.cse.unsw.edu.au/~dons/binary/Data-Binary-Put.html#5
So you can do that now.
Of course, I can check for the machine's endianness, then decide whether to use putWord16be or putWord16le. But I assume that it would be more efficient if there were some function putWord16native, which writes out in the machine's native endianness.
Note that actually taking advantage of aligned word reads/writes is not especially simple, since one has to guarantee alignment. That means no byte writes - everything (chars, bools, record tags) would have to be word sized.

While it's not so hard to arrange alignment for writes (since we get to allocate the buffers), for reads we have no guarantee, since the user supplies the byte string input and they could have used, say, 'tail', so the data may be all unaligned. Dealing with both aligned and unaligned makes it more complex.

We did a performance experiment with this and found that it gained us very little. There are bigger performance bottlenecks at the moment than the penalty of always doing byte reads/writes.

Duncan

lemming:
Of course, I can check for the machine's endianness, then decide whether to use putWord16be or putWord16le. But I assume that it would be more efficient if there were some function putWord16native, which writes out in the machine's native endianness.
Ah yes, I actually did have the putWord*native functions in the code at one point, in my branch for doing aligned-only writes. The speed-up wasn't significant so I didn't commit it, but the host-order primitives are useful, so they will appear in the next version of the library.

-- Don

lemming:
Of course, I can check for the machine's endianness, then decide whether to use putWord16be or putWord16le. But I assume that it would be more efficient if there were some function putWord16native, which writes out in the machine's native endianness.
I've added raw primitives for:

    {put,get}Wordhost
    {put,get}Word16host
    {put,get}Word32host
    {put,get}Word64host

which do unaligned, host-sized, host-endian packing of data. Writing is some 15% faster for Words, a bit less for reading (which is a bit unoptimised at the moment), and a bit less for other types. Needless to say, data serialised this way is only portable to the same architecture.

Now available from specially selected bookshops and darcs repositories. Please enjoy responsibly!

-- Don

On 28/01/07, Donald Bruce Stewart wrote:
I've added raw primitives for:
{put,get}Wordhost {put,get}Word16host {put,get}Word32host {put,get}Word64host
which do unaligned, host-sized, host-endian packing of data.
Writing is some 15% faster for Words, a bit less for reading (which is a bit unoptimised at the moment), and a bit less for other types. Needless to say, data serialised this way is only portable to the same architecture.
If the data header stores the alignment/size/endianness, then there's no reason for the data to be unportable. The normal get* instances (not get*host) could suffice for reading.

Conrad

Hi Conrad,
If the data header stores the alignment/size/endianness, then there's no reason for the data to be unportable. The normal get* instances (not get*host) could suffice for reading.
That requires the stream to have a header, which means that not all slices of the ByteString are equal - the first one contains essential information which isn't available anywhere else. It seems like a lot of complexity, defining multiple parameterised file types, when the simple case is probably neater and cheaper in terms of both runtime and developer time.

Thanks

Neil

On Fri, Jan 26, 2007 at 01:51:01PM +1100, Donald Bruce Stewart wrote:
Binary: high performance, pure binary serialisation for Haskell ----------------------------------------------------------------------
The Binary Strike Team is pleased to announce the release of a new, pure, efficient binary serialisation library for Haskell, now available from Hackage:
tarball: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2 darcs: darcs get http://darcs.haskell.org/binary haddocks: http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html
Remind me again: why do you need a Put monad, which always seems to have the argument type ()? Monoids really are underappreciated.

On Jan 29, 2007, at 16:38 , Ross Paterson wrote:
On Fri, Jan 26, 2007 at 01:51:01PM +1100, Donald Bruce Stewart wrote:
Binary: high performance, pure binary serialisation for Haskell ----------------------------------------------------------------------
The Binary Strike Team is pleased to announce the release of a new, pure, efficient binary serialisation library for Haskell, now available from Hackage:
tarball: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2 darcs: darcs get http://darcs.haskell.org/binary haddocks: http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html
Remind me again: why do you need a Put monad, which always seems to have the argument type ()? Monoids really are underappreciated.
I don't know if that's the main reason, but monads give you do-notation, which can be very nice when you are doing a long sequence of puts. Here's an example from the tar package [1]:

    putHeaderNoChkSum :: TarHeader -> Put
    putHeaderNoChkSum hdr = do
      let (filePrefix, fileSuffix) = splitLongPath 100 (tarFileName hdr)
      putString 100 $ fileSuffix
      putOct    8   $ tarFileMode hdr
      putOct    8   $ tarOwnerID hdr
      putOct    8   $ tarGroupID hdr
      putOct    12  $ tarFileSize hdr
      putOct    12  $ let TOD s _ = tarModTime hdr in s
      fill      8   $ ' ' -- dummy checksum
      putTarFileType $ tarFileType hdr
      putString 100 $ tarLinkTarget hdr -- FIXME: take suffix split at / if too long
      putString 6   $ "ustar "
      putString 2   $ " " -- strange ustar version
      putString 32  $ tarOwnerName hdr
      putString 32  $ tarGroupName hdr
      putOct    8   $ tarDeviceMajor hdr
      putOct    8   $ tarDeviceMinor hdr
      putString 155 $ filePrefix
      fill      12  $ '\NUL'

I guess mconcat [putX, putY, ...] would work too, but the syntax is not quite as nice.

/Björn

[1] http://www.cs.chalmers.se/~bringert/darcs/tar/

ross:
On Fri, Jan 26, 2007 at 01:51:01PM +1100, Donald Bruce Stewart wrote:
Binary: high performance, pure binary serialisation for Haskell ----------------------------------------------------------------------
The Binary Strike Team is pleased to announce the release of a new, pure, efficient binary serialisation library for Haskell, now available from Hackage:
tarball: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2 darcs: darcs get http://darcs.haskell.org/binary haddocks: http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html
Remind me again: why do you need a Put monad, which always seems to have the argument type ()? Monoids really are underappreciated.
For the syntax, and so that people can directly port their code from NewBinary (the instances are basically unchanged).

    newtype PutM a = Put { unPut :: (a, Builder) }

    type Put = PutM ()

It is always ().

-- Don

On Tue, Jan 30, 2007 at 09:52:01AM +1100, Donald Bruce Stewart wrote:
For the syntax, and so that people can directly port their code from NewBinary (the instances are basically unchanged).
    newtype PutM a = Put { unPut :: (a, Builder) }

    type Put = PutM ()
It is always ().
BTW, the PutM type constructor is not exported, but I think it may be useful in some uses of the library.

Best regards
Tomasz

tomasz.zielonka:
On Tue, Jan 30, 2007 at 09:52:01AM +1100, Donald Bruce Stewart wrote:
For the syntax, and so that people can directly port their code from NewBinary (the instances are basically unchanged).
    newtype PutM a = Put { unPut :: (a, Builder) }

    type Put = PutM ()
It is always ().
BTW, the PutM type constructor is not exported, but I think it may be useful in some uses of the library.
Done. It's in darcs now.

-- Don

On Tue, Jan 30, 2007 at 09:52:01AM +1100, Donald Bruce Stewart wrote:
ross:
why do you need a Put monad, which always seems to have the argument type ()? Monoids really are underappreciated.
For the syntax, and so that people can directly port their code from NewBinary (the instances are basically unchanged).
And so the successor to binary must have the same interface, and so on forever. The backward compatibility argument seems weak to me, leaving only the advantage of do-notation. Monads are the new lists.

On Tue, 2007-01-30 at 09:38 +0000, Ross Paterson wrote:
On Tue, Jan 30, 2007 at 09:52:01AM +1100, Donald Bruce Stewart wrote:
ross:
why do you need a Put monad, which always seems to have the argument type ()? Monoids really are underappreciated.
For the syntax, and so that people can directly port their code from NewBinary (the instances are basically unchanged).
And so the successor to binary must have the same interface, and so on forever. The backward compatibility argument seems weak to me, leaving only the advantage of do-notation.
Monads are the new lists.
I was about to say that for the more complicated binary serialisation formats (e.g. GHC's .hi format) people need monads with state, like string pools etc, but actually now that I think about it, that can be done with a monoid too.

Ross, you need to make a monoid transformer library (at least reader and state) and campaign for ++ to be redefined as mappend, then everyone will want to use it since it'll be so neat and convenient! :-)

Can anyone see a real serialisation use case that needs a monad for the serialisation side? I'd thought I had an example, but I was wrong.

Duncan

On Tue, Jan 30, 2007 at 10:22:58AM +0000, Duncan Coutts wrote:
I was about to say that for the more complicated binary serialisation formats (eg GHC's .hi format) people need monads with state, like string pools etc, but actually now that I think about it, that can be done with a monoid too.
Ross, you need to make a monoid transformer library (at least reader and state) and campaign for ++ to be redefined as mappend, then everyone will want to use it since it'll be so neat and convenient! :-)
Can anyone see a real serialisation use case that needs a monad for the serialisation side? I'd thought I had an example, but I was wrong.
If you really want a monad, you can put the monoid into the writer monad - and that's exactly what is done in Data.Binary! However, for someone who prefers the monadic interface, wrapping in the writer monad every time and the additional boilerplate code (tell) can be a nuisance.

Best regards
Tomasz
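In miniature, that construction is just Writer over the Builder monoid - something like this (a sketch using binary's Data.Binary.Builder module):

    import Control.Monad.Writer (Writer, execWriter, tell)
    import Data.Binary.Builder (Builder, singleton, toLazyByteString)
    import qualified Data.ByteString.Lazy as L
    import Data.Word (Word8)

    -- A Put-like monad: Writer accumulating a Builder.
    type PutW = Writer Builder

    putByte :: Word8 -> PutW ()
    putByte = tell . singleton

    runPutW :: PutW a -> L.ByteString
    runPutW = toLazyByteString . execWriter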

On Tue, Jan 30, 2007 at 10:22:58AM +0000, Duncan Coutts wrote:
Ross, you need to make a monoid transformer library (at least reader and state) and campaign for ++ to be redefined as mappend, then everyone will want to use it since it'll be so neat and convenient! :-)
Reader is already there. The best way to do state, I think, is to generalize the Endo type from (->) to any arrow and then use Kleisli (Writer m).

Monoids could certainly do with a more convenient syntax, but there's no hope of redefining ++. Another problem is that we have Monoid, Alternative, MonadPlus and ArrowPlus all with similar operations. If only we could write

    instance (Monad m, forall a. Monoid (m a)) => MonadPlus m

This was proposed in http://research.microsoft.com/Users/simonpj/Papers/derive.htm but nothing seems to have come of it.

Hello Duncan, Tuesday, January 30, 2007, 1:22:58 PM, you wrote:
Can anyone see a real serialisation use case that needs a monad for the serialisation side? I'd thought I had an example, but I was wrong.
My program, FreeArc, has its own compression layer on top of serialization - i.e. serialized data is sent in 64k blocks to the C compression routine, and both serialization and compression are run at the same time using threads.

--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com

On Wed, 2007-01-31 at 14:38 +0300, Bulat Ziganshin wrote:
Hello Duncan,
Tuesday, January 30, 2007, 1:22:58 PM, you wrote:
Can anyone see a real serialisation use case that needs a monad for the serialisation side? I'd thought I had an example, but I was wrong.
my program, FreeArc, has its own compression level on top of serializing - i.e. data serialized sent in 64k blocks to the C compression routine and both serialization and compression are run at the same time using threads
That's an interesting suggestion! I think the way we'd tackle this is not by integrating such a feature into the monoid/monad but instead do it as a parallel evaluation strategy on the lazy bytestring. I think it'd be something with type:

    Lazy.ByteString -> Lazy.ByteString

and it'd work on the list structure of the lazy bytestring using something like:

    parBuffer 1 rwhnf :: [Strict.ByteString] -> [Strict.ByteString]

(from Control.Parallel.Strategies) so you'd use it like so:

    writeFile f . GZip.compress . parBuffer . Binary.serialise

Hmm, nice. Pure, lazy, compositional. I wonder if it'll work...

Duncan
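Assembled, the pieces might fit together like this (a sketch: parChunks is an illustrative name, GZip.compress is from the zlib package, and 'serialise' is written here as Data.Binary's encode):

    import qualified Codec.Compression.GZip as GZip
    import Control.Parallel.Strategies (parBuffer, rwhnf, using)
    import Data.Binary (Binary, encode)
    import qualified Data.ByteString.Lazy as L

    -- Evaluate the chunks of a lazy ByteString in parallel,
    -- a few ahead of the consumer.
    parChunks :: L.ByteString -> L.ByteString
    parChunks = L.fromChunks . (`using` parBuffer 4 rwhnf) . L.toChunks

    writeCompressed :: Binary a => FilePath -> a -> IO ()
    writeCompressed f = L.writeFile f . GZip.compress . parChunks . encode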

Hello Duncan, Thursday, February 1, 2007, 3:39:16 AM, you wrote:
Can anyone see a real serialisation use case that needs a monad for the serialisation side? I'd thought I had an example, but I was wrong.
my program, FreeArc, has its own compression level on top of serializing - i.e. data serialized sent in 64k blocks to the C compression routine and both serialization and compression are run at the same time using threads
I mean that in the real world, programs may need to do something in the IO monad - work with a database or the network, or call C libs.

--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com

On Thu, 2007-02-01 at 10:47 +0300, Bulat Ziganshin wrote:
Hello Duncan,
Thursday, February 1, 2007, 3:39:16 AM, you wrote:
Can anyone see a real serialisation use case that needs a monad for the serialisation side? I'd thought I had an example, but I was wrong.
my program, FreeArc, has its own compression level on top of serializing - i.e. data serialized sent in 64k blocks to the C compression routine and both serialization and compression are run at the same time using threads
i mean that in real world, programs may need to do something in IO monad - work with database, network, call C libs
Most of this can be done in a modular and mostly pure way. In that example I gave, the compression function - while pure - was of course calling out to a C library.

As an example of a real world program that uses this stuff and does networking, keeps a persistent store and calls C libs, see http://hpaste.org/ - it uses HappS and ByteStrings, and stores the pastes in compressed form on disk (using my pure zlib wrapper library).

Duncan

On Tue, Jan 30, 2007 at 09:38:26AM +0000, Ross Paterson wrote:
On Tue, Jan 30, 2007 at 09:52:01AM +1100, Donald Bruce Stewart wrote:
ross:
why do you need a Put monad, which always seems to have the argument type ()? Monoids really are underappreciated.
For the syntax, and so that people can directly port their code from NewBinary (the instances are basically unchanged).
And so the successor to binary must have the same interface, and so on forever. The backward compatibility argument seems weak to me, leaving only the advantage of do-notation.
How about having both interfaces, so you can use the one you like better?

    class Binary t where
        -- | Encode a value in the Put monad.
        put :: t -> Put
        put x = Put ((), build x)

        -- | Encode a value using the Builder monoid.
        build :: t -> Builder
        build x = snd (unPut (put x))

        -- | Decode a value in the Get monad.
        get :: Get t

The downside is that GHC probably wouldn't warn about undefined methods, or would it?

Best regards
Tomasz

Donald Bruce Stewart wrote:
Binary: high performance, pure binary serialisation for Haskell ----------------------------------------------------------------------
The Binary Strike Team is pleased to announce the release of a new, pure, efficient binary serialisation library for Haskell, now available from Hackage:
tarball: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2 darcs: darcs get http://darcs.haskell.org/binary haddocks: http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html
A little benchmark I had lying around shows that this Binary library beats the one in GHC by a factor of 2 (at least on this example):

GHC's binary library (quite heavily tuned by me):

    Write time: 2.41
    Read time:  1.44

    1,312,100,072 bytes allocated in the heap
           96,792 bytes copied during GC (scavenged)
          744,752 bytes copied during GC (not scavenged)
       32,492,592 bytes maximum residency (6 sample(s))

            2384 collections in generation 0 (  0.01s)
               6 collections in generation 1 (  0.00s)
              63 Mb total memory in use

    INIT  time    0.00s  (  0.00s elapsed)
    MUT   time    3.78s  (  3.84s elapsed)
    GC    time    0.02s  (  0.02s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time    3.79s  (  3.86s elapsed)

Data.Binary:

    Write time: 0.99
    Read time:  0.65

    1,949,205,456 bytes allocated in the heap
      204,986,944 bytes copied during GC (scavenged)
        5,154,600 bytes copied during GC (not scavenged)
       70,247,720 bytes maximum residency (8 sample(s))

            3676 collections in generation 0 (  0.25s)
               8 collections in generation 1 (  0.19s)
             115 Mb total memory in use

    INIT  time    0.00s  (  0.00s elapsed)
    MUT   time    1.08s  (  1.13s elapsed)
    GC    time    0.44s  (  0.52s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time    1.51s  (  1.65s elapsed)

This example writes a lot of 'Maybe Int' values. I'm surprised by the extra heap used by Data.Binary: this was on a 64-bit machine, so Ints should have been encoded as 64 bits by both libraries. Also, the GC seems to be working quite hard with Data.Binary; I'd be interested to know why that is.

Anyway, this result is good enough for me; I'd like to use Data.Binary in GHC as soon as we can. Unfortunately we have to support older compilers, so there will be some build-system issues to surmount. Also we need a way to pass state around while serialising/deserialising - what's the current plan for this?

Cheers,
    Simon

simonmar:
Donald Bruce Stewart wrote:
Binary: high performance, pure binary serialisation for Haskell ---------------------------------------------------------------------- The Binary Strike Team is pleased to announce the release of a new, pure, efficient binary serialisation library for Haskell, now available from Hackage:
tarball: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2 darcs: darcs get http://darcs.haskell.org/binary haddocks: http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html
A little benchmark I had lying around shows that this Binary library beats the one in GHC by a factor of 2 (at least on this example):
Very nice. We've been benchmarking against NewBinary, for various Word-sized operations, with the following results, on x86:

NewBinary, fairly tuned (lots of fastMutInt#s):

    10MB of Word8  in chunks of 1:   10.68MB/s write,  9.16MB/s read
    10MB of Word16 in chunks of 16:   7.89MB/s write,  6.65MB/s read
    10MB of Word32 in chunks of 16:   7.99MB/s write,  7.29MB/s read
    10MB of Word64 in chunks of 16:   5.10MB/s write,  5.75MB/s read

Data.Binary:

    10MB of Word8  in chunks of 1  (Host endian):  11.7 MB/s write,  2.4 MB/s read
    10MB of Word16 in chunks of 16 (Host endian):  89.3 MB/s write,  3.6 MB/s read
    10MB of Word16 in chunks of 16 (Big endian):   83.3 MB/s write,  1.6 MB/s read
    10MB of Word32 in chunks of 16 (Host endian): 178.6 MB/s write,  7.2 MB/s read
    10MB of Word32 in chunks of 16 (Big endian):  156.2 MB/s write,  2.5 MB/s read
    10MB of Word64 in chunks of 16 (Host endian):  78.1 MB/s write, 11.3 MB/s read
    10MB of Word64 in chunks of 16 (Big endian):   44.6 MB/s write,  2.8 MB/s read

Note that we're much faster writing, in general, but read speed lags. The 'get' monad hasn't received much attention yet, though we know what needs tuning.
This example writes a lot of 'Maybe Int' values. I'm surprised by the extra heap used by Data.Binary: this was on a 64-bit machine, so Ints should have been encoded as 64 bits by both libraries. Also, the GC seems to be working quite hard with Data.Binary, I'd be interested to know why that is.
Very interesting! Is this benchmark online? I'm a little surprised by the read times; reading is still fairly unoptimised compared to writing.
Anyway, this result is good enough for me, I'd like to use Data.Binary in GHC as soon as we can. Unfortunately we have to support older compilers, so there will be some build-system issues to surmount. Also we need a way to pass state around while serialising/deserialising - what's the current plan for this?
The plan was to use StateT Put or StateT Get, I think. But we don't have a demo for this yet. Duncan, Lennart, any suggestions?

-- Don
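There is no demo yet, but the read side of that plan might look roughly like this (hypothetical names; a string pool threaded through deserialisation with StateT):

    import Control.Monad.State (StateT, evalStateT, gets, lift)
    import Data.Binary.Get (Get, getWord32be)
    import qualified Data.Map as M

    -- Get extended with a table of previously decoded strings.
    type GetP = StateT (M.Map Int String) Get

    -- Read a pool index and look the string up.
    getPooled :: GetP String
    getPooled = do
        i <- lift (fmap fromIntegral getWord32be)
        gets (M.findWithDefault (error "bad pool index") i)

    runGetP :: GetP a -> Get a
    runGetP m = evalStateT m M.empty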
participants (19)

- Arie Peterson
- Bjorn Bringert
- Bulat Ziganshin
- Chris Kuklewicz
- Conrad Parker
- dons@cse.unsw.edu.au
- Dougal Stanton
- Duncan Coutts
- Henning Thielemann
- Joel Reymont
- John Meacham
- Malcolm Wallace
- Neil Davies
- Neil Mitchell
- Ross Paterson
- Simon Marlow
- Simon Peyton-Jones
- Tomasz Zielonka
- Tony Finch