
simonmar:
Donald Bruce Stewart wrote:
Binary: high performance, pure binary serialisation for Haskell
----------------------------------------------------------------

The Binary Strike Team is pleased to announce the release of a new, pure, efficient binary serialisation library for Haskell, now available from Hackage:

    tarball:  http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2
    darcs:    darcs get http://darcs.haskell.org/binary
    haddocks: http://www.cse.unsw.edu.au/~dons/binary/Data-Binary.html
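For readers who haven't seen the interface, here is a minimal sketch of how it is typically used: a hand-written Binary instance for an illustrative type, with whole values encoded to and decoded from lazy ByteStrings (the Point type and file name are invented for the example):

    import Data.Binary
    import qualified Data.ByteString.Lazy as L

    -- Illustrative type with a hand-written Binary instance.
    data Point = Point Int Int

    instance Binary Point where
      put (Point x y) = put x >> put y
      get             = do x <- get; y <- get; return (Point x y)

    main :: IO ()
    main = do
      L.writeFile "point.bin" (encode (Point 3 4))        -- serialise to a file
      Point x y <- decode `fmap` L.readFile "point.bin"   -- read it back
      print (x, y)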
A little benchmark I had lying around shows that this Binary library beats the one in GHC by a factor of 2 (at least on this example):
Very nice. We've been benchmarking against NewBinary, for various Word-sized operations, with the following results, on x86:

NewBinary, fairly tuned (lots of fastMutInt#s):

    10MB of Word8  in chunks of 1:   10.68MB/s write,  9.16MB/s read
    10MB of Word16 in chunks of 16:   7.89MB/s write,  6.65MB/s read
    10MB of Word32 in chunks of 16:   7.99MB/s write,  7.29MB/s read
    10MB of Word64 in chunks of 16:   5.10MB/s write,  5.75MB/s read

Data.Binary:

    10MB of Word8  in chunks of 1  (Host endian):  11.7 MB/s write,  2.4 MB/s read
    10MB of Word16 in chunks of 16 (Host endian):  89.3 MB/s write,  3.6 MB/s read
    10MB of Word16 in chunks of 16 (Big endian):   83.3 MB/s write,  1.6 MB/s read
    10MB of Word32 in chunks of 16 (Host endian): 178.6 MB/s write,  7.2 MB/s read
    10MB of Word32 in chunks of 16 (Big endian):  156.2 MB/s write,  2.5 MB/s read
    10MB of Word64 in chunks of 16 (Host endian):  78.1 MB/s write, 11.3 MB/s read
    10MB of Word64 in chunks of 16 (Big endian):   44.6 MB/s write,  2.8 MB/s read

Note that we're much faster writing, in general, but read speed lags. The 'get' monad hasn't received much attention yet, though we know what needs tuning.
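For context, a rough sketch of the kind of loop being timed here -- a run of big-endian Word32 writes followed by the corresponding reads. This is not the actual benchmark code; the constants are illustrative:

    import Control.Monad (replicateM, replicateM_)
    import Data.Word (Word32)
    import Data.Binary.Put (runPut, putWord32be)
    import Data.Binary.Get (runGet, getWord32be)
    import qualified Data.ByteString.Lazy as L

    -- Write n Word32s big-endian, producing a lazy ByteString.
    writeWords :: Int -> L.ByteString
    writeWords n = runPut (replicateM_ n (putWord32be 0xDEADBEEF))

    -- Read n Word32s back out of the ByteString.
    readWords :: Int -> L.ByteString -> [Word32]
    readWords n = runGet (replicateM n getWord32be)

    main :: IO ()
    main = do
      let n  = (10 * 1024 * 1024) `div` 4      -- roughly 10MB of Word32s
          bs = writeWords n
      print (L.length bs)
      print (last (readWords n bs))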
GHC's binary library (quite heavily tuned by me):
Write time: 2.41   Read time: 1.44

    1,312,100,072 bytes allocated in the heap
           96,792 bytes copied during GC (scavenged)
          744,752 bytes copied during GC (not scavenged)
       32,492,592 bytes maximum residency (6 sample(s))

    2384 collections in generation 0 ( 0.01s)
       6 collections in generation 1 ( 0.00s)

      63 Mb total memory in use

    INIT  time    0.00s  (  0.00s elapsed)
    MUT   time    3.78s  (  3.84s elapsed)
    GC    time    0.02s  (  0.02s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time    3.79s  (  3.86s elapsed)
Data.Binary:
Write time: 0.99   Read time: 0.65

    1,949,205,456 bytes allocated in the heap
      204,986,944 bytes copied during GC (scavenged)
        5,154,600 bytes copied during GC (not scavenged)
       70,247,720 bytes maximum residency (8 sample(s))

    3676 collections in generation 0 ( 0.25s)
       8 collections in generation 1 ( 0.19s)

     115 Mb total memory in use

    INIT  time    0.00s  (  0.00s elapsed)
    MUT   time    1.08s  (  1.13s elapsed)
    GC    time    0.44s  (  0.52s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time    1.51s  (  1.65s elapsed)
This example writes a lot of 'Maybe Int' values. I'm surprised by the extra heap used by Data.Binary: this was on a 64-bit machine, so Ints should have been encoded as 64 bits by both libraries. Also, the GC seems to be working quite hard with Data.Binary; I'd be interested to know why that is.
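The benchmark itself isn't shown in the thread; a rough sketch of its likely shape, under the assumption that it simply encodes and then decodes a large list of Maybe Int via the stock instances (the list contents and file name here are invented):

    import Data.Binary (encode, decode)
    import qualified Data.ByteString.Lazy as L

    main :: IO ()
    main = do
      -- Alternate Just/Nothing so both constructors are exercised.
      let xs = [ if even i then Just i else Nothing | i <- [1 .. 1000000] ] :: [Maybe Int]
      L.writeFile "maybes.bin" (encode xs)           -- write pass
      ys <- decode `fmap` L.readFile "maybes.bin"    -- read pass
      print (length (ys :: [Maybe Int]))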
Very interesting! Is this benchmark online? I'm a little surprised by the read times; reading is still fairly unoptimised compared to writing.
Anyway, this result is good enough for me; I'd like to use Data.Binary in GHC as soon as we can. Unfortunately we have to support older compilers, so there will be some build-system issues to surmount. Also, we need a way to pass state around while serialising/deserialising -- what's the current plan for this?
The plan was to use StateT Put or StateT Get, I think. But we don't have a demo for this yet. Duncan, Lennart, any suggestions? -- Don
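A minimal sketch of the StateT-over-Get idea, assuming the mtl StateT transformer; the state here (a running count of items read) and the function names are purely illustrative, and the Put side could be layered analogously:

    import Control.Monad.State (StateT, lift, modify, runStateT)
    import Data.Binary.Get (Get, runGet, getWord8)
    import Data.Word (Word8)
    import qualified Data.ByteString.Lazy as L

    -- Deserialisation with extra state threaded alongside the Get monad.
    type GetSt s a = StateT s Get a

    -- Read one byte and bump a counter held in the state.
    getCounted :: GetSt Int Word8
    getCounted = do
      w <- lift getWord8
      modify (+1)
      return w

    -- Run three counted reads over a small input; returns the bytes and the count.
    demo :: ([Word8], Int)
    demo = runGet (runStateT (sequence [getCounted, getCounted, getCounted]) 0)
                  (L.pack [1, 2, 3])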