RE: getting a Binary module into the standard libs

> (I'll try to do some perf measurements on BinIO vs. BinMem later today,
> that should give us a rough idea).
I did a rough test using GHC's current Binary library hacked to be
standalone. The test was writing 100000 Ints, followed by reading back
100000 Ints. Results: using BinMem took about 0.25 secs, BinIO took about
1.1 secs. That is, BinMem is currently 4-5 times faster than BinIO. I
should stress that this isn't a realistic benchmark.

I conjecture the difference is mostly due to the overhead of going through
the IO library each time a character is written. When I wrote the Binary
library I was pretty careful to make sure that putWord8/getWord8 on a
BinMem was just straight-line code (that's what the FastMutInt stuff is
for). The upshot is that if you want really good performance for BinIO,
there will have to be a cache in the BinHandle.

Cheers,
Simon
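GHC's actual BinHandle is not reproduced here, but the caching idea Simon describes might be sketched as follows (all names hypothetical): a small raw-memory buffer that putWord8 fills with straight-line pokes, touching the IO library only once per buffer-full instead of once per byte.

```haskell
import Data.IORef
import Data.Word (Word8)
import Foreign.Marshal.Alloc (mallocBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)
import System.IO

bufSize :: Int
bufSize = 4096

-- Hypothetical buffered handle; GHC's real BinHandle differs.
data BufBin = BufBin
  { bbBuf    :: Ptr Word8   -- raw buffer
  , bbOff    :: IORef Int   -- next free slot in the buffer
  , bbHandle :: Handle      -- underlying file
  }

newBufBin :: Handle -> IO BufBin
newBufBin h = do
  buf <- mallocBytes bufSize
  off <- newIORef 0
  return (BufBin buf off h)

-- Write any buffered bytes out in a single hPutBuf call.
flushBufBin :: BufBin -> IO ()
flushBufBin (BufBin buf offR h) = do
  off <- readIORef offR
  hPutBuf h buf off
  writeIORef offR 0

-- The hot path is one poke and one increment; the Handle is only
-- touched when the buffer fills.
putWord8Buf :: BufBin -> Word8 -> IO ()
putWord8Buf bb@(BufBin buf offR _) w = do
  off <- readIORef offR
  if off == bufSize
    then flushBufBin bb >> putWord8Buf bb w
    else do pokeByteOff buf off w
            writeIORef offR (off + 1)
```

This sketch leaks the buffer and is write-only; it is only meant to show where the cache would sit in the BinHandle.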

Hi again all,

I've got a somewhat functioning new Binary module for GHC which supports
bit writes. I've written a small test script to make sure simple things
work... it essentially looks like this:

  data BinItem
    = ByteItem Word8
    | BitItem Int Word8     -- num_bits bits
    | FlushItem             -- flush byte
    | forall a . (Eq a, Binary a, Show a) => BinaryItem a
        -- some arbitrary data
    | forall a . (Eq a, Show a) =>
        PutItem a (BinHandle -> a -> IO ()) (BinHandle -> IO a)

on which Eq is defined (I use unsafePerformIO to cast the forall'd
elements). Then we have test functions which essentially open up a BinMem
or a BinIO and write a list of BinItems to it, then read them back. We can
then check if what we read back is the same as what we wrote.

The tests I have run are:

  byteTest   = map ByteItem [1,2,3,4,5]
  bitTest1   = [BitItem 3 6, FlushItem]
  bitTest2   = [BitItem 3 6, BitItem 4 9, BitItem 1 0, FlushItem]
  bitTest3   = [BitItem 6 10, BitItem 6 10, FlushItem]
  flushTest1 = [BitItem 3 6, FlushItem, BitItem 4 9, BitItem 1 0, FlushItem]
  flushTest2 = [ByteItem 1, FlushItem, FlushItem, FlushItem, FlushItem,
                ByteItem 2, FlushItem]
  comboTest1 = [ByteItem 5, BitItem 3 6, FlushItem, ByteItem 9, BitItem 4 9,
                BitItem 1 0, FlushItem, ByteItem 84, BitItem 3 2, FlushItem]
  comboTest2 = [ByteItem 5, BitItem 3 6, ByteItem 9, BitItem 7 9,
                BitItem 4 0, ByteItem 84, BitItem 3 2, FlushItem]
  maybeTest1 = [BinaryItem (Just 5 :: Maybe Int), BinaryItem (Nothing :: Maybe Int),
                BinaryItem (Just 0 :: Maybe Int), BinaryItem (Just 1 :: Maybe Int),
                BinaryItem (Nothing :: Maybe Int), FlushItem]
  maybeTest2 = map (\i -> PutItem i putMaybeInt getMaybeInt)
                   [Just 5, Nothing, Just 0, Just 1, Nothing] ++ [FlushItem]

All of these work fine. putMaybeInt uses a bit to put the Just/Nothing
tag, while the normal Binary instance for Maybe uses a byte.
...this brings up another question. Since we don't have named instances,
I'm thinking of separating the actual instance definitions into two
modules, one which uses bit-based instances and one which uses byte-based
instances. You could then import whichever makes more sense for your
application. The problem with this is that sometimes it might make sense
to use bytes in one aspect and bits in another.

Another solution would be to extend the Binary class to:

  class Binary a where
    put_         :: BinHandle -> a -> IO ()
    put          :: BinHandle -> a -> IO (Bin a)
    get          :: BinHandle -> IO a
    putWithBits_ :: BinHandle -> a -> IO ()
    putWithBits  :: BinHandle -> a -> IO (Bin a)
    getWithBits  :: BinHandle -> IO a

so that each instance specifies how to write itself both with bits and
with bytes. Of course they could default off each other, so the user only
has to write one put and one get if they're lazy.

Thoughts?

- Hal
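The "default off each other" idea can be sketched with a toy model (all of this is illustrative, not GHC's API: here BinHandle is just a bit list in an IORef, and putBit/putByte are made up for the example). The bit-based method defaults to the byte-based one, so a lazy instance writer supplies only put; an instance that cares overrides putWithBits with a cheaper encoding.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.Bits (testBit)
import Data.IORef
import Data.Word (Word8)

-- Toy write-only handle: accumulated bits, most recent first.
type BinHandle = IORef [Bool]

putBit :: BinHandle -> Bool -> IO ()
putBit h b = modifyIORef h (b :)

putByte :: BinHandle -> Word8 -> IO ()
putByte h w = mapM_ (putBit h . testBit w) [7,6..0]

class Binary a where
  put         :: BinHandle -> a -> IO ()
  -- bit-based variant defaults to the byte-based one
  putWithBits :: BinHandle -> a -> IO ()
  putWithBits = put

instance Binary Word8 where
  put = putByte            -- inherits putWithBits = put

instance Binary (Maybe Word8) where
  -- byte-based: a whole tag byte for Just/Nothing
  put h Nothing  = putByte h 0
  put h (Just w) = putByte h 1 >> putByte h w
  -- bit-based override: the tag costs a single bit
  putWithBits h Nothing  = putBit h False
  putWithBits h (Just w) = putBit h True >> putByte h w

-- Count how many bits an action writes.
bitsWritten :: (BinHandle -> IO ()) -> IO Int
bitsWritten act = do
  h <- newIORef []
  act h
  length <$> readIORef h
```

With this, `put` for `Just 5` costs 16 bits (two bytes) while `putWithBits` costs 9 bits, and `putWithBits` on a plain `Word8` falls through to the byte-based default.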

Hi, another update wrt speed of bit writes; it makes me think it might not
be worthwhile to split the instances or anything like that. I'm still
looking for some comments though (hint hint).

A small program writes 1 million Maybe Ints to a BinMem, 1/5 of which are
Nothings and 4/5 of which are Just Ints. Note that both the Nothing and
the Just Int take 1 bit (mod 1 byte), so the bit offset is increasing by
one every write, independent of which we're writing. The best case in
terms of speed occurs when we write a Just Int and we've already written
7 bits (so writing the Int doesn't have to muck around with bit stuff).
Anyway, here are the speed results, averaged over 3 runs, with
optimization on and off:

                Optimization
               level 0   level 2
               -------   -------
  bits  read      2446       195
        write    23157      1729
  bytes read      1822       177
        write    13769      1563

So, moral #1 is: never use Binary without optimization or you will be
waiting a long time. Moral #2 is that when optimization is on, bit-based
values take about 11% longer to read and to write.

We can pretty easily calculate the space savings directly. An Int is 4
bytes; we write 800000 of them = 3200000 bytes. When we use bits, we also
write 1 million tag bits = 125000 bytes. When we use bytes, we also write
1 million tag bytes. Thus the difference is:

  bytes:   4200000
  bits:  - 3325000
         ---------
          875000 bytes saved using bits

so using bits we save about 21% space (for writing Maybe Int; as always,
YMMV).

So again, the question is how should we divide the bit-based instances and
the byte-based instances?

- Hal

--
Hal Daume III          "Computer science is no more about computers
hdaume@isi.edu          than astronomy is about telescopes." -Dijkstra
www.isi.edu/~hdaume
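The space arithmetic above, spelled out (same assumptions as the experiment: 4-byte Ints, 800000 Justs, 1 million tags):

```haskell
-- Byte-based: every Just/Nothing tag costs a full byte.
-- Bit-based: one bit per tag, so a million tags pack into 125000 bytes.
byteVersion, bitVersion, saved :: Int
byteVersion = 800000 * 4 + 1000000          -- 4200000 bytes
bitVersion  = 800000 * 4 + 1000000 `div` 8  -- 3325000 bytes
saved       = byteVersion - bitVersion      -- 875000 bytes

-- Fraction of the byte-based file saved by using bits: about 21%.
savedFrac :: Double
savedFrac = fromIntegral saved / fromIntegral byteVersion
```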
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://www.haskell.org/mailman/listinfo/libraries

Hal Daume writes:

> Hi, another update wrt speed of bit writes; it makes me think it might
> not be worthwhile to split the instances or anything like that. I'm
> still looking for some comments though (hint hint).

If the performance difference is just 11%, it sounds like it's best to
keep the instances combined. (I'm assuming both versions were carefully
written for speed - seems like a safe assumption given the author!)

The speed difference is likely to continue as the speed difference between
processors, memory, etc. and disks, system calls, etc. continues to grow.

--
Alastair Reid                 alastair@reid-consulting-uk.ltd.uk
Reid Consulting (UK) Limited  http://www.reid-consulting-uk.ltd.uk/alastair/

Alastair Reid writes:

> The speed difference is likely to continue as the speed difference
> between processors, memory, etc. and disks, system calls, etc.
> continues to grow.

What I meant, of course, is: the performance difference between bit and
word writes is likely to drop as the gap between processors, memory, etc.
and disks, system calls, etc. continues to grow.

-- Alastair

I tend to agree with Alastair. I just reran the experiments with BinIOs
instead of BinMems to see if there was a greater difference (I expected a
much smaller difference, in fact, due to the high disk overhead and the
fact that putChar is slow). I didn't bother with -O0, but with -O2 we get:

  bits  read   2156
        write  2639
  bytes read   1617
        write  2078

Frankly, this shocks me. Reading using bits is 33% slower; writing is
about 27% slower. I actually expected the bits version to be *faster*
(relatively, of course) here, due to the fact that we have much smaller
files (the byte-based file is 4200000 bytes while the bit-based file is
3325000 bytes). With a 33% difference, I think I might want separate
instances again :). I'm having great difficulty explaining to myself why
there would be such a discrepancy. Anyone care to offer an idea?

- Hal
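One guess at the discrepancy: the per-bit cost is pure CPU work that the smaller file cannot buy back once the underlying I/O is buffered anyway. Every bit-level write has to splice the new bits into a partial byte with shifts and masks, something like the following sketch (illustrative only, not GHC's actual code; LSB-first packing, at most 8 bits per call, and the unused high bits of the input are assumed zero):

```haskell
import Data.Bits ((.|.), shiftL, shiftR)
import Data.Word (Word8)

-- State is (partial byte being assembled, number of bits used in it).
-- Writing n bits of w returns any completed bytes plus the new state.
-- This shift/mask work happens on EVERY write, which is why the bit
-- path can lose even though it produces a smaller file.
putBits :: (Word8, Int) -> Int -> Word8 -> ([Word8], (Word8, Int))
putBits (partial, used) n w
  | used + n < 8 =
      -- fits in the current byte: just merge the bits in
      ([], (partial .|. (w `shiftL` used), used + n))
  | otherwise =
      -- straddles a byte boundary: emit the completed byte and
      -- carry the leftover high bits into the next one
      let full = partial .|. (w `shiftL` used)
          rest = w `shiftR` (8 - used)
      in ([full], (rest, used + n - 8))
```

A byte-based write in the aligned case is a single store, so even with identical buffering underneath, every bit-based call pays a few extra ALU operations and a data-dependent branch.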
participants (3)
- Alastair Reid
- Hal Daume III
- Simon Marlow