
Hi, another update wrt speed of bit writes; it makes me think it might not be worthwhile to split the instances or anything like that. I'm still looking for some comments though (hint hint). A small program writes 1 million maybe ints to a binmem, 1/5 of which are nothings, 4/5s of which are just ints. Note that both the nothing and the just int take 1 bit (mod 1 byte), so the bit offset is increasing by one every write, independent of which we're writing. the best case in terms of speed occurs when we write just int and we're already written 7 bits (so writing the int doesn't have to much around with bit stuff). Anyway, here are the speed results, averaged over 3 runs, with optimization on and off: Optimization level 0 level 2 ------- ------- bits read 2446 195 write 23157 1729 bytes read 1822 177 write 13769 1563 So, moral #1 is: never use binary without optimization or you will be waiting a long time. Moral #2 is that when optimization is on, bit-based writes take about 11% longer to read and write. We can pretty easily calculate the space savings, directly. An Int is 4 bytes, we write 800000 of them = 3200000 bytes. When we use bits, we also write 1 million bits = 125000 bytes. When we use bytes, we also write 1 million bytes. Thus, the difference is: bytes: 4200000 bits: - 1125000 --------- 3075000 byte savings using bits so using bits we save about 26% space (for writing Maybe Int, as always YMMV). So again, the question is how should we divide the bit-based instances and the byte-based instances? - Hal -- Hal Daume III "Computer science is no more about computers | hdaume@isi.edu than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume On Tue, 10 Dec 2002, Hal Daume III wrote:
Hi again all,
I've got a somewhat functioning new Binary module for GHC which supports bit writes. I've written a small test script to make sure simple things work...it essentially looks like this:
data BinItem = ByteItem Word8 | BitItem Int Word8 -- num_bits bits | FlushItem -- flush byte | forall a . (Eq a, Binary a, Show a) => BinaryItem a -- some arbi trary data | forall a . (Eq a, Show a) => PutItem a (BinHandle -> a -> IO ()) (BinHandle -> IO a)
on which Eq is defined (I use unsafePerformIO to cast the forall'd elements).
then we have test functions which essentially open up a binMem or a binIO and write a list of BinItems to it, then read them back. We can then check if what we read back is the same as what we wrote.
The tests I have run are:
byteTest = map ByteItem [1,2,3,4,5]
bitTest1 = [BitItem 3 6, FlushItem] bitTest2 = [BitItem 3 6, BitItem 4 9, BitItem 1 0, FlushItem] bitTest3 = [BitItem 6 10, BitItem 6 10, FlushItem]
flushTest1 = [BitItem 3 6, FlushItem, BitItem 4 9, BitItem 1 0, FlushItem] flushTest2 = [ByteItem 1, FlushItem, FlushItem, FlushItem, FlushItem, ByteItem 2 , FlushItem]
comboTest1 = [ByteItem 5, BitItem 3 6, FlushItem, ByteItem 9, BitItem 4 9, BitIt em 1 0, FlushItem, ByteItem 84, BitItem 3 2, FlushItem] comboTest2 = [ByteItem 5, BitItem 3 6, ByteItem 9, BitItem 7 9, BitItem 4 0, Byt eItem 84, BitItem 3 2, FlushItem]
maybeTest1 = [BinaryItem (Just 5::Maybe Int), BinaryItem (Nothing::Maybe Int), B inaryItem (Just 0::Maybe Int), BinaryItem (Just 1::Maybe Int), BinaryItem (Nothi ng :: Maybe Int), FlushItem] maybeTest2 = map (\i -> PutItem i putMaybeInt getMaybeInt) [Just 5, Nothing, Jus t 0, Just 1, Nothing] ++ [FlushItem]
Which all work fine. putMaybeInt uses a bit to put the Just/Nothing, while the normal Binary instance for Maybe uses a byte.
...this brings up another question...since we don't have named instances, I'm thinking of separating the actual instance definitions into two modules, one which uses bit-based instances and one which uses byte-based instances. You could then import whichever makes more sense for your application. The problem with this is that sometimes it might make sense to use bytes in one aspect and bits in another.
Another solution would be to extend the Binary class to:
class Binary a where put_ :: BinHandle -> a -> IO () put :: BinHandle -> a -> IO (Bin a) get :: BinHandle -> IO a putWithBits_ :: BinHandle -> a -> IO () putWithBits :: BinHandle -> a -> IO (Bin a) getWithBits :: BinHandle -> IO a
so that each instance specifies how to write itself both with bits and bytes. of course they could default off eachother, so the user only has to write one put and one get if they're lazy.
Thoughts?
- Hal
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries