
I'd do it this way:
putBits :: BinHandle -> Int{-size-} -> Int{-value-} -> IO ()
and similarly for getBits. It will be easiest if the size is not allowed to go over 8, because then you have to deal with endianness, and in any case we already have put for Int16, Int32 etc. written in terms of putWord8.
Any reason not to make those Word8s instead of Ints? That way we don't have to throw errors when things are too big; additionally, Word implies more of a bit-wise tihng to me than Int, about which I always worry about whether shifts are going to mess with my sign bit, etc... I imagine 'putBits bh 3 word' means to write the 3 least significant bits of word? That is, 'putBits bh 3 (32 + 7)' is the same as 'putBits bh 3 7', right?
BinMem and BinIO differ quite a bit here: for BinMem you can write straight into the array, whereas for BinIO we need a cache - a single byte at the least, but ideally more. BinMem is the most important case to optimise (for us in GHC anyhow), since BinIO is already significantly slower due to the overhead of the Handle interface.
Currently the BinIO implementation doesn't do any caching, though, right? I actually rarely use the BinMem implementation, so I would be more interested in improving the speed of BinIO. Presumably, however, the Handle actually does buffering, which should take care of a large part of this, right?
There should really be a closeBin function too; it's quite simple to add.
On files this would flush the current byte then close the file handle, right? What would it do on BinMems (other than flush the byte)?