RE: getting a Binary module into the standard libs


[copied to Malcolm]
From the attached mail, it sounds like Simon has made some worthwhile additions to the Binary interface but left out a few things. The only omission that seems fundamental is that Simon's version supports reading/writing bytes whilst Malcolm's supports reading/writing bits.
Malcolm:
- How important is this?
- Assuming that supporting bits slows down the whole interface, is there a cunning implementation trick which would have very low overhead if you're doing a byte-aligned read/write (e.g., if all previous reads/writes have been multiples of bytes)?
- Or, would it be appropriate to build one as a layer on top of the other, so that programmers can express their choice by using one type or another? (I suggest a layered approach in the hope that this would lead to more code sharing, reduce the tendency for API divergence, etc., but I have no concrete thoughts on what a layered approach might look like.)

-- Alastair
> I was wondering if it was on the list of things to do to get a Binary module into the standard libraries. I know SimonM has a version for GHC and there's an NHC version (I think the original). I don't know about Hugs.
> I ask because by putting it in the standard libs, library developers could feel more pressured to release their data structures with Binary instances.
Indeed. The only reason I didn't put my version into the libraries yet was that it differed somewhat from the NHC version, and I thought it would be a good idea to discuss what the interface should look like first.
FYI, the main differences between GHC's Binary library and NHC's are described below. Keep in mind that GHC's Binary library is heavily tuned for speed, because we use it for reading/writing interface files.
GHC's Binary class:
    class Binary a where
        put_ :: BinHandle -> a -> IO ()
        put  :: BinHandle -> a -> IO (Bin a)
        get  :: BinHandle -> IO a
NHC's Binary class:
    class Binary a where
        put    :: BinHandle -> a -> IO (Bin a)
        get    :: BinHandle -> IO a
        getF   :: BinHandle -> Bin a -> (a, Bin b)
        putAt  :: BinHandle -> Bin a -> a -> IO ()
        getAt  :: BinHandle -> Bin a -> IO a
        getFAt :: BinHandle -> Bin a -> (a, Bin b)
- For GHC, I added the put_ method. The reason is efficiency: you can often write a tail-recursive definition of put_, but not of put, and one rarely needs the return value of put (I found). Each has a default definition in terms of the other (in fact, I think I use put_ exclusively in GHC, and put could be taken out of the class altogether); a sketch of an instance in this style follows this list.
- For GHC, I didn't implement getF. Instead, I have explicit lazyGet and lazyPut operations, to give me more control over the laziness: I only want laziness in a few well-defined places.
- I implemented putAt and getAt as functions rather than class methods. There are lots of instances of Binary, so you save a few dictionary fields, and I didn't come across a case where I needed to override either of these.
- GHC's library also works in terms of bytes rather than bits, again for efficiency (time over space). There are putByte and getByte functions for writing your own instances of Binary, whereas NHC has putBits and getBits.
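To make the shape of this concrete, here is a minimal sketch of an instance written against a class of this shape. It is illustrative only, not the actual GHC code: BinHandle, put_, get, putByte and getByte are the names discussed above, while seekBin is an assumed positioning primitive.

    -- Sketch only: a list instance for a byte-oriented Binary class of the
    -- shape described above.
    instance Binary a => Binary [a] where
        put_ bh []     = putByte bh 0     -- tag 0: end of list
        put_ bh (x:xs) = do
            putByte bh 1                  -- tag 1: cons cell
            put_ bh x
            put_ bh xs                    -- no result to thread back up the spine
        get bh = do
            tag <- getByte bh
            case tag of
              0 -> return []
              _ -> do
                  x  <- get bh
                  xs <- get bh
                  return (x : xs)

    -- putAt/getAt as plain functions rather than class methods: seek to the
    -- given position, then reuse the ordinary class methods.
    putAt :: Binary a => BinHandle -> Bin a -> a -> IO ()
    putAt bh p x = do seekBin bh p; put_ bh x

    getAt :: Binary a => BinHandle -> Bin a -> IO a
    getAt bh p = do seekBin bh p; get bh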
There are more differences in the rest of the interface, but these are the most fundamental ones.
Cheers, Simon

Alastair Reid wrote:
> From the attached mail, it sounds like Simon has made some worthwhile additions to the Binary interface but left out a few things. The only omission that seems fundamental is that Simon's version supports reading/writing bytes whilst Malcolm's supports reading/writing bits.
That does seem to be the main difference.
> - How important is this?
The motivation for my Binary library was to save space, whereas Simon's motivation was to be fast (at any rate to be faster than parsing text). Thus, a bitstream is potentially far more compact than a bytestream, depending of course on the natural size of the objects to be serialised. But the tradeoff is that a bytestream is far quicker to build/read, because there is no tricky logic required to ensure that bits are shifted to the right place etc. Different applications will require different characteristics. There is no one-size-fits-all.
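To put a rough number on "potentially far more compact", here is a small, self-contained illustration (independent of either library) of the best case, a list of Bools packed one bit each versus one byte each:

    import Data.Bits (setBit)
    import Data.Word (Word8)

    -- 1000 Bools cost 125 bytes packed one bit each, versus 1000 bytes at
    -- one byte each (ignoring tags and length headers).
    packAsBits :: [Bool] -> [Word8]
    packAsBits [] = []
    packAsBits bs = foldl set 0 (zip [0 .. 7] chunk) : packAsBits rest
      where
        (chunk, rest)    = splitAt 8 bs
        set w (i, True)  = setBit w i
        set w (_, False) = w

    packAsBytes :: [Bool] -> [Word8]
    packAsBytes = map (\b -> if b then 1 else 0)

    main :: IO ()
    main = do
        print (length (packAsBits  (replicate 1000 True)))   -- 125
        print (length (packAsBytes (replicate 1000 True)))   -- 1000

The factor of eight is the best case; values whose natural size is already a byte or more gain much less.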
> - Assuming that supporting bits slows down the whole interface, is there a cunning implementation trick which would have very low overhead if you're doing a byte-aligned read/write (e.g., if all previous reads/writes have been multiples of bytes)?
Well, with a bit-stream implementation you need to test whether a read/write is fully aligned (i.e. both that the buffer position is aligned to an appropriate boundary, and that the data to be added/read is of exactly the right size to take you to another boundary), but after that it should be just the same speed as if you read/write the bytes directly. So the question is really how efficient the test is relative to the actual read/write.
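As a rough illustration of that fast path, here is a hypothetical sketch (not code from either library): a bit buffer tests whether it is currently byte-aligned and the request is a whole byte, and only falls back to shifting and masking otherwise.

    import Data.Bits ((.&.), (.|.), shiftL, shiftR)
    import Data.IORef
    import Data.Word (Word8)

    -- Hypothetical bit buffer: completed bytes (in reverse order), plus a
    -- partial byte and the number of bits already used in it.
    data BitBuf = BitBuf { done :: [Word8], partial :: Word8, used :: Int }

    emptyBuf :: BitBuf
    emptyBuf = BitBuf [] 0 0

    -- Write the low n bits of w (n <= 8).  If the buffer is byte-aligned and
    -- the request is exactly 8 bits, no shifting or masking is needed at all.
    putBits :: IORef BitBuf -> Int -> Word8 -> IO ()
    putBits ref n w = modifyIORef' ref step
      where
        step buf
          | used buf == 0 && n == 8 = buf { done = w : done buf }   -- aligned fast path
          | otherwise               = slowPath buf n w              -- general bit path

    -- General case: shift the new bits into the partial byte, spilling a
    -- completed byte when it fills up.
    slowPath :: BitBuf -> Int -> Word8 -> BitBuf
    slowPath (BitBuf ds p u) n w
        | u + n < 8 = BitBuf ds (p .|. (w' `shiftL` u)) (u + n)
        | otherwise = BitBuf (full : ds) (w' `shiftR` (8 - u)) (u + n - 8)
      where
        w'   = w .&. ((1 `shiftL` n) - 1)    -- keep only the low n bits
        full = p .|. (w' `shiftL` u)

The only overhead on the aligned path is the "used buf == 0 && n == 8" test, which is exactly the cost being weighed here against the read/write itself.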
> - Or, would it be appropriate to build one as a layer on top of the other, so that programmers can express their choice by using one type or another?
Yes, it is possible that there is a suitable separation (using MPTC, no doubt) to allow the choice of either bit-wise or byte-wise (perhaps even Word16 or Word32) implementations of the same basic interface. Something like:

    class Binary impl a where ...

    data BitStream
    data ByteStream

    instance Binary BitStream  Bool where ...
    instance Binary ByteStream Bool where ...

An alternative would be to provide (in a single class) both the original bit-wise ops and, in addition, the byte-aligned "fast-entry-point" methods, so for example you could mix the two, perhaps requiring the use of some operation like "alignBuffer" when you switch from one style to the other.

Regards, Malcolm
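For concreteness, here is one way the MPTC separation sketched above might be fleshed out. Every name in it is a placeholder invented for illustration; the bit-level methods echo the putBits/getBits mentioned earlier but are not the actual signatures of either library.

    {-# LANGUAGE MultiParamTypeClasses #-}

    import Data.Word (Word8)

    -- Placeholder handle types; a real version would carry buffers.
    data BitStream  = BitStream
    data ByteStream = ByteStream

    -- The low-level layer: every stream can transfer some number of bits.
    class Stream s where
        putBitsS :: s -> Int -> Word8 -> IO ()   -- write the low n bits
        getBitsS :: s -> Int -> IO Word8         -- read n bits

    instance Stream BitStream where
        putBitsS _ _ _ = return ()   -- stub: a real handle would buffer bits
        getBitsS _ _   = return 0

    instance Stream ByteStream where
        putBitsS _ _ _ = return ()   -- stub: a real handle would write whole bytes
        getBitsS _ _   = return 0

    -- The user-visible layer: the same Binary interface over either stream.
    class Stream s => Binary s a where
        put :: s -> a -> IO ()
        get :: s -> IO a

    -- A Bool costs one bit on a BitStream...
    instance Binary BitStream Bool where
        put h b = putBitsS h 1 (if b then 1 else 0)
        get h   = fmap (/= 0) (getBitsS h 1)

    -- ...but a whole (cheap, aligned) byte on a ByteStream.
    instance Binary ByteStream Bool where
        put h b = putBitsS h 8 (if b then 1 else 0)
        get h   = fmap (/= 0) (getBitsS h 8)

Keeping the implementation as a class parameter lets each stream choose its own representation while the caller's code stays the same; the single-class alternative would instead add the byte-aligned fast-entry methods alongside the bit-wise ones on one handle type, with something like the suggested "alignBuffer" marking the switch between the two styles.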
participants (3):
- Alastair Reid
- Malcolm Wallace
- Simon Marlow