RE: getting a Binary module into the standard libs

> > BinMem and BinIO differ quite a bit here: for BinMem you can write straight into the array, whereas for BinIO we need a cache - a single byte at the least, but ideally more. BinMem is the most important case to optimise (for us in GHC anyhow), since BinIO is already significantly slower due to the overhead of the Handle interface.
>
> Currently the BinIO implementation doesn't do any caching, though, right? I actually rarely use the BinMem implementation, so I would be more interested in improving the speed of BinIO. Presumably, however, the Handle actually does buffering, which should take care of a large part of this, right?

The problem is the overhead of the Handle interface. hPutChar is quite expensive, because it has to lock/unlock the Handle. Using a cache in the BinIO implementation would help a lot, but it also means that you have to keep exclusive access to the Handle while doing binary operations. There is already a similar problem, because the library caches the file pointer outside the Handle, so someone doing an hSeek on the same Handle will cause the cached version to be out of sync with the real one.

Should multiple threads be allowed to access the same BinHandle simultaneously? Probably not in write mode, but it might be useful when reading. Maybe we could provide

    dupBin :: BinHandle -> IO BinHandle

this is easy enough to implement, and the separate BinHandles could be used from different threads.

> > There should really be a closeBin function too; it's quite simple to add.
>
> On files this would flush the current byte then close the file handle, right? What would it do on BinMems (other than flush the byte)?

closeBin can throw away a BinMem. If it needs to be written to a file, then we provide writeBinMem.

Cheers,
Simon
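A minimal sketch of the caching idea described above, not the actual Binary library code. The CachedOut type and the names newCachedOut, putByteCached and flushCached are hypothetical; only hPutBuf and withArrayLen are real base functions. Bytes accumulate outside the Handle and are written with one hPutBuf call per flush, so the Handle is locked once per batch rather than once per byte. As the message says, this is only sound if nothing else uses the Handle in between.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import System.IO

-- Hypothetical cached writer (an assumption for illustration only).
data CachedOut = CachedOut
  { coHandle  :: Handle          -- underlying file Handle
  , coPending :: IORef [Word8]   -- bytes not yet written, newest first
  , coCount   :: IORef Int       -- number of pending bytes
  , coLimit   :: Int             -- flush once this many bytes are pending
  }

newCachedOut :: Handle -> Int -> IO CachedOut
newCachedOut h limit = do
  pending <- newIORef []
  count   <- newIORef 0
  return (CachedOut h pending count limit)

-- Add one byte to the cache; touch the Handle only when the cache is full.
putByteCached :: CachedOut -> Word8 -> IO ()
putByteCached co w = do
  bs <- readIORef (coPending co)
  n  <- readIORef (coCount co)
  writeIORef (coPending co) (w : bs)
  writeIORef (coCount co) (n + 1)
  if n + 1 >= coLimit co then flushCached co else return ()

-- Write all pending bytes in a single hPutBuf call and empty the cache.
flushCached :: CachedOut -> IO ()
flushCached co = do
  bs <- readIORef (coPending co)
  withArrayLen (reverse bs) $ \len ptr -> hPutBuf (coHandle co) ptr len
  writeIORef (coPending co) []
  writeIORef (coCount co) 0
```

A closeBin for files would, under the same assumptions, call flushCached and then hClose on the underlying Handle.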

I was waiting a while before replying, hoping to get other comments from people who know more about this stuff than I do, but that doesn't seem to be happening. As I see it, there are three things left on the table:

1) Should putBits support >8 bit operations
2) How should we support flushByte
3) How should we buffer BinIO

I'll address each of these in turn.

1) I vote in favor of "no". I fear my opinion on this is influenced by the fact that I offered to implement it. Of course you could always define putGT8Bits in terms of putLEQ8Bits, but that would probably be less efficient than defining putGT8Bits "natively." I don't see a real need for it; as Simon said, most of the use for this will be for constructors and Booleans. I could probably be persuaded otherwise, but I'd like to hear a good, strong example of why more than 8 bit puts are essential.

2) The proposals for flushByte, as I see it, are:

a) flushBytes h n aligns the stream to the next 2^n byte (bit?) boundary
b) flushBytes h m n aligns the stream such that the position p satisfies (p = n) mod 2^m
c) encoding (b) as a single integer (as per Dean's suggestion)

This is something I don't really know enough about to comment on. Clearly (a) is the simplest, implementation wise, and probably the fastest. (b) would be a bit more work and I don't understand what it would gain you, but since it seems to be well known I'll admit that I just know too little to say. (c) wouldn't be much more work than (b), but I wonder if it's getting too complicated. My vote is probably for (a), but my vote should only count epsilon in this context. Perhaps (b) is the right thing to do (I don't need too much convincing here).

3) I think we can all agree that we should buffer BinIOs. There are a few questions, given this:

a) Should multiple threads be allowed to write the same BinHandle simultaneously? If not, is an error thrown or is the behaviour just left "unspecified"?
b) Should multiple threads be allowed to read from the same BinHandle simultaneously? If not, ...
c) Should one thread be allowed to write and another to read from the same BH simultaneously? If not, ...

I would probably say:

a) No & left unspecified
b) Yes
c) Yes

That said, we probably need a dupBin function as Simon suggests. I must say here that I don't know enough about how Handles are implemented in GHC to know where to start on this. I know that they are already MVars of Handle__s which basically hold the file pointer and some other stuff, but I don't know what would need to be done to accomplish such a dupBin function.

That said, I put it out to the rest of you for comments/persuasions.

- Hal
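To make point 1 concrete, here is a rough sketch of how a wide put could be layered on a put of at most 8 bits. The names splitBits and putWideBits are hypothetical, the small put is passed in as a parameter rather than taken from any real library, and the most-significant-piece-first order is an assumption the thread does not settle.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Word (Word8, Word32)

-- Split the low n bits of v into (width, value) pieces of at most 8 bits,
-- most significant piece first (the ordering is an assumption).
splitBits :: Int -> Word32 -> [(Int, Word8)]
splitBits n v
  | n <= 0    = []
  | n <= 8    = [(n, fromIntegral (v .&. mask n))]
  | otherwise = (8, fromIntegral ((v `shiftR` (n - 8)) .&. 0xFF))
                  : splitBits (n - 8) v
  where
    -- mask k has the low k bits set
    mask k = (1 `shiftL` k) - 1

-- Wide put expressed via a caller-supplied put of at most 8 bits.
putWideBits :: Monad m => (Int -> Word8 -> m ()) -> Int -> Word32 -> m ()
putWideBits putSmall n v = mapM_ (uncurry putSmall) (splitBits n v)
```

Each piece costs one call to the small put, which is the inefficiency relative to a "native" wide put that the message alludes to.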

> 2) The proposals for flushByte, as I see it, are:
> a) flushBytes h n aligns the stream to the next 2^n byte (bit?) boundary

I think this is the right one to do. It would probably only be used with n=8,16,32 but I doubt the extra generality will cost anything.

> b) flushBytes h m n aligns the stream such that the position p satisfies (p = n) mod 2^m

I mentioned this style of interface but I doubt we'd need it in practice. If we do, it can always be added later as a separate function.

> 3) I think we can all agree that we should buffer BinIOs. There are a few questions, given this:
> a) Should multiple threads be allowed to write the same BinHandle simultaneously? If not, is an error thrown or is the behaviour just left "unspecified"?
> b) Should multiple threads be allowed to read from the same BinHandle simultaneously? If not, ...
> c) Should one thread be allowed to write and another to read from the same BH simultaneously? If not, ...

I believe GHC has a reader-writer lock on Handles so the answer is that one thread blocks if another is already using it in a conflicting way. Basically, I suggest doing whatever normal file Handles do.

> That said, we probably need a dupBin function as Simon suggests. I must say here that I don't know enough about how Handles are implemented in GHC to know where to start on this. I know that they are already MVars of Handle__s which basically hold the file pointer and some other stuff, but I don't know what would need to be done to accomplish such a dupBin function.

Again, this should do what normal file Handles do - and code can be stolen/shared to make this work.

-- Alastair
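For illustration only, a sketch of what dupBin could look like on a simplified, read-only, in-memory handle. The BinHandle type here is a hypothetical stand-in (shared immutable bytes plus a private cursor), not GHC's real BinHandle or Handle__ representation, and openBinList and getByte are invented helper names. The point is only that a duplicate gets its own position, so separate readers do not disturb each other.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- Hypothetical, simplified read-only handle (an assumption, not GHC's type).
data BinHandle = BinHandle
  { bhData :: [Word8]     -- shared contents, never mutated after creation
  , bhPos  :: IORef Int   -- read position, private to this handle
  }

openBinList :: [Word8] -> IO BinHandle
openBinList bytes = do
  pos <- newIORef 0
  return (BinHandle bytes pos)

-- Duplicate a handle: same data, fresh cursor starting at the current position.
dupBin :: BinHandle -> IO BinHandle
dupBin bh = do
  p   <- readIORef (bhPos bh)
  pos <- newIORef p
  return bh { bhPos = pos }

-- Read the next byte, or Nothing at end of data.
getByte :: BinHandle -> IO (Maybe Word8)
getByte bh = do
  p <- readIORef (bhPos bh)
  case drop p (bhData bh) of
    []      -> return Nothing
    (w : _) -> do
      writeIORef (bhPos bh) (p + 1)
      return (Just w)
```

Each thread would call dupBin once and then read from its own copy. A file-backed version would also have to deal with the shared file pointer and Handle locking discussed above, which this sketch does not attempt.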

Does flushByte n flush to the next 2^n bit or byte?

--
Hal Daume III

 "Computer science is no more about computers    | hdaume@isi.edu
  than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume

On 14 Nov 2002, Alastair Reid wrote:
> > 2) The proposals for flushByte, as I see it, are:
> > a) flushBytes h n aligns the stream to the next 2^n byte (bit?) boundary
>
> I think this is the right one to do. It would probably only be used with n=8,16,32 but I doubt the extra generality will cost anything.
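The two flushBytes alignment proposals from this thread come down to a little arithmetic on the stream position. The sketch below uses hypothetical names (alignNext, alignMod) and deliberately leaves the unit open: the position p is counted in bits or in bytes, whichever the interface ends up choosing, since that is the question still being asked here.

```haskell
-- Proposal (a): the smallest position q >= p that is a multiple of 2^n.
alignNext :: Int -> Int -> Int
alignNext n p = ((p + size - 1) `div` size) * size
  where
    size = 2 ^ n

-- Proposal (b): the smallest position q >= p with q = r (mod 2^m).
-- Haskell's `mod` returns a non-negative result for a positive divisor,
-- so (r - p) `mod` size is exactly the padding needed.
alignMod :: Int -> Int -> Int -> Int
alignMod m r p = p + ((r - p) `mod` size)
  where
    size = 2 ^ m
```

With r = 0, alignMod m 0 p gives the same result as alignNext m p, so (a) is the special case of (b) that Alastair suggests starting with.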
participants (3)
- Alastair Reid
- Hal Daume III
- Simon Marlow