ANN: A triple of new packages for talking to the outside world

Over the Xmas break I made some headway on writing an RPC package since many of the ideas that I want to play with involve such a thing as a basic building block. However, some might find some of the spin-off packages that I wrote along the way useful. * binary-strict: This is mostly a cut-n-paste job from the excellent binary package which provides Data.Binary.Strict.Get - a monad which is a drop in replacement for Get, but which parses strict ByteStrings and returns an Either, thus one can parse binary data without triggering an exception on failure. * codec-libevent: This parses the "tagging" IDL files introduced in libevent[1] 1.4.0-beta and later and can generate the Haskell code for them. These are platform-independent "structs" (in the C sense) and libevent provides the complementary C infrastructure for them. You should be able to understand almost everything these is to know about them from a quick example: struct msg { string from_name = 1; string to_name = 2; optional struct[kill] attack = 3; array struct[run] run = 4; } [1] http://monkey.org/~provos/libevent/ * control-timeout: This module provides simple timeouts - e.g. actions which occur after a given number of seconds and which can be canceled. It does it in a not-totally-stupid fashion so that you don't need to worry about setting hundreds of them. AGL -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org 650-283-9641

Adam Langley wrote:
This is mostly a cut-n-paste job from the excellent binary package which provides Data.Binary.Strict.Get - a monad which is a drop in replacement for Get, but which parses strict ByteStrings and returns an Either,
Ooh, nice. We could really do with an incremental version, too, which could be spoonfed chunks of bytes, and dole out values as deserialisation completes. Passing back a Left String is in some sense not much of an improvement over calling error, as if I merely doesn't have enough bytes accumulated yet, I can't restart parsing from the point the bytes ran out. Any chance you'd like to hand back a continuation instead?

On Jan 6, 2008 9:13 PM, Bryan O'Sullivan
Ooh, nice. We could really do with an incremental version, too, which could be spoonfed chunks of bytes, and dole out values as deserialisation completes.
Passing back a Left String is in some sense not much of an improvement over calling error, as if I merely doesn't have enough bytes accumulated yet, I can't restart parsing from the point the bytes ran out. Any chance you'd like to hand back a continuation instead?
The Left String was just a way to return an error string, otherwise a parse failure can be a little oblique. I think an incremental parser would be sufficiently different that it would be a different monad. Of course, the code duplication (again) makes me a sad panda, but avoiding that involves a lot of work on the binary package and the maintainers were uninterested last time I tried that. It would seem that there would be three possible outcomes from an incremental Get: - Failure: some bitstreams are just invalid and no amount of extra data will ever fix that - Complete [Result]: the last chunk of data has been processed. Maybe this should also include the remainder of the data? - Partial Cont [Result]: needs more data, but here's a (possibly empty) list of results so far. Applying a ByteString to the Cont from the Partial result would result in one of the same three outcomes etc. What do you think of that? It wouldn't be too hard to make up something like that using ContT if it would be useful to you. AGL -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org 650-283-9641

It would seem that there would be three possible outcomes from an incremental Get: - Failure: some bitstreams are just invalid and no amount of extra data will ever fix that - Complete [Result]: the last chunk of data has been processed. Maybe this should also include the remainder of the data? - Partial Cont [Result]: needs more data, but here's a (possibly empty) list of results so far.
Yes, that's more or less exactly what I had in mind.

On Jan 7, 2008 11:17 AM, Bryan O'Sullivan
It would seem that there would be three possible outcomes from an incremental Get: - Failure: some bitstreams are just invalid and no amount of extra data will ever fix that - Complete [Result]: the last chunk of data has been processed. Maybe this should also include the remainder of the data? - Partial Cont [Result]: needs more data, but here's a (possibly empty) list of results so far.
Yes, that's more or less exactly what I had in mind.
Ok, see http://www.imperialviolet.org/IncrementalGet.hs Hot off the keyboard and lacking documentation and comments ;) There's a small test function at the bottom which demonstrates an infinite parser which extracts 16-bit ints from whatever you feed it. Questions: 1) Should Finished include the remainder of the ByteString (e.g. that which wasn't used by that parser) 2) I've no idea what I've done to the parse speed 3) I removed all the lookahead functions because, if the look ahead runs out of data it's not clean what should be done. We can request more with a partial result, but if the user doesn't have any more data to give there's no way, currently, to signal this and cause the lookahead to backtrack. Also, if we let the lookahead request more data via a Partial, then there's no way for the user to tell that Partial apart from one which has "real" values. But if this is useful to you, make any requests. I'll (hopefully) do them, clean it up and push a new release of binary-strict. AGL -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org 650-283-9641

Adam Langley wrote:
That's excellent! This is just the sort of thing one wants if getting dribs and drabs of information instead of a steady stream. For example, I need to reconstruct TCP streams from individual packets captured off the wire, and this is a much easier mechanism to use than playing tricks with the direct-mode Get monad.
Questions: 1) Should Finished include the remainder of the ByteString (e.g. that which wasn't used by that parser)
Yes, definitely. I had to add a runGetState to the existing Get monad so that I could recover the unparsed residual, so I'm sure it will be necessary here.
2) I've no idea what I've done to the parse speed
Getting the API right is the appropriate thing to be doing first. Afterwards, the rewrite rule ninjas can stage a night attack on performance problems.
But if this is useful to you, make any requests. I'll (hopefully) do them, clean it up and push a new release of binary-strict.
I'm lobbying for Don and company to include this stuff in the regular binary distribution. A proliferation of almost-identical packages doesn't serve the community all that well. Thanks for the nice work! I'll try to put that code to use in perhaps a few days, and let you know how the API works out in practice.

bos:
Adam Langley wrote:
That's excellent! This is just the sort of thing one wants if getting dribs and drabs of information instead of a steady stream. For example, I need to reconstruct TCP streams from individual packets captured off the wire, and this is a much easier mechanism to use than playing tricks with the direct-mode Get monad.
Questions: 1) Should Finished include the remainder of the ByteString (e.g. that which wasn't used by that parser)
Yes, definitely. I had to add a runGetState to the existing Get monad so that I could recover the unparsed residual, so I'm sure it will be necessary here.
2) I've no idea what I've done to the parse speed
Getting the API right is the appropriate thing to be doing first. Afterwards, the rewrite rule ninjas can stage a night attack on performance problems.
Yeah, I'm happy to attack that problem. The key is finding a useful API that you actually can work with for real problems. This looks like a good step -- and interesting to see how you're needs differed from the original intent.
But if this is useful to you, make any requests. I'll (hopefully) do them, clean it up and push a new release of binary-strict.
I'm lobbying for Don and company to include this stuff in the regular binary distribution. A proliferation of almost-identical packages doesn't serve the community all that well.
I'm happy to -- if people need a slightly different API, we should do what we can to support that. The goal is to make binary hacking in Haskell painless, so anything that helps that goal, is good :) -- Don

Adam Langley
But if this is useful to you, make any requests. I'll (hopefully) do them, clean it up and push a new release of binary-strict.
How difficult would it be to have a getBits functions as well as a getBytes? That would allow me drop the dependency on NewBinary in the ASN.1 package. Thanks, Dominic.

On Wed, 2008-01-09 at 09:26 +0000, Dominic Steinitz wrote:
Adam Langley
writes: But if this is useful to you, make any requests. I'll (hopefully) do them, clean it up and push a new release of binary-strict.
How difficult would it be to have a getBits functions as well as a getBytes? That would allow me drop the dependency on NewBinary in the ASN.1 package.
The difficulty is in deciding what the api should be. Does it give you a real bitstream or only a byte aligned one? If I ask for 3 bits then 15 bytes what does it do? Does it assume I meant 3 bits, then pad to the next byte boundary and get 15 bytes, or does it mean get 15 bytes but at this 3 bit shift offset? Duncan

Duncan Coutts
On Wed, 2008-01-09 at 09:26 +0000, Dominic Steinitz wrote:
Adam Langley
writes: But if this is useful to you, make any requests. I'll (hopefully) do them, clean it up and push a new release of binary-strict.
How difficult would it be to have a getBits functions as well as a getBytes? That would allow me drop the dependency on NewBinary in the ASN.1 package.
The difficulty is in deciding what the api should be. Does it give you a real bitstream or only a byte aligned one? If I ask for 3 bits then 15 bytes what does it do? Does it assume I meant 3 bits, then pad to the next byte boundary and get 15 bytes, or does it mean get 15 bytes but at this 3 bit shift offset?
Duncan
I'd suggest an aligned and unaligned api. So the aligned api would get 3 bits and the 15 bytes would start from the next byte boundary. The unaligned api would get 3 bits and the 15 bytes (=15 x 8 bits) would finish still with an offset of 3. Dominic.

On Jan 9, 2008 10:10 AM, Dominic Steinitz
Duncan Coutts
writes: The difficulty is in deciding what the api should be. Does it give you a real bitstream or only a byte aligned one? If I ask for 3 bits then 15 bytes what does it do? Does it assume I meant 3 bits, then pad to the next byte boundary and get 15 bytes, or does it mean get 15 bytes but at this 3 bit shift offset?
I'd suggest an aligned and unaligned api.
So the aligned api would get 3 bits and the 15 bytes would start from the next byte boundary.
The unaligned api would get 3 bits and the 15 bytes (=15 x 8 bits) would finish still with an offset of 3.
Do you mean we'd have an unalignedGetBytes as well as getBytes (which would remain aligned)? That would make sense, but it would seem a bit heavy to duplicate all of the binary API. David

"David Roundy"
On Jan 9, 2008 10:10 AM, Dominic Steinitz
wrote: Duncan Coutts
writes: The difficulty is in deciding what the api should be. Does it give you a real bitstream or only a byte aligned one? If I ask for 3 bits then 15 bytes what does it do? Does it assume I meant 3 bits, then pad to the next byte boundary and get 15 bytes, or does it mean get 15 bytes but at this 3 bit shift offset?
I'd suggest an aligned and unaligned api.
So the aligned api would get 3 bits and the 15 bytes would start from the next byte boundary.
The unaligned api would get 3 bits and the 15 bytes (=15 x 8 bits) would finish still with an offset of 3.
Do you mean we'd have an unalignedGetBytes as well as getBytes (which would remain aligned)? That would make sense, but it would seem a bit heavy to duplicate all of the binary API.
getBytes per default unaligned and additionally snapToNextByte? -- (c) this sig last receiving data processing entity. Inspect headers for past copyright information. All rights reserved. Unauthorised copying, hiring, renting, public performance and/or broadcasting of this signature prohibited.

On Wed, Jan 09, 2008 at 11:43:52PM +0100, Achim Schneider wrote:
"David Roundy"
wrote: On Jan 9, 2008 10:10 AM, Dominic Steinitz
wrote: Duncan Coutts
writes: The difficulty is in deciding what the api should be. Does it give you a real bitstream or only a byte aligned one? If I ask for 3 bits then 15 bytes what does it do? Does it assume I meant 3 bits, then pad to the next byte boundary and get 15 bytes, or does it mean get 15 bytes but at this 3 bit shift offset?
I'd suggest an aligned and unaligned api.
So the aligned api would get 3 bits and the 15 bytes would start from the next byte boundary.
The unaligned api would get 3 bits and the 15 bytes (=15 x 8 bits) would finish still with an offset of 3.
Do you mean we'd have an unalignedGetBytes as well as getBytes (which would remain aligned)? That would make sense, but it would seem a bit heavy to duplicate all of the binary API.
getBytes per default unaligned and additionally snapToNextByte?
But I can't imagine an implementation in which this change wouldn't slow down getBytes for the normal case. Perhaps the slowdown would be small, but it seems unwise to enforce that slowness at the API level, when we've already got a perfectly good API for fast binary IO. Maybe there's some type hackery you could do to avoid a speed penalty, but that's a lot to add for a somewhat dubious benefit. Note that you could get a similar effect with getBytes always aligned, and an additional function shiftToByte which takes the remainder of the input and bitshifts everything so that the current read pointer is on a byte boundary. Obviously this would be an O(N) operation (where N is the remainder of the input), which could certainly be a problem. Another option, I suppose, would be to introduce a type class for bytewise-reading monads. That'd be a type hack, but not such a bad one. Then you could have the efficient implementation, and one that allows bitwise reading, and there could also be a function that allows bitwise parsing of a chunk of a byte-aligned data. -- David Roundy Department of Physics Oregon State University

On Jan 9, 2008 5:01 PM, David Roundy
But I can't imagine an implementation in which this change wouldn't slow down getBytes for the normal case. Perhaps the slowdown would be small, but it seems unwise to enforce that slowness at the API level, when we've already got a perfectly good API for fast binary IO. Maybe there's some type hackery you could do to avoid a speed penalty, but that's a lot to add for a somewhat dubious benefit.
I believe that it would be an additional if statement in the fast path at least. How about a BitGet monad which get be run in the Get monad?
test :: Get () test = do runBitGet 2 (do getBitField 2)
So the first argument to runBitGet is the number of bytes to parse for bit fields and then functions in BitGet can extract bit-length ints etc. Anyone like that idea? AGL -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org 650-283-9641

On Jan 9, 2008 8:25 PM, Adam Langley
I believe that it would be an additional if statement in the fast path at least.
How about a BitGet monad which get be run in the Get monad? ... Anyone like that idea?
Sounds good to me. But then, I don't use Data.Binary (due to the lack of floating point support), so I'm not sure my opinion is all that relevant. David

agl:
On Jan 9, 2008 5:01 PM, David Roundy
wrote: But I can't imagine an implementation in which this change wouldn't slow down getBytes for the normal case. Perhaps the slowdown would be small, but it seems unwise to enforce that slowness at the API level, when we've already got a perfectly good API for fast binary IO. Maybe there's some type hackery you could do to avoid a speed penalty, but that's a lot to add for a somewhat dubious benefit.
I believe that it would be an additional if statement in the fast path at least.
How about a BitGet monad which get be run in the Get monad?
test :: Get () test = do runBitGet 2 (do getBitField 2)
So the first argument to runBitGet is the number of bytes to parse for bit fields and then functions in BitGet can extract bit-length ints etc.
Anyone like that idea?
That's pretty much what we envisaged as the approach to take. Monad transformers adding some bit-buffer state over Get/Put. -- Don

On Jan 10, 2008 10:45 AM, Don Stewart
That's pretty much what we envisaged as the approach to take. Monad transformers adding some bit-buffer state over Get/Put.
For anyone who's still reading this thread... I've just uploaded[1] binary-strict 0.2.1 which includes Data.Binary.Strict.BitGet - a Get like monad which works by the bit. I'm afraid that Haddock 2 is choaking on {-# UNPACK #-}, so I don't have the HTML documentation to point to. (And I thought that Haddock 2 was going to fix all those parsing issues :( - hopefully I'm just doing something stupid). [1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary-strict-0.2... AGL -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org 650-283-9641

agl:
On Jan 10, 2008 10:45 AM, Don Stewart
wrote: That's pretty much what we envisaged as the approach to take. Monad transformers adding some bit-buffer state over Get/Put.
For anyone who's still reading this thread...
I've just uploaded[1] binary-strict 0.2.1 which includes Data.Binary.Strict.BitGet - a Get like monad which works by the bit. I'm afraid that Haddock 2 is choaking on {-# UNPACK #-}, so I don't have the HTML documentation to point to. (And I thought that Haddock 2 was going to fix all those parsing issues :( - hopefully I'm just doing something stupid).
[1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary-strict-0.2...
Ok. That's awesome. I guess if you do all the TODOs for Binary like this, we should merge the code back in :)

On Jan 15, 2008 3:26 PM, Don Stewart
Ok. That's awesome. I guess if you do all the TODOs for Binary like this, we should merge the code back in :)
Well, at the moment I'm pretty unhappy with the amount of code duplication required both within binary-strict and between binary and binary-strict. I think this code needs a whole lot of restructuring (maybe a bit of TH for generating the common bits). I'll get to that when it appears that the API seems reasonable. AGL -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org 650-283-9641

Hi
(maybe a bit of TH for generating the common bits)
That would be bad. Then you'll have gone from Data.Binary being portable code, to being GHC specific code, and I will cry.... :'-( CPP is a good way to common stuff up in a portable way - I've used it in FilePath. There is nearly no end to the amount of crazy CPP hackery you can use to refactor stuff. Thanks Neil

On Jan 15, 2008 5:01 PM, Neil Mitchell
That would be bad. Then you'll have gone from Data.Binary being portable code, to being GHC specific code, and I will cry.... :'-(
Ok, no TH ;) AGL -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org 650-283-9641

On Jan 15, 2008 7:33 PM, Adam Langley
Ok, no TH ;)
I've just uploaded binary-strict 0.2.2 to Hackage which factors most of the common code out via CPP. Hopefully I didn't break anything. AGL -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org 650-283-9641

Adam Langley
On Jan 10, 2008 10:45 AM, Don Stewart
wrote: That's pretty much what we envisaged as the approach to take. Monad transformers adding some bit-buffer state over Get/Put.
For anyone who's still reading this thread...
I've just uploaded[1] binary-strict 0.2.1 which includes Data.Binary.Strict.BitGet - a Get like monad which works by the bit. I'm afraid that Haddock 2 is choaking on {-# UNPACK #-}, so I don't have the HTML documentation to point to. (And I thought that Haddock 2 was going to fix all those parsing issues :( - hopefully I'm just doing something stupid).
[1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary-strict-0.2...
AGL
Thanks for taking the time on this. The old NewBinary had NewBinary.Binary.getBits :: NewBinary.Binary.BinHandle -> Int -> IO GHC.Word.Word8 which allowed you to do things like tlv_ bin = do tagValueVal <- getBits bin 5 tagConstructionVal <- getBits bin 1 tagTypeVal <- getBits bin 2 I'm sure I'm wrong but putting bits into [Bool] doesn't look very efficient. Of course, NewBinary didn't address what happened for n >= 8. Some possibilities are a) not allowing more than 8 b) returning [Word8] or c) (which I thought was where we'd go) a ByteString with some sort of padding. Dominic.

On Jan 16, 2008 2:41 PM, Dominic Steinitz
tlv_ bin = do tagValueVal <- getBits bin 5 tagConstructionVal <- getBits bin 1 tagTypeVal <- getBits bin 2
I'm sure I'm wrong but putting bits into [Bool] doesn't look very efficient. Of course, NewBinary didn't address what happened for n >= 8. Some possibilities are a) not allowing more than 8 b) returning [Word8] or c) (which I thought was where we'd go) a ByteString with some sort of padding.
BitGet is just an API RFC at the moment, so I'm just describing it here - not trying to justify it. In BitGet there's getAsWord[8|16|32|64] which take a number of bits ($n$) and returns the next $n$ bits in the bottom of a Word$x$. Thus, getAsWord8 is what you call getBits and, if you had a 48 bit number, you could use getAsWord64 and the bottom 48-bits of the resulting Word64 would be what you want. Equally, asking for more than $x$ bits when calling getAsWord$x$ is a mistake, however I don't check for it in the interest of speed. There are also get[Left|Right]ByteString which return the next $n$ bits in a ByteString of Word8's. The padding is either at the end of the last byte (left aligned) or at the beginning of the first byte (right aligned). If you did want a [Bool], you could use: bits <- sequence $ take n $ repeat getBit AGL -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org 650-283-9641

Adam Langley
BitGet is just an API RFC at the moment, so I'm just describing it here - not trying to justify it.
In BitGet there's getAsWord[8|16|32|64] which take a number of bits ($n$) and returns the next $n$ bits in the bottom of a Word$x$. Thus, getAsWord8 is what you call getBits and, if you had a 48 bit number, you could use getAsWord64 and the bottom 48-bits of the resulting Word64 would be what you want.
Equally, asking for more than $x$ bits when calling getAsWord$x$ is a mistake, however I don't check for it in the interest of speed.
There are also get[Left|Right]ByteString which return the next $n$ bits in a ByteString of Word8's. The padding is either at the end of the last byte (left aligned) or at the beginning of the first byte (right aligned).
Ok so I should be doing something like this. I'm not clear what happens if you are reading from a socket and not all the input has arrived but I'll think about that over the weekend. Another thought: could e.g. getRightByteString be in the IO monad and then I don't have to run the Get(?) monad? Or is that a really stupid question? Dominic. import qualified Data.ByteString as B import Data.Word import IO import qualified Data.Binary.Strict.BitGet as BG test = do h <- openFile "foobarbaz" ReadMode b <- B.hGetContents h let ebms = test2 b case ebms of Left s -> return s Right bms -> return (concat ((map (show . B.unpack) bms))) test1 = do bm1 <- BG.getRightByteString 2 bm2 <- BG.getRightByteString 2 return [bm1,bm2] test2 bs = BG.runBitGet bs test1

On Jan 17, 2008 11:05 AM, Dominic Steinitz
I'm not clear what happens if you are reading from a socket and not all the input has arrived but I'll think about that over the weekend.
At the moment, BitGet deals with strict ByteStrings only. One could use it within a standard Get monad by getting a strict ByteString from the lazy input. I believe that lazy ByteStrings got fixed a while back so that reading from a socket doesn't block reading a whole block. (i.e. if you trickle data, byte by byte, to a socket a lazy ByteString should return a spine of 1 byte strict ByteStrings.) A fully lazy BitGet would also be possible, of course, I've just not written it yet ;)
Adam Langley
writes: Another thought: could e.g. getRightByteString be in the IO monad and then I don't have to run the Get(?) monad? Or is that a really stupid question?
If it were in the IO monad, I guess that you're suggesting that it read from a handle? If that were the case, the remainder of the last byte would have to be discarded because one can only read whole bytes from a Handle and there's no mechanism for pushing back into it. It's certainly possible to do, but I think a quick wrapper around a BitGet would be the way to do it. If it's particually desirable I can add it, although I'll admit that it seems a bit odd and I'm wondering what your use case is. Cheers AGL -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org 650-283-9641

David Roundy
[ something that every C programmer dreams of ]
I'm not going to answer, I'd be just vapour-waring around.
But, yes, any-alignment any-granularity reads can be done in O(1), with
1 ranging from case to case from one instruction to a few shifts and
&'s, plus some constant cost to choose the appropriate function based
on current alignment, which could be set by things like snapToByte,
snapToWord, snapToInt128, getBit, getBits
participants (9)
-
Achim Schneider
-
Adam Langley
-
Bryan O'Sullivan
-
David Roundy
-
David Roundy
-
Dominic Steinitz
-
Don Stewart
-
Duncan Coutts
-
Neil Mitchell