Declarative binary protocols

Cafe,

We have some fantastic tools for binary parsing in packages like binary and cereal (and presumably attoparsec, which I've not used). But they don't quite scratch an itch I have when writing implementations of binary communication protocols.

A good example of my problem is in my implementation of the memcached binary wire protocol: http://hackage.haskell.org/package/starling

What I've tried to do is divide up the library into a declarative protocol description and an imperative machine to sit on a handle and link together a server response with the source request.

In the declarative core all of the types which come off of the wire have an associated Data.Binary.Get action - but this isn't quite good enough. Data.Binary works on ByteStrings, but I have a handle. I don't want to use hGetContents because I have trouble working out when lazy IO is and is not correct. I can't use hGet because I don't know how much to get until I'm in the middle of the Get action.

How do other folks solve this issue? What I've done is broken down and included (getResponse :: Handle -> IO Response) in my core protocol description module, which gets a fixed-length header from which we can figure out how much else to get to form the complete response. But it would be nice to have something cleaner.

One thing I've thought of is generalizing Data.Binary.Get to operate over either ByteStrings or Handles. But then I would be doing reads from the handle every couple of bytes. To avoid that I could extend the monad to include declarations of how many bytes the parser will require, which may be declared multiple times throughout the parser. This felt a bit weird to me, though.

Thanks,
Antoine

Declarative core module: http://hackage.haskell.org/packages/archive/starling/0.1.1/doc/html/Network-...
IO-centric module: http://hackage.haskell.org/packages/archive/starling/0.1.1/doc/html/Network-...
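
The header-first workaround described in this message can be sketched roughly as follows. This is not the code from starling: the Header and Response types here are simplified stand-ins invented for illustration, and only the 24-byte header layout (status at bytes 6-7, total body length at bytes 8-11) is taken from the memcached binary protocol. The sketch reads the fixed-size header strictly with hGet, parses it with an ordinary Get action, and only then knows how many body bytes to fetch.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, runGet, skip, getWord16be, getWord32be)
import Data.Word (Word16, Word32)
import System.IO (Handle)

-- Size in bytes of a memcached binary-protocol header.
headerSize :: Int
headerSize = 24

-- Hypothetical, simplified header: only the fields this sketch needs.
data Header = Header
  { hStatus  :: Word16  -- status code (bytes 6-7)
  , hBodyLen :: Word32  -- total body length (bytes 8-11)
  } deriving (Eq, Show)

-- Hypothetical response type: the parsed header plus the raw body bytes.
data Response = Response Header BL.ByteString
  deriving (Eq, Show)

-- Declarative part: a pure Get action over exactly headerSize bytes.
getHeader :: Get Header
getHeader = do
  skip 6                  -- magic, opcode, key length, extras length, data type
  status  <- getWord16be
  bodyLen <- getWord32be
  skip 12                 -- opaque (4 bytes) and CAS (8 bytes)
  return (Header status bodyLen)

-- Imperative part: two hGet calls, the second sized by the header.
-- No lazy IO is involved, and nothing is read beyond this response.
-- Assumption: the peer sends a complete header; runGet throws otherwise.
getResponse :: Handle -> IO Response
getResponse h = do
  hdrBytes <- BL.hGet h headerSize
  let hdr = runGet getHeader hdrBytes
  body <- BL.hGet h (fromIntegral (hBodyLen hdr))
  return (Response hdr body)
```

The cost of this shape is the one the message complains about: the knowledge that the header is 24 bytes long lives in the IO code rather than in the declarative description.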

On Mon, Jan 18, 2010 at 10:39 PM, Antoine Latter wrote:
Cafe,
We have some fantastic tools for binary parsing in packages like binary and cereal (and presumably attoparsec, which I've not used). But they don't quite scratch an itch I have when writing implementations of binary communication protocols.
A good example of my problem is in my implementation of the memcached binary wire protocol: http://hackage.haskell.org/package/starling
What I've tried to do is divide up the library into a declarative protocol description and an imperative machine to sit on a handle and link together a server response with the source request.
In the declarative core all of the types which come off of the wire have an associated Data.Binary.Get action - but this isn't quite good enough. Data.Binary works on ByteStrings, but I have a handle. I don't want to use hGetContents because I have trouble working out when lazy IO is and is not correct. I can't use hGet because I don't know how much to get until I'm in the middle of the Get action.
Now that I've posted I've come up with a solution. I will start with binary-strict:Data.Binary.Strict.IncrementalGet[1]. It currently defines:

    data Result a
        = Failed String
        | Finished ByteString a
        | Partial (ByteString -> Result a)      -- <<<<<

Which I will change to:

    data Result a
        = Failed String
        | Finished ByteString a
        | Partial Int (ByteString -> Result a)  -- <<<<<

Where the new Int field includes some information as to why the result is partial. This means that I will change the current function:

    suspend :: Get r ()

to:

    require :: Int -> Get r ()

We will then only return the 'Partial' result on a call to 'require'. Any other attempts to read beyond the so-far fetched byte-string will result in failure.

I'll then have a function of type:

    runFromHandle :: Handle -> Get r r -> IO r

Which leaves the handle in a usable state and never seeks ahead in the handle, and also a function:

    runFromBytes :: ByteString -> Get r r -> {- some sensible return type -}

The idea is that the partial return type is hidden from the users of the library, but we take advantage of it to read from the handle in a sensible way.

This is all based on a quick read-through of binary-strict. But it fits how I think about a lot of the binary protocols I write:

    getResponse = do
        require 256
        x <- getX
        len <- getWord16be
        y <- getY
        z <- getZ
        require (fromIntegral len * 8)
        a <- getA
        b <- getB
        return $ Response x y z a b

The only weird part is that I only ever intend to write the "require" statements at the top-level - maybe 'getA' and the like can be written in some restricted version of the Get monad which doesn't permit 'require' declarations.

Any comments? Is there an easier way to do this?

Antoine

[1] http://hackage.haskell.org/packages/archive/binary-strict/0.4.6/doc/html/Dat...
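
To make the proposal concrete, here is a minimal, self-contained sketch of a parser with 'require'. It is a toy written for illustration, not the binary-strict API: the Get type below is a plain function rather than the continuation-passing Get r a of that package, the drivers return Either instead of the types proposed above, and getMessage is a made-up wire format.

```haskell
import qualified Data.ByteString as B
import Data.ByteString (ByteString)
import Control.Monad (ap, liftM)
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word16)
import System.IO (Handle)

data Result a
  = Failed String
  | Finished ByteString a                 -- unconsumed buffer and the value
  | Partial Int (ByteString -> Result a)  -- bytes still needed, and how to resume

-- Toy parser type: a function from the bytes buffered so far to a Result.
newtype Get a = Get { unGet :: ByteString -> Result a }

instance Functor Get where fmap = liftM
instance Applicative Get where
  pure x = Get (\s -> Finished s x)
  (<*>)  = ap
instance Monad Get where
  Get g >>= k = Get (bindR . g)
    where
      bindR (Failed e)     = Failed e
      bindR (Finished s x) = unGet (k x) s
      bindR (Partial n c)  = Partial n (bindR . c)

-- The only primitive that may suspend: ask for n buffered bytes.
require :: Int -> Get ()
require n = Get go
  where
    go s | B.length s >= n = Finished s ()
         | otherwise       = Partial (n - B.length s) (\more -> go (B.append s more))

-- Reads never suspend; running past the buffer is a failure.
getBytes :: Int -> Get ByteString
getBytes n = Get $ \s ->
  if B.length s >= n
    then let (x, rest) = B.splitAt n s in Finished rest x
    else Failed "read past the required bytes"

getWord8 :: Get Word8
getWord8 = fmap B.head (getBytes 1)

getWord16be :: Get Word16
getWord16be = do
  hi <- getWord8
  lo <- getWord8
  return ((fromIntegral hi `shiftL` 8) .|. fromIntegral lo)

-- Reads from the handle exactly the byte counts the parser declares.
-- The handle stays positioned after the message only if the parser
-- consumes everything it required; leftover buffer is dropped here.
runFromHandle :: Handle -> Get a -> IO (Either String a)
runFromHandle h p = go (unGet p B.empty)
  where
    go (Failed e)     = return (Left e)
    go (Finished _ x) = return (Right x)
    go (Partial n k)  = do
      chunk <- B.hGet h n
      if B.length chunk < n
        then return (Left "unexpected end of input")
        else go (k chunk)

-- Pure driver: returns the value and whatever input was not used.
runFromBytes :: ByteString -> Get a -> Either String (a, ByteString)
runFromBytes input p = go input (unGet p B.empty)
  where
    go _    (Failed e)     = Left e
    go rest (Finished s x) = Right (x, B.append s rest)
    go rest (Partial n k)
      | B.length rest < n = Left "unexpected end of input"
      | otherwise         = let (c, rest') = B.splitAt n rest in go rest' (k c)

-- Hypothetical message: a one-byte tag, a 16-bit length, then the payload.
getMessage :: Get (Word8, ByteString)
getMessage = do
  require 3
  tag <- getWord8
  len <- getWord16be
  require (fromIntegral len)
  payload <- getBytes (fromIntegral len)
  return (tag, payload)
```

With this shape, runFromHandle issues one hGet per 'require' rather than one per field, which is the point of declaring the byte counts up front.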

Antoine Latter wrote:
    getResponse = do
        require 256
        x <- getX
        len <- getWord16be
        y <- getY
        z <- getZ
        require (fromIntegral len * 8)
        a <- getA
        b <- getB
        return $ Response x y z a b
This looks like code that could be written in applicative style, in which case you could analyze the parser and automatically compute how many bytes are needed, removing the need for explicit calls to require.

Groetjes,
Martijn.
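
One way to read this suggestion is sketched below: an applicative parser that carries its byte count as plain data, so the count can be inspected before any input is read. The Fixed type, its combinators, and the Header record are all invented for this illustration and do not come from binary, cereal, or binary-strict.

```haskell
import qualified Data.ByteString as B
import Data.ByteString (ByteString)
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word16)

-- A parser for a fixed-size record. Its size is a static field,
-- available without running the parser.
data Fixed a = Fixed
  { size  :: Int              -- total bytes the parser consumes
  , parse :: ByteString -> a  -- expects at least `size` bytes
  }

instance Functor Fixed where
  fmap f (Fixed n p) = Fixed n (f . p)

instance Applicative Fixed where
  pure x = Fixed 0 (const x)
  -- Sizes add up; the input is split at the left parser's size.
  Fixed n pf <*> Fixed m px =
    Fixed (n + m) (\s -> let (a, b) = B.splitAt n s in pf a (px b))

word8 :: Fixed Word8
word8 = Fixed 1 B.head

word16be :: Fixed Word16
word16be = Fixed 2 $ \s ->
  (fromIntegral (B.index s 0) `shiftL` 8) .|. fromIntegral (B.index s 1)

-- Hypothetical fixed-size header: tag, length, flags.
data Header = Header Word8 Word16 Word8
  deriving (Eq, Show)

header :: Fixed Header
header = Header <$> word8 <*> word16be <*> word8

-- Run a parser, checking the statically known size first.
runFixed :: Fixed a -> ByteString -> Maybe (a, ByteString)
runFixed (Fixed n p) s
  | B.length s < n = Nothing
  | otherwise      = let (a, rest) = B.splitAt n s in Just (p a, rest)
```

Here `size header` evaluates to 4 without touching any input, so a driver could issue a single hGet of that size. A purely applicative parser cannot let a later size depend on an earlier value, so a length-prefixed tail like the one after getWord16be in the quoted code would still need a second stage whose size is computed from the parsed length.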
Participants (2): Antoine Latter, Martijn van Steenbergen