Binary parser combinators and pretty printing

Hello

I am trying to figure out the best interface to binary parser and pretty printing combinators for network protocols. I am trying to find the most natural syntax to express these parsers in Haskell and would like opinions and new ideas.

As an example I will use a protocol with the following packet structure:

    0   message-id
    4   sender-id
    8   receiver-id
    12  number of parameters
    16  parameters

Each parameter is prefixed by 32bit length followed by the data.

We will use the following Haskell datatype:

    data Packet = Packet Word32 Word32 Word32 [FastString]

1) Simple monadic interface

    getPacket = do mid <- getWord32BE
                   sid <- getWord32BE
                   rid <- getWord32BE
                   nmsg<- getWord32BE
                   vars<- replicateM (fromIntegral nmsg) (getWord32BE >>= getBytes)
                   return $ Packet mid sid rid nmsg vars

    putPacket (Packet mid sid rid vars) = do
      mapM_ putWord32BE [mid, sid, rid, length vars]
      mapM_ (\fs -> putWord32BE (length fs) >> putBytes fs) vars

This works but writing the code gets tedious and dull.

2) Using better combinators

    packet = w32be <> w32be <> w32be <> lengthPrefixList w32be (lengthPrefixList w32be bytes)

    getPacket = let (mid,sid,rid,vars) = getter packet in Packet mid sid rid vars
    putPacket (Packet mid sid rid vars) = setter packet mid sid rid vars

Maybe even the tuple could be eliminated by using a little of TH. Has anyone used combinators like this before and how did it work?

3) Using TH entirely

    $(getAndPut 'Packet "w32 w32 w32 lengthPrefixList (w32 bytes)")

Is this better than the combinators in 2)? Also what sort of syntax would be best for expressing nontrivial dependencies - e.g. a checksum calculated from other fields.

4) Using a syntax extension

Erlang does this with the bit syntax (http://erlang.se/doc/doc-5.4.8/doc/programming_examples/bit_syntax.html) and it is very nifty for some purposes.

    getPacket = do
      << mid:32, sid:32, rid:32, len:32 rest:len/binary >>
      ...

The list of lists gets nontrivial here too...

- Einar Karttunen
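For readers who want to try approach 1), here is a minimal sketch of it using the Get and Put monads from the binary package. This is an assumption for illustration and not the code from the post: strict ByteString stands in for FastString, getWord32be/getByteString stand in for getWord32BE/getBytes, and the helper names encodePacket and decodePacket are made up. Two details differ from the snippet as posted: the parameter count is not passed to the four-field Packet constructor, and the lengths are converted with fromIntegral before being written.

```haskell
import Control.Monad (replicateM)
import Data.Binary.Get (Get, getByteString, getWord32be, runGet)
import Data.Binary.Put (Put, putByteString, putWord32be, runPut)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

-- Strict ByteString is an assumed stand-in for FastString.
data Packet = Packet Word32 Word32 Word32 [B.ByteString]
  deriving (Eq, Show)

-- Three big-endian ids, a 32-bit parameter count, then the parameters.
getPacket :: Get Packet
getPacket = do
  mid  <- getWord32be
  sid  <- getWord32be
  rid  <- getWord32be
  n    <- getWord32be
  vars <- replicateM (fromIntegral n) getParam
  return (Packet mid sid rid vars)
  where
    -- Each parameter is a 32-bit length followed by that many bytes.
    getParam = getWord32be >>= getByteString . fromIntegral

putPacket :: Packet -> Put
putPacket (Packet mid sid rid vars) = do
  mapM_ putWord32be [mid, sid, rid, fromIntegral (length vars)]
  mapM_ putParam vars
  where
    putParam bs = putWord32be (fromIntegral (B.length bs)) >> putByteString bs

-- Hypothetical convenience wrappers; runGet calls error on truncated input.
encodePacket :: Packet -> BL.ByteString
encodePacket = runPut . putPacket

decodePacket :: BL.ByteString -> Packet
decodePacket = runGet getPacket
```

Note that the reader and the writer repeat the same layout twice, which is the tedium the post complains about.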

Einar Karttunen wrote:
> I am trying to figure out the best interface to binary parser and pretty printing combinators for network protocols.
> 2) Using better combinators
> packet = w32be <> w32be <> w32be <> lengthPrefixList w32be (lengthPrefixList w32be bytes)
> Has anyone used combinators like this before and how did it work?

Yes, the nhc98 Binary library has a "<<" combinator, in very much the style you outline. It is only used in pure code, but it permits some very concise descriptions of the binary layout.

The library is described here:
    ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html
but unfortunately that paper has only the tiniest examples of the usage of "<<". However, in my experience the style worked well and was pleasant to use.

Regards,
Malcolm
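To make the combinator idea from 2) concrete, here is one possible sketch in which a single description yields both the reader and the writer. It is not the nhc98 Binary API and not code from this thread: the Codec type, the <+> operator (used in place of <>, which is the Semigroup operator in current Haskell), the byte codec and the helper functions are all assumptions built on the binary package. The names getter, setter, w32be and lengthPrefixList are borrowed from the post, but the setter here takes the nested tuple rather than separate arguments.

```haskell
import Control.Monad (replicateM)
import Data.Binary.Get (Get, getWord8, getWord32be, runGet)
import Data.Binary.Put (Put, putWord8, putWord32be, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8, Word32)

-- A codec pairs a reader with the matching writer (hypothetical type).
data Codec a = Codec
  { getter :: Get a
  , setter :: a -> Put
  }

w32be :: Codec Word32
w32be = Codec getWord32be putWord32be

byte :: Codec Word8
byte = Codec getWord8 putWord8

-- Sequence two codecs; the decoded value is a pair.
infixr 5 <+>
(<+>) :: Codec a -> Codec b -> Codec (a, b)
ca <+> cb = Codec
  ((,) <$> getter ca <*> getter cb)
  (\(a, b) -> setter ca a >> setter cb b)

-- A count encoded with the first codec, then that many elements.
lengthPrefixList :: Integral n => Codec n -> Codec a -> Codec [a]
lengthPrefixList cn ca = Codec getList putList
  where
    getList = do
      n <- getter cn
      replicateM (fromIntegral n) (getter ca)
    putList xs = do
      setter cn (fromIntegral (length xs))
      mapM_ (setter ca) xs

-- The packet layout from the original post, written once.
packet :: Codec (Word32, (Word32, (Word32, [[Word8]])))
packet = w32be <+> w32be <+> w32be
           <+> lengthPrefixList w32be (lengthPrefixList w32be byte)

encodeWith :: Codec a -> a -> BL.ByteString
encodeWith c = runPut . setter c

decodeWith :: Codec a -> BL.ByteString -> a
decodeWith c = runGet (getter c)
```

The nested tuple is the price of this encoding; converting it to and from Packet is the boilerplate that the post suggests removing with TH.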

On Tue, Sep 13, 2005 at 06:03:00PM +0300, Einar Karttunen wrote:
> We will use the following Haskell datatype:
> data Packet = Packet Word32 Word32 Word32 [FastString]
> 1) Simple monadic interface
> [...]
> This works but writing the code gets tedious and dull.
> 2) Using better combinators
> packet = w32be <> w32be <> w32be <> lengthPrefixList w32be (lengthPrefixList w32be bytes)
> getPacket = let (mid,sid,rid,vars) = getter packet in Packet mid sid rid vars
> putPacket (Packet mid sid rid vars) = setter packet mid sid rid vars
> Maybe even the tuple could be eliminated by using a little of TH. Has anyone used combinators like this before and how did it work?

No need for TH. If you have a monadic interface, you can write getPacket as:

    getPacket = (return Packet) `ap` w32be `ap` w32be `ap` w32be
                    `ap` lengthPrefixList w32be (lengthPrefixList w32be bytes)

There's more trouble with putPacket though.
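A runnable sketch of this `ap` style, assuming the Get monad from the binary package rather than the poster's own parser monad. The helpers lengthPrefixed, getParam and parsePacket are hypothetical names, and strict ByteString again stands in for FastString. The second definition shows the same parser with the Applicative operators <$> and <*>, which behave like `ap` here.

```haskell
import Control.Monad (ap, replicateM)
import Data.Binary.Get (Get, getByteString, getWord32be, runGet)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

data Packet = Packet Word32 Word32 Word32 [B.ByteString]
  deriving (Eq, Show)

-- A 32-bit big-endian count followed by that many items (hypothetical helper).
lengthPrefixed :: Get a -> Get [a]
lengthPrefixed item = getWord32be >>= \n -> replicateM (fromIntegral n) item

-- One parameter: a 32-bit length followed by the bytes.
getParam :: Get B.ByteString
getParam = getWord32be >>= getByteString . fromIntegral

-- The style from the message: feed the constructor one field at a time.
getPacketAp :: Get Packet
getPacketAp = return Packet `ap` getWord32be `ap` getWord32be `ap` getWord32be
                `ap` lengthPrefixed getParam

-- The same parser written with Applicative operators.
getPacket :: Get Packet
getPacket = Packet <$> getWord32be <*> getWord32be <*> getWord32be
              <*> lengthPrefixed getParam

parsePacket :: BL.ByteString -> Packet
parsePacket = runGet getPacket
```

The writer has no equally direct form because it consumes a Packet instead of building one, so the constructor cannot be threaded through the fields in the same way.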
> 3) Using TH entirely
> $(getAndPut 'Packet "w32 w32 w32 lengthPrefixList (w32 bytes)")
> Is this better than the combinators in 2)? Also what sort of syntax would be best for expressing nontrivial dependencies - e.g. a checksum calculated from other fields.

How about all these points together?:
    a) Simple monadic interface
    b) Using better combinators
    c) Using TH to generate code for the simple cases
    d) Using type-classes

Having a monadic interface doesn't prevent you from introducing other combinators. In fact, every useful monad should have some combinators other than >>= and return. There are already some generic monadic combinators that can simplify your code, as shown in the getPacket example.

Points c) and d) are closely related - you can introduce a type class for Binary decodable/encodable datatypes and then generate instances with TH. The code for these instances is generated directly from the structure of a datatype and it is quite simple, because it's mostly recursively using the type-class methods - this can greatly simplify TH code.

So, assuming that you have instances of Binary for Word32 and FastString and [], making Packet an instance of Binary would amount to writing

    data Packet = Packet Word32 Word32 Word32 [FastString]
    $(deriveBinary 'Packet)

Manually written instances for Packet would look like this:

    instance Binary Packet where
        decode = f $ f $ f $ f $ return Packet
        encode (Packet mid sid rid vars) = do
            encode mid
            encode sid
            encode rid
            encode vars

    f x = x `ap` decode

Unfortunately the world is not that simple, so you'll probably need a somewhat more complicated framework to handle varying endianness, varying encodings for the same types, and strange encoding schemes (like DNS packet compression, <number of records> fields far away from the record sequences, etc). To some degree it can be solved by introducing newtypes or making more complicated typeclasses.

I've played with such frameworks a couple of times and I feel it's time to make a library useful for others. If you're interested, we could cooperate.
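The newtype idea can be sketched as follows. This is an illustration under assumptions, not the framework described in the message: the class is called Wire to keep it apart from the real Binary class in the binary package, BE32, LE32 and Header are made-up types, and the list instance arbitrarily fixes a 32-bit big-endian count.

```haskell
import Control.Monad (replicateM)
import Data.Binary.Get (Get, getWord32be, getWord32le, runGet)
import Data.Binary.Put (Put, putWord32be, putWord32le, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

-- Hypothetical class: one decoder and one encoder per type.
class Wire a where
  decode :: Get a
  encode :: a -> Put

-- Newtypes select the byte order, so the type alone determines the encoding.
newtype BE32 = BE32 Word32 deriving (Eq, Show)
newtype LE32 = LE32 Word32 deriving (Eq, Show)

instance Wire BE32 where
  decode = BE32 <$> getWord32be
  encode (BE32 w) = putWord32be w

instance Wire LE32 where
  decode = LE32 <$> getWord32le
  encode (LE32 w) = putWord32le w

-- Lists carry a 32-bit big-endian element count (an arbitrary choice here).
instance Wire a => Wire [a] where
  decode = getWord32be >>= \n -> replicateM (fromIntegral n) decode
  encode xs = putWord32be (fromIntegral (length xs)) >> mapM_ encode xs

-- A record instance only delegates to its fields, which is what a
-- TH-generated instance would also do.
data Header = Header BE32 LE32 deriving (Eq, Show)

instance Wire Header where
  decode = Header <$> decode <*> decode
  encode (Header a b) = encode a >> encode b

toBytes :: Wire a => a -> BL.ByteString
toBytes = runPut . encode

fromBytes :: Wire a => BL.ByteString -> a
fromBytes = runGet decode
```

For example, toBytes (Header (BE32 1) (LE32 1)) produces the bytes 00 00 00 01 01 00 00 00. The harder cases mentioned above, such as a count field stored far from its records, do not fit this per-type scheme without extra machinery.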
> 4) Using a syntax extension

If there is any extension that would help here, I think it should be something more general than merely a syntax for specifying binary format. This problem seems like a good use for generics and TH.

Best regards
Tomasz

On 13.09 23:31, Tomasz Zielonka wrote:
> How about all these points together?:
> a) Simple monadic interface

I think I already have this - minus packaging and documentation.

> b) Using better combinators

This is lacking.

> c) Using TH to generate code for the simple cases

I have TH for generating code, but that is not yet general purpose (the code comes from SerTH).

> d) Using type-classes

As most real-world protocols will need customization I cannot see much improvement here. Keeping the types of the serialized data explicit makes sense. Otherwise changing an innocent Haskell data declaration would cause an on-wire data mismatch rather than a compile-time type error.

> I've played with such frameworks a couple of times and I feel it's time to make a library useful for others. If you're interested, we could cooperate.

I would be interested in cooperating and getting a useful library released. Currently my parsers just use [FastString] (thus supporting lazy IO), peek and poke.

- Einar Karttunen

Hello Einar,

Tuesday, September 13, 2005, 7:03:00 PM, you wrote:

EK> data Packet = Packet Word32 Word32 Word32 [FastString]

Well, you can see my own BinaryStream package at http://freearc.narod.ru

    class BinaryData a where
      read :: ...
      write :: ...

    instance BinaryData Word32 where
      read = ...
      write = ...

    instance BinaryData FastString where
      read = ...
      write = ...

    instance (BinaryData a, BinaryData b, BinaryData c, BinaryData d) => BinaryData (a,b,c,d) where
      read = ...
      write = ...

    instance (BinaryData a) => BinaryData [a] where
      read = ...
      write = ...

EK> 1) Simple monadic interface
EK> getPacket = do mid <- getWord32BE
EK>                sid <- getWord32BE
EK>                rid <- getWord32BE
EK>                nmsg<- getWord32BE
EK>                vars<- replicateM (fromIntegral nmsg) (getWord32BE >>= getBytes)
EK>                return $ Packet mid sid rid nmsg vars

turns into:

    (a,b,c,d) <- read
    return $ Packet a b c d

EK> Maybe even the tuple could be eliminated by using a little of TH.

It may be eliminated even without TH! :+: and :*: should work, although I haven't tried this.

--
Best regards,
Bulat    mailto:bulatz@HotPOP.com

On 15.09 21:53, Bulat Ziganshin wrote:
> EK> data Packet = Packet Word32 Word32 Word32 [FastString]
> well. you can see my own BinaryStream package at http://freearc.narod.ru
> class BinaryData a where read :: ... write :: ...

I don't think this is a very good solution. Keeping the on-wire datatypes explicit makes sense to me. Also things like endianness will need to be taken into account.

If the encoding is derived automatically then changing the Haskell datatype will change the on-wire representation. This is not wanted when interfacing with external protocols.

For typeclasses I would rather have:

    getWord32BE :: Num a => MyMonad a

than

    get :: MyClass a => MyMonad a

Note the difference between the Haskell type determining the on-wire type and it being explicit.

I already have working TH code for the case where I want to derive automatic binary serialization for Haskell datatypes (SerTH).
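A small sketch of that distinction, assuming Get from the binary package in place of MyMonad. The names getLengthAndId and parseLengthAndId are hypothetical. The function name fixes the wire format (32 bits, big-endian) while the Haskell result type stays free, so changing the Haskell type never changes the bytes that are read.

```haskell
import Data.Binary.Get (Get, getWord32be, runGet)
import qualified Data.ByteString.Lazy as BL

-- The wire format is named at the call site; only the result type varies.
getWord32BE :: Num a => Get a
getWord32BE = fromIntegral <$> getWord32be

-- Both fields occupy four bytes on the wire regardless of the Haskell
-- types chosen for them (Int and Integer here are arbitrary).
getLengthAndId :: Get (Int, Integer)
getLengthAndId = (,) <$> getWord32BE <*> getWord32BE

parseLengthAndId :: BL.ByteString -> (Int, Integer)
parseLengthAndId = runGet getLengthAndId
```

With a class-based get, replacing Int by Word16 in the result type would silently change how many bytes are consumed; here it would not.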
> EK> Maybe even the tuple could be eliminated by using a little of TH.
> It may be eliminated even without TH! :+: and :*: should work, although I haven't tried this.

I don't know how generics work in newer versions of GHC, but it may be worth investigating.

- Einar Karttunen
participants (4)
- Bulat Ziganshin
- Einar Karttunen
- Malcolm Wallace
- Tomasz Zielonka