
On Mon, Jan 26, 2004 at 02:39:32PM -0800, John Meacham wrote:
> On Mon, Jan 26, 2004 at 01:24:37PM +0100, Tomasz Zielonka wrote:
> > > 3) Roll your own (de)serialization framework
> >
> > That's what I did. It's a bit complicated, but I will try to describe it within a couple of days. Right now all I can say is that it uses TH and has a couple of implementations of low-level decoders, one of which reads directly from UArray Int Word8. I managed to achieve throughput of 3 MB/s for quite complicated binary protocols, and I think I can improve that even further.
>
> Hi, I would be very interested in looking at the design of this. Is it available on the web somewhere?

I've attached the code. I compiled it with GHC 6.0.1 (this may matter because of Template Haskell; I also used other GHC extensions).

There is an Example.hs, which shows how to declare a record for an IP header and how to use both supplied parsers. This is not a standalone program, so to try it, compile it with

    ghc -O2 --make Example

and load it in GHCi (you have to compile it first, because the interpreter can't handle unboxed tuples).

The library is really very simple; I created it quickly to solve some immediate problems. There is no support for bitfields - I think there would be if I had to deal with them ;)

The original library used little-endian encoding. I changed it to big-endian, aka network order, but there really should be a way to choose. Right now I am not sure how this choice should be made available to the user.

The ability to easily switch from Parsec to an efficient UArray parser and the benefit from specializing 'times' came as nice surprises. Well, not exactly unexpected surprises, because I was striving for it a bit, but it was easier than I thought.

Template Haskell is used to automatically derive instances for record types. In my application I had to create some instances by hand, because some regions of the binary files were not self-describing - they needed additional information from the outside.

There is also an encoding part - if someone's interested, I can extract it from the application. The funny thing is that I was only supposed to produce files, not parse them. But I started by writing a parser and, thanks to the declarative approach, when I finished the parser, the unparser was ready almost instantly.

One way to introduce bitfields and varying endianness would be to design a description language like the one in Erlang (an algebraic datatype should suffice). Then we could generate datatypes and class instances from it using TH.

Something like this (just a sketch):

    ipHeaderLayout =
        Record "IPHeader"
            [ BitField "Word8" BE
                [ (4, Just ("iphHeaderLen", "!Word8"))
                , (4, Just ("iphVersion", "!Word8"))
                ]
            , Field "iphTOS"      (Unsigned 1 BE) "!Word8"
            , Field "iphTotalLen" (Unsigned 2 BE) "!Word16"
            , Field "iphID"       (Unsigned 2 BE) "!Word16"
            , Field "iphFragOff"  (Unsigned 2 BE) "!Word16"
            , Field "iphTTL"      (Unsigned 1 BE) "!Word8"
            , Field "iphProtocol" (Unsigned 1 BE) "!Word8"
            , Field "iphCheck"    (Unsigned 2 BE) "!Word16"
            , Field "iphSAddr"    (Unsigned 4 BE) "!Word32"
            , Field "iphDAddr"    (Unsigned 4 BE) "!Word32"
            ]

    $(createDataType ipHeaderLayout)
    $(createDecodableClassInstance ipHeaderLayout)
    $(createEncodableClassInstance ipHeaderLayout)

Best regards,
Tom

--
.signature: Too many levels of symbolic links
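
For illustration only, here is a minimal sketch of what the algebraic datatype behind such a layout description might look like, together with a few plain (non-TH) helpers that interpret it. None of this comes from the attached library: the type names Endian, Encoding, Item and Layout, and the helpers itemBytes, layoutBytes, decodeUnsigned and splitBits, are all assumptions chosen so that the constructors match the ones used in the sketch above. The bit groups of a BitField are assumed to be listed from the least significant bits upward, which is only one possible reading of the sketch.

```haskell
import Data.Word (Word8)

-- Byte order of a multi-byte field (hypothetical type).
data Endian = BE | LE
  deriving (Eq, Show)

-- Wire encoding of a plain field: an unsigned integer of n bytes.
data Encoding = Unsigned Int Endian
  deriving (Eq, Show)

-- One item of a record layout (hypothetical type).
--   Field    name encoding haskellType
--   BitField carrierType byteOrder [(bitWidth, Maybe (name, haskellType))]
-- A bit group with Nothing would be padding that gets no record field.
data Item
  = Field String Encoding String
  | BitField String Endian [(Int, Maybe (String, String))]
  deriving (Eq, Show)

-- A named record made of items, in wire order.
data Layout = Record String [Item]
  deriving (Eq, Show)

-- Size in bytes of one item; a bit field occupies the sum of its
-- bit widths, rounded up to whole bytes.
itemBytes :: Item -> Int
itemBytes (Field _ (Unsigned n _) _) = n
itemBytes (BitField _ _ parts) = (sum (map fst parts) + 7) `div` 8

-- Total size in bytes of a record layout.
layoutBytes :: Layout -> Int
layoutBytes (Record _ items) = sum (map itemBytes items)

-- Decode an unsigned integer from its bytes in the given byte order.
decodeUnsigned :: Endian -> [Word8] -> Integer
decodeUnsigned BE = foldl (\acc b -> acc * 256 + fromIntegral b) 0
decodeUnsigned LE = foldr (\b acc -> acc * 256 + fromIntegral b) 0

-- Split a value into bit groups; widths are listed from the least
-- significant bits upward (an assumption, see above).
splitBits :: [Int] -> Integer -> [Integer]
splitBits [] _ = []
splitBits (w : ws) v = (v `mod` (2 ^ w)) : splitBits ws (v `div` (2 ^ w))

-- The first three items of the IP header layout, as a small example value.
ipHeaderPrefix :: Layout
ipHeaderPrefix =
  Record "IPHeaderPrefix"
    [ BitField "Word8" BE
        [ (4, Just ("iphHeaderLen", "!Word8"))
        , (4, Just ("iphVersion", "!Word8"))
        ]
    , Field "iphTOS" (Unsigned 1 BE) "!Word8"
    , Field "iphTotalLen" (Unsigned 2 BE) "!Word16"
    ]
```

Under these assumptions, splitBits [4, 4] 0x45 gives [5, 4], i.e. a header length of 5 words and version 4 for a typical first byte of an IPv4 header. A TH-based generator would walk the same Layout value to produce the data declaration and the class instances instead of interpreting it at run time.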