Re: [Haskell-cafe] Re: Abstraction leak

4 Jul 2007

> ...
> Anyone trying to do any of this?
I've done some work in this area. I'm particularly interested in
manipulating ASN.1 in Haskell. Actually, my first use of Parsec was an
ASN.1 parser. I'd done one previously in Spirit (the Boost C++ rip-off
of Parsec), but semantic actions there were horrible in the extreme.
Mmmm, Parsec.

In the indexing system I'm currently building in Haskell for my day
job, I'm serializing several data structures, and using Data.Bits and
Data.ByteString heavily.
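Since the serialization leans on Data.Bits and Data.ByteString, a minimal sketch of one common way the two combine may help: a base-128 "varint" that stores a non-negative Int in as few bytes as possible, seven payload bits per byte, with the high bit marking "more bytes follow". The post doesn't say which number encoding the indexing system uses; encodeVarint and decodeVarint are hypothetical names and the format is an assumption.

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Hypothetical varint encoding (an assumption, not the post's format):
-- 7 payload bits per byte, least significant group first, high bit set
-- on every byte except the last.
encodeVarint :: Int -> B.ByteString
encodeVarint n
  | n < 0     = error "encodeVarint: negative argument"
  | otherwise = B.pack (go n)
  where
    go :: Int -> [Word8]
    go k
      | k < 0x80  = [fromIntegral k]
      | otherwise = (fromIntegral (k .&. 0x7f) .|. 0x80) : go (k `shiftR` 7)

-- Decode one varint from the front of a ByteString, returning the value
-- and the unconsumed remainder (a slice sharing the original buffer).
decodeVarint :: B.ByteString -> Maybe (Int, B.ByteString)
decodeVarint = go 0 0
  where
    go :: Int -> Int -> B.ByteString -> Maybe (Int, B.ByteString)
    go acc shift bs = do
      (w, rest) <- B.uncons bs
      let acc' = acc .|. (fromIntegral (w .&. 0x7f) `shiftL` shift)
      if testBit w 7
        then go acc' (shift + 7) rest
        else Just (acc', rest)
```

For example, encodeVarint 300 yields the two bytes 172 and 2. Because decodeVarint hands back the rest of the input alongside the value, decoders for successive fields can simply be chained.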

I was using HaXml, but I found it was very slow. So instead, I'm using
an internal (within the indexing system) representation that is more
akin to WBXML (WAP Binary XML):

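import Prelude hiding (putStr)
import Control.Monad.State (State, execState, get, put)  -- mtl package
import Data.ByteString (ByteString)
import qualified Data.ByteString as ByteString
import Data.Foldable (toList)
import qualified Data.List as List
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

data DocTree
    = DocElem ByteString [(ByteString,ByteString)] [DocTree]
    | DocText ByteString

-- Accumulate output chunks in a Seq, then concatenate them once at the end.
serialize :: DocTree -> ByteString
serialize tree = ByteString.concat $ toList $ execState
    (serialize' tree) Seq.empty

-- Uses the helpers putStr, putNum and putPair, elided below.
serialize' :: DocTree -> State (Seq ByteString) ()
serialize' (DocText txt) = do
    stuff <- get
    put (stuff |> ByteString.pack [0])   -- tag byte 0: text node
    putStr txt
serialize' (DocElem name attrs kids) = do
    stuff <- get
    put (stuff |> ByteString.pack [1])   -- tag byte 1: element node
    putStr name
    putNum (List.length attrs)
    mapM_ (putPair putStr putStr) attrs
    putNum (List.length kids)
    mapM_ serialize' kids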

putStr ....
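The helpers are elided in the post. For anyone who wants something that runs, here is a self-contained sketch with putStr, putNum and putPair filled in by assumption: counts as 4-byte big-endian words, strings as a length prefix followed by the raw bytes. It also swaps the State/Seq accumulation for Data.ByteString.Builder (from the bytestring package), which serves the same purpose of collecting chunks and concatenating them once. putStr is renamed putStr' to avoid clashing with Prelude's putStr.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL

data DocTree
    = DocElem ByteString [(ByteString, ByteString)] [DocTree]
    | DocText ByteString

-- Hypothetical helper: a count as a 4-byte big-endian word.
putNum :: Int -> BB.Builder
putNum = BB.word32BE . fromIntegral

-- Hypothetical helper: a length-prefixed byte string.
putStr' :: ByteString -> BB.Builder
putStr' s = putNum (B.length s) <> BB.byteString s

-- Hypothetical helper: serialize both halves of a pair in order.
putPair :: (a -> BB.Builder) -> (b -> BB.Builder) -> (a, b) -> BB.Builder
putPair f g (x, y) = f x <> g y

-- Same layout as the post's serialize': tag byte, name, attribute count,
-- attributes, child count, children.
serialize :: DocTree -> ByteString
serialize = BL.toStrict . BB.toLazyByteString . go
  where
    go (DocText txt) = BB.word8 0 <> putStr' txt
    go (DocElem name attrs kids) =
      BB.word8 1 <> putStr' name
        <> putNum (length attrs) <> foldMap (putPair putStr' putStr') attrs
        <> putNum (length kids) <> foldMap go kids
```

The Builder defers copying until toLazyByteString runs, much as the original defers ByteString.concat until execState has finished.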

You get the idea. Actually, the *real* code is trickier: it grovels
over the tree first to find all the element names and number them, and
does the same for attribute names (numbered per element). The extra
grovel is well worth it: it takes a little longer to serialize, but the
output is more compact and deserializes faster.
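The grovel pass described above might be sketched as two folds over the tree: one numbering distinct element names in order of first appearance, and one numbering attribute names separately for each element name. This is an illustration under assumptions (pre-order traversal, numbering from 0, Data.Map from the containers package); elementNames, numberNames, attrNames and addName are hypothetical names, not the author's code.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.List (foldl')
import qualified Data.Map.Strict as Map

data DocTree
    = DocElem ByteString [(ByteString, ByteString)] [DocTree]
    | DocText ByteString

-- Give a name the next free number unless it already has one.
addName :: Ord k => Map.Map k Int -> k -> Map.Map k Int
addName m n
  | Map.member n m = m
  | otherwise      = Map.insert n (Map.size m) m

-- All element names in pre-order (document) order, duplicates included.
elementNames :: DocTree -> [ByteString]
elementNames (DocText _) = []
elementNames (DocElem name _ kids) = name : concatMap elementNames kids

-- Hypothetical first pass: element name -> number, by first appearance.
numberNames :: [ByteString] -> Map.Map ByteString Int
numberNames = foldl' addName Map.empty

-- Hypothetical per-element tables: element name -> (attribute name -> number).
attrNames :: DocTree -> Map.Map ByteString (Map.Map ByteString Int)
attrNames = go Map.empty
  where
    go m (DocText _) = m
    go m (DocElem name attrs kids) =
      let table  = Map.findWithDefault Map.empty name m
          table' = foldl' addName table (map fst attrs)
      in  foldl' go (Map.insert name table' m) kids

-- A small example document.
example :: DocTree
example = DocElem (BC.pack "doc") []
  [ DocElem (BC.pack "p") [(BC.pack "id", BC.pack "1")] [DocText (BC.pack "hi")]
  , DocElem (BC.pack "p") [(BC.pack "class", BC.pack "x")] []
  ]
```

On the example, "doc" gets 0 and "p" gets 1, and the two p elements share one attribute table in which "id" is 0 and "class" is 1. A serializer could then emit each table once and write these small numbers in place of repeated names.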

Also worth noting: whether or not you compile a dictionary of element
names, the result is much, much, much more space-efficient than using
HaXml, since the whole document can be decoded out of a single
ByteString containing the tree, with no string copying at all.
That's the kind of [de]serialization I like. :-) Mind you, I still
have to use HaXml when I first read documents into the system, and a
very nice job it does too.
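The "no string copying" point rests on how strict ByteStrings are sliced: take, drop and splitAt return views into the same buffer in O(1) time. Below is a minimal decoder sketch, assuming a hypothetical format (tag byte 0 for text and 1 for elements, counts and string lengths as 4-byte big-endian words, no name dictionary), which is not necessarily the author's actual wire format. Every string in the resulting DocTree is a slice of the input; Parser, getNum, getStr, getMany and getTree are illustrative names.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.ByteString (ByteString)
import qualified Data.ByteString as B

data DocTree
    = DocElem ByteString [(ByteString, ByteString)] [DocTree]
    | DocText ByteString
    deriving (Eq, Show)

-- A parser consumes a prefix and returns the unconsumed rest.
type Parser a = ByteString -> Maybe (a, ByteString)

-- Hypothetical format: a count is a 4-byte big-endian word.
getNum :: Parser Int
getNum bs
  | B.length bs < 4 = Nothing
  | otherwise =
      let (hd, rest) = B.splitAt 4 bs
      in Just (B.foldl' (\acc w -> (acc `shiftL` 8) .|. fromIntegral w) 0 hd, rest)

-- A length-prefixed string. B.splitAt is O(1): the returned string is a
-- slice of the input buffer, so no bytes are copied.
getStr :: Parser ByteString
getStr bs = do
  (n, rest) <- getNum bs
  if B.length rest < n then Nothing else Just (B.splitAt n rest)

-- Run a parser a fixed number of times.
getMany :: Int -> Parser a -> Parser [a]
getMany 0 _ bs = Just ([], bs)
getMany k p bs = do
  (x, rest)   <- p bs
  (xs, rest') <- getMany (k - 1) p rest
  Just (x : xs, rest')

getTree :: Parser DocTree
getTree bs = do
  (tag, rest) <- B.uncons bs
  case tag of
    0 -> do
      (txt, rest') <- getStr rest
      Just (DocText txt, rest')
    1 -> do
      (name, r1)  <- getStr rest
      (na, r2)    <- getNum r1
      (attrs, r3) <- getMany na getPair r2
      (nk, r4)    <- getNum r3
      (kids, r5)  <- getMany nk getTree r4
      Just (DocElem name attrs kids, r5)
    _ -> Nothing
  where
    getPair b = do
      (k, b1) <- getStr b
      (v, b2) <- getStr b1
      Just ((k, v), b2)
```

One trade-off of sharing: any slice kept alive retains the whole input buffer. If only a few small strings are kept long-term, B.copy can detach them.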

T.
-- 
Dr Thomas Conway
drtomc@gmail.com

Silence is the perfectest herald of joy:
I were but little happy, if I could say how much.