Parsing binary 'hierarchical' objects for lazy developers


John Obbele

27 Apr 2011

6:16 p.m.

Hi Haskellers,

I'm currently serializing / unserializing a bunch of bytestrings which are somehow related to each other, and I'm wondering if there is a way in Haskell to ease my pain.

The first thing I'm looking for is to be able to automatically derive "Serializable" objects, for example:

---------------------------------------------------------------
import Data.Serialize -- using cereal as an example

data MyFlag = One | Two | Three

instance Serialize [MyFlag] where
    put = putWord16le . marshalFlags
    get = unmarshal `fmap` getWord16le

data ObjectA = ObjectA { attribute0 :: Word8
                       , attribute1 :: Word16le
                       , attribute2 :: [MyFlag]
                       }
               deriving (Serialize) -- magic goes here!
---------------------------------------------------------------

Unfortunately ghci complains that 'Serialize' is not a derivable class. Yet deriving the Serialize instance for ObjectA should be simple, since all three attributes are already serializable themselves...

Second issue, I would like to find a way to dispatch parsers. I'm not very good at expressing my problem in English, so I will use another code example:

---------------------------------------------------------------
-- let's say we have two objects with almost the same structure:
data ObjectA = ObjectA { objLength   :: Int
                       , objType     :: TypeId
                       , attribute2a :: [MyFlag]
                       }

data ObjectB = ObjectB { objLength   :: Int
                       , objType     :: TypeId
                       , attribute2b :: Word32le
                       }
---------------------------------------------------------------

When we begin to deserialize these objects, we don't know their final type, we just know how to read their length and their typeId.

Only then can we determine if what we are parsing is an ObjectA or an ObjectB.

Once we know the object type, we can resume the parsing and return either an ObjectA or an ObjectB.

Oki, so I may have read too much of Peter Seibel's chapter on binary-data parsing in Common Lisp or spent too much time working on object-oriented code, but currently I have no idea how to write this 'simply' in Haskell :(

any help would be welcome
/john


Maciej Marcin Piechotka

27 Apr

7:46 p.m.

On Wed, 2011-04-27 at 20:16 +0200, John Obbele wrote:

> Second issue, I would like to find a way to dispatch parsers.
> ...
> When we begin to deserialize these objects, we don't know their final type, we just know how to read their length and their typeId.
>
> Only then can we determine if what we are parsing is an ObjectA or an ObjectB.

I believe the following should work:

-- needs Control.Applicative for (<$>), (<*>), (<|>) and empty
instance Serialize ObjectA where
    -- put omitted; get parses all three fields, then validates them
    get = check =<< (ObjectA <$> get <*> get <*> get)
      where check obj@(ObjectA len id attr)
              | len < 10 && id == 0 = return obj
              | otherwise           = empty

instance Serialize ObjectB where
    get = check =<< (ObjectB <$> get <*> get <*> get)
      where check obj@(ObjectB len id attr)
              | len > 10 && id == 1 = return obj
              | otherwise           = empty

-- try ObjectA first, fall back to ObjectB if its check fails
parseEitherAB :: Get (Either ObjectA ObjectB)
parseEitherAB = (Left <$> get) <|> (Right <$> get)

Regards
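A runnable sketch of the same idea, for anyone who wants to try it. It assumes the binary package (Data.Binary.Get) instead of cereal, and the record layout (1-byte length, 1-byte type id, little-endian payload) as well as the names getObjectA, getObjectB and decodeAB are made up for illustration. The length guard from the code above is left out to keep it short.

```haskell
import Control.Applicative (empty, (<|>))
import Data.Binary.Get (Get, getWord8, getWord16le, getWord32le, runGetOrFail)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8, Word16, Word32)

-- Hypothetical layout: 1-byte length, 1-byte type id, then the payload.
data ObjectA = ObjectA Word8 Word8 Word16 deriving (Show, Eq)
data ObjectB = ObjectB Word8 Word8 Word32 deriving (Show, Eq)

-- Parse the whole record, then reject it if the type id is not ours.
getObjectA :: Get ObjectA
getObjectA = check =<< (ObjectA <$> getWord8 <*> getWord8 <*> getWord16le)
  where
    check obj@(ObjectA _len ty _attr)
      | ty == 0   = return obj
      | otherwise = empty

getObjectB :: Get ObjectB
getObjectB = check =<< (ObjectB <$> getWord8 <*> getWord8 <*> getWord32le)
  where
    check obj@(ObjectB _len ty _attr)
      | ty == 1   = return obj
      | otherwise = empty

-- Try A first; if it fails, the input is rewound and B is tried.
parseEitherAB :: Get (Either ObjectA ObjectB)
parseEitherAB = (Left <$> getObjectA) <|> (Right <$> getObjectB)

-- Run the parser, discarding leftover input and error details.
decodeAB :: BL.ByteString -> Maybe (Either ObjectA ObjectB)
decodeAB bytes = case runGetOrFail parseEitherAB bytes of
  Right (_, _, obj) -> Just obj
  Left _            -> Nothing
```

Note that when the first alternative fails, the header bytes get parsed a second time by the other alternative.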


John Obbele

30 Apr

11:40 a.m.


On Wed, Apr 27, 2011 at 09:46:08PM +0200, Maciej Marcin Piechotka wrote:

> I believe following should work
> ...
> parseEitherAB :: Get (Either ObjectA ObjectB)
> parseEitherAB = (Left <$> get) <|> (Right <$> get)

Hum, that's still not automatic, but using Control.Applicative to write one-liners seems a good-enough solution. I will try it.

thanks ,)
/john

Stephen Tetley

27 Apr

7:52 p.m.

John Meacham's DrIFT tool used to get extended faster than GHC for things that "should" be automatic. I'm not sure of its current status, though:

http://repetae.net/computer/haskell/DrIFT/

For your second problem, something like this:

getAB :: Get (Either A B)
getAB = do
    len <- getWord16be
    tag <- getWord16be
    if tag == 0x00
      then do { a <- getA len; return (Left a) }
      else do { b <- getB len; return (Right b) }

-- length already consumed so sent as an argument...
getA :: Word16 -> Get A
getB :: Word16 -> Get B
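Filled in so that it runs, as a sketch only: it assumes Data.Binary.Get from the binary package, that len counts the body bytes following the tag, and that A and B simply keep those raw bytes. The definitions of A and B and the helper decodeAB are hypothetical.

```haskell
import Data.Binary.Get (Get, getByteString, getWord16be, runGetOrFail)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word16)

-- Hypothetical payload types: both just keep their raw body bytes.
newtype A = A BS.ByteString deriving (Show, Eq)
newtype B = B BS.ByteString deriving (Show, Eq)

-- Header: 16-bit body length, then 16-bit tag, both big-endian.
getAB :: Get (Either A B)
getAB = do
  len <- getWord16be
  tag <- getWord16be
  if tag == 0x00
    then Left  <$> getA len
    else Right <$> getB len

-- The length was already consumed, so it is passed in as an argument.
getA :: Word16 -> Get A
getA len = A <$> getByteString (fromIntegral len)

getB :: Word16 -> Get B
getB len = B <$> getByteString (fromIntegral len)

-- Run the parser, discarding leftover input and error details.
decodeAB :: BL.ByteString -> Maybe (Either A B)
decodeAB bytes = case runGetOrFail getAB bytes of
  Right (_, _, obj) -> Just obj
  Left _            -> Nothing
```

Because the tag is read before any branch is chosen, nothing is ever parsed twice.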

Alexander Solla

8:28 p.m.

On Wed, Apr 27, 2011 at 11:16 AM, John Obbele wrote:

> The first thing I'm looking for, is to be able to automatically derive "Serializable" objects, for example:

Happstack has a "Serialize" type class and uses Template Haskell to automate deriving instances. I don't know if they are binary-compatible with cereal (i.e., whether you could serialize with one and deserialize with the other, or vice versa).
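For comparison, a sketch of boilerplate-free instances using GHC's DeriveGeneric extension and the binary package. This is an assumption on my side: it is neither Happstack's Template Haskell route nor cereal. The field types are adapted from the question, with Word16 standing in for Word16le. The byte layout is whatever binary's generic defaults produce, so it will not match a hand-written little-endian flag encoding.

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Binary (Binary, decode, encode)
import Data.Word (Word8, Word16)
import GHC.Generics (Generic)

-- Hypothetical flag and record types, loosely following the question.
data MyFlag = One | Two | Three
  deriving (Show, Eq, Generic)

data ObjectA = ObjectA
  { attribute0 :: Word8
  , attribute1 :: Word16
  , attribute2 :: [MyFlag]
  } deriving (Show, Eq, Generic)

-- Empty instance bodies: put and get come from the Generic defaults.
instance Binary MyFlag
instance Binary ObjectA

-- Encode, then decode again; this should give back the original value.
roundTrip :: ObjectA -> ObjectA
roundTrip = decode . encode
```

roundTrip is only a hypothetical helper to check that decoding undoes encoding.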


> Second issue, I would like to find a way to dispatch parsers. I'm not very good at expressing my problem in English, so I will use another code example:

This sounds very hard in the general case. Others have shown you how to dispatch on two types. But there is no general data type which combines all (or even arbitrarily many) types. Somehow, "Read" is able to do this, but I don't know what kind of magic it uses.



Stephen Tetley

9:18 p.m.

On 27 April 2011 21:28, Alexander Solla wrote:

> On Wed, Apr 27, 2011 at 11:16 AM, John Obbele wrote:
>
>> Second issue, I would like to find a way to dispatch parsers. I'm not very good at expressing my problem in English, so I will use another code example:
>
> This sounds very hard in the general case. Others have shown you how to dispatch on two types. But there is no general data type which combines all (or even arbitrarily many) types. Somehow, "Read" is able to do this, but I don't know what kind of magic it uses.

Read always "demands its type" so it doesn't use any magic - if the input string doesn't conform it will throw an error.

Any sensible binary format will have a scheme such as tag byte prefixes to control choice in parsing (binary parsing generally avoids all backtracking). If your binary data doesn't have a proper scheme it will be hard to parse for any language (or cast-to in the case of C), so the most sensible answer is to revise the format.
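To illustrate the point about Read: the requested result type selects the parser, not the input. A small sketch using readMaybe from Text.Read, which returns Nothing where read would throw. The names asInt and asBool are made up here.

```haskell
import Text.Read (readMaybe)

-- The result type picks the parser; the input string has no say in it.
asInt :: String -> Maybe Int
asInt = readMaybe

asBool :: String -> Maybe Bool
asBool = readMaybe
```

In GHCi, asInt "42" gives Just 42 while asBool "42" gives Nothing: same input, different demanded type.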

John Obbele

30 Apr

12:07 p.m.


On Wed, Apr 27, 2011 at 10:18:47PM +0100, Stephen Tetley wrote:

> Read always "demands its type" so it doesn't use any magic - if the input string doesn't conform it will throw an error.
>
> Any sensible binary format will have a scheme such as tag byte prefixes to control choice in parsing (binary parsing generally avoids all backtracking). If your binary data doesn't have a proper scheme it will be hard to parse for any language (or cast-to in the case of C), so the most sensible answer is to revise the format.

Oki, so far the Control.Applicative magic, the syntactic sugar for monadic operations, and manually written if/then/else or 'case of' branching statements have helped me considerably in the parsing task. I have not tried DrIFT since I prefer to avoid pre-processors for now.

So the only quirk that is still upsetting me is the 'deriving' issue: if I know that what I am parsing can only result in ObjectA or ObjectB, everything is simple. But when someone decides to add an extension to the binary format, let's say a new tag identifier and a new ObjectC with a different size and new attributes, I will have to rewrite part of my Haskell parser.

I think I will just have to rewrite the abstract type to 'data AbstractObject = ObjectA | ObjectB | ObjectC', let my parser keep the type signature 'parser :: B.ByteString -> AbstractObject', and modify the branching inside to add the C tag identifier: 'case tag identifier of A -> ... B -> ... C -> ...'.

It's not straightforward but it should be manageable.

regards,
/john
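A minimal sketch of that plan, under assumptions: it uses Data.Binary.Get from the binary package, a one-byte tag (0, 1, 2), made-up payload fields, and it returns Either String instead of a bare AbstractObject so that unknown tags are reported rather than crashing.

```haskell
import Data.Binary.Get (Get, getWord8, getWord16le, getWord32le, runGetOrFail)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8, Word16, Word32)

-- Hypothetical payloads; one constructor per known tag.
data AbstractObject
  = ObjectA Word16
  | ObjectB Word32
  | ObjectC Word8 Word8   -- the later extension
  deriving (Show, Eq)

-- Read the tag, then branch. Supporting a new tag means one new
-- constructor above and one new case alternative here.
getObject :: Get AbstractObject
getObject = do
  tag <- getWord8
  case tag of
    0 -> ObjectA <$> getWord16le
    1 -> ObjectB <$> getWord32le
    2 -> ObjectC <$> getWord8 <*> getWord8
    _ -> fail ("unknown tag: " ++ show tag)

-- Run the parser; Left carries the error message on failure.
parser :: BL.ByteString -> Either String AbstractObject
parser bytes = case runGetOrFail getObject bytes of
  Right (_, _, obj) -> Right obj
  Left (_, _, err)  -> Left err
```

Adding ObjectC touched exactly two places, the data declaration and the case expression, and left the signature of parser unchanged.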

