
Hi all,
Can anyone recommend any articles relating to de/serialization in Haskell? I've been reading these:
- http://hackage.haskell.org/packages/archive/binary/0.4.1/doc/html/Data-Binar...
- http://en.wikipedia.org/wiki/Serialization#Haskell
- http://www.haskell.org/tutorial/stdclasses.html (the Read/Show bits)
...and I'd like some more examples etc.
My situation is basically this: I have a non-Haskell black-box tuple space (http://en.wikipedia.org/wiki/Tuple_space) which I want to use Haskell to read data from and write data into. I had trouble understanding how Read/Show could help me here. Data.Binary made more sense, but like I say, I'd really like to find some more articles and examples.
Even apart from the tuple space stuff, I'm trying to piece together in my mind how I'd read some serialized data from anywhere and turn that into a Haskell data type in a nicely reusable way. I've been flicking through Real World Haskell and couldn't see much that I found helpful for this.
Many thanks,
Tom

Hi Tom
If you are interfacing with non-Haskell binary objects, you will want binary parsing / writing rather than simple serialization, as the format will be determined by the foreign objects.
You can still use Data.Binary (indeed it's probably the best choice), but you will want to use the modules Data.Binary.Get and Data.Binary.Put directly and probably avoid the Binary class as it is specialized to serializing values for Haskell only.
There are probably quite a few libraries on Hackage that you can look at for examples, though there might be more packages that supply parsers only and don't do writing, e.g.:
http://hackage.haskell.org/package/pecoff (parser only)
There will be more among the packages in this list that directly depend on Binary:
http://bifunctor.homelinux.net/~roel/cgi-bin/hackage-scripts/revdeps/binary-...
Best wishes
Stephen
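To make the "use Get and Put directly" advice concrete, here is a minimal sketch of a hand-written decoder and encoder. The record `Entry`, the functions `putEntry`/`getEntry`, and the wire layout (1-byte tag, 4-byte big-endian key, 2-byte big-endian length, then raw bytes) are all hypothetical, invented for illustration; they are not the format of any real tuple space.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getByteString, getWord16be, getWord32be, getWord8, runGet)
import Data.Binary.Put (Put, putByteString, putWord16be, putWord32be, putWord8, runPut)
import Data.Word (Word32, Word8)

-- Hypothetical record; the layout below is an assumption for this sketch.
data Entry = Entry
  { entryTag  :: Word8
  , entryKey  :: Word32
  , entryName :: BS.ByteString
  } deriving (Eq, Show)

-- Layout: tag byte, big-endian 32-bit key, big-endian 16-bit length,
-- then the name bytes. Assumes names shorter than 65536 bytes.
putEntry :: Entry -> Put
putEntry (Entry tag key name) = do
  putWord8 tag
  putWord32be key
  putWord16be (fromIntegral (BS.length name))
  putByteString name

getEntry :: Get Entry
getEntry = do
  tag  <- getWord8
  key  <- getWord32be
  len  <- getWord16be
  name <- getByteString (fromIntegral len)
  return (Entry tag key name)

encodeEntry :: Entry -> BL.ByteString
encodeEntry = runPut . putEntry

-- runGet throws an error if the input is malformed or too short.
decodeEntry :: BL.ByteString -> Entry
decodeEntry = runGet getEntry
```

Because `getEntry` is an ordinary `Get` action, it composes with other parsers (for example `replicateM n getEntry` for a counted sequence), which is where the reusability comes from.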

the Binary class as it is specialized to serializing values for Haskell only
Can you please expand on this? I've been using Data.Binary to (de)serialize
messages for some networking protocols, and have made all my types instances
of Binary. Non-Haskell programs will be receiving and sending messages on
one end, but I didn't think that mattered since my get and put functions are
written to adhere to the protocol's definition. Is there some issue I'm
missing?
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

On Monday 10 May 2010 21:00:14, Ashish Agarwal wrote:
the Binary class as it is specialized to serializing values for Haskell
only
Can you please expand on this? I've been using Data.Binary to (de)serialize messages for some networking protocols, and have made all my types instances of Binary. Non-Haskell programs will be receiving and sending messages on one end, but I didn't think that mattered since my get and put functions are written to adhere to the protocol's definition. Is there some issue I'm missing?
I think the point was that derive (.., Binary) isn't a good idea when communicating with the Non-Haskell part of the world. If you write serialisation functions adhering to a specified protocol, defining a Binary instance for your types with those functions isn't going to do harm. Only it might give rise to confusion if somebody wants to transmit those types according to another protocol.
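As a sketch of the harmless case Daniel describes, a hand-written instance can simply delegate to functions that follow the protocol, so `encode`/`decode` produce protocol bytes. The `Ping` type and its layout (type code 0x01, then a 4-byte big-endian sequence number) are hypothetical.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (Get, getWord32be, getWord8)
import Data.Binary.Put (Put, putWord32be, putWord8)
import Data.Word (Word32, Word8)

-- Hypothetical message type; the layout is made up for this sketch.
newtype Ping = Ping Word32 deriving (Eq, Show)

-- Protocol-specific encoder/decoder, usable on their own.
putPing :: Ping -> Put
putPing (Ping seqNo) = putWord8 0x01 >> putWord32be seqNo

getPing :: Get Ping
getPing = do
  code <- getWord8
  if code == 0x01
    then Ping <$> getWord32be
    else fail ("unexpected message code " ++ show code)

-- The instance only delegates. Caveat from the thread: this ties the
-- Ping type to this one protocol for anyone using encode/decode.
instance Binary Ping where
  put = putPing
  get = getPing

-- Raw bytes on the wire, handy for checking against the protocol spec.
pingBytes :: Ping -> [Word8]
pingBytes = BL.unpack . encode
```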

Hi Ashish
Daniel has largely answered this for me (thanks Daniel!).
If you define Binary instances for your data types to match a protocol, then, as Daniel says, you can only use them for that protocol. Similarly, all the regular Haskell types (Int, Word8, Float, etc.) have ready-made Binary instances which you may not want when dealing with anything non-Haskell [*]: numbers are always big-endian, and the encodings for Integers, Floats and the like are sparsely documented and may well handle signs differently from an equivalent C / Java / ... representation.
[*] Personally I'd go as far as saying you should avoid them entirely except for writing other instances of the Binary class.
Best wishes
Stephen
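A quick way to see the big-endian default is to compare the bytes from the ready-made `Word32` instance with an explicitly chosen byte order. The helper names below are hypothetical; `encode`, `runPut`, `putWord32be` and `putWord32le` are from the binary package.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary (encode)
import Data.Binary.Put (putWord32be, putWord32le, runPut)
import Data.Word (Word32, Word8)

-- Bytes produced by the ready-made Binary instance for Word32.
viaInstance :: Word32 -> [Word8]
viaInstance = BL.unpack . encode

-- Bytes produced when the byte order is chosen explicitly.
bigEndian, littleEndian :: Word32 -> [Word8]
bigEndian    = BL.unpack . runPut . putWord32be
littleEndian = BL.unpack . runPut . putWord32le
```

For `0x01020304`, `viaInstance` and `bigEndian` both give `[1,2,3,4]`, while `littleEndian` gives `[4,3,2,1]`; a protocol that expects little-endian fields therefore needs the explicit `Put` functions rather than the instance.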

Thanks for your responses.
Only it might give rise to confusion if somebody wants to transmit those types according to another protocol.
So are Binary instances perceived to be just for (de)serializing from/to Haskell? Would it be better style for me to define a new type class with methods getProt and putProt, where Prot is whatever protocol I'm supporting?
numbers are always big-endian
I have been wondering about the difference in put/get instances of Word
types versus the getWord8be, getWord8le, etc. functions. Since the default
instances use big-endian, is there any difference between the following:
get (0::Word8)
getWord8be 0
What exactly are the guarantees for Binary instances, only that get and put
are inverses? Are all other features possibly different between compiler
versions or different implementations of Haskell?

On 11 May 2010 18:00, Ashish Agarwal
Thanks for your responses.
Only it might give rise to confusion if somebody wants to transmit those types according to another protocol. So are Binary instances perceived to be just for (de)serializing from/to Haskell? Would it be better style for me to define a new type class with methods getProt and putProt, where Prot is whatever protocol I'm supporting?
Hi Ashish
For the first question - my opinion would be yes, though I might be in a minority of one on that point.
For the second question - it's just a matter of taste. Personally I like a type class to be a bit more than a naming convenience, e.g. the Monad class communicates a strong notion of, ahem, "monads", and there are many valuable functions that can be built using just the operations of the Monad class without knowing the implementation (sequence, liftM, mapM etc.). For, say, pretty printing, I can live happily just prefixing my functions with pp rather than hankering after a class (e.g. ppInt, ppFloat, ppSyntaxTree...). Again, this is entirely personal taste - and when a Pretty class is provided (as in wl-pprint, but not HughesPJ) I use it.
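Stephen's criterion (a class earns its keep when useful functions can be written against it without knowing the instances) can be sketched with the class Ashish proposed. `ProtCodec`, `putProt`/`getProt`, and the length-prefixed list layout are hypothetical.

```haskell
import qualified Data.ByteString.Lazy as BL
import Control.Monad (replicateM)
import Data.Binary.Get (Get, getWord16be, getWord8, runGet)
import Data.Binary.Put (Put, putWord16be, putWord8, runPut)
import Data.Word (Word16, Word8)

-- Hypothetical class for "encode according to protocol Prot".
class ProtCodec a where
  putProt :: a -> Put
  getProt :: Get a

instance ProtCodec Word8 where
  putProt = putWord8
  getProt = getWord8

instance ProtCodec Word16 where
  putProt = putWord16be
  getProt = getWord16be

-- Generic helpers written only in terms of the class methods: a list is
-- sent as a 16-bit big-endian count followed by its elements.
putProtList :: ProtCodec a => [a] -> Put
putProtList xs = do
  putWord16be (fromIntegral (length xs))
  mapM_ putProt xs

getProtList :: ProtCodec a => Get [a]
getProtList = do
  n <- getWord16be
  replicateM (fromIntegral n) getProt
```

If no such generic helpers turn up, plain functions named `putProtFoo`/`getProtFoo` do the same job without a class, which is closer to Stephen's `ppInt`/`ppFloat` style.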
is there any difference between the following:
get (0::Word8)
getWord8be 0
Did you mean this, as there is no `getWord8be`:
get (0::Word8)
getWord8 0
This will always be fine for a single Word8, as there's no notion of endianness (as far as my understanding goes). Also, the Binary instance for Word8 is implemented as:
instance Binary Word8 where
    get = getWord8
    put = putWord8
... so the code is the same. An Int8 is a bit more problematic - I think it works as 0x00 to 0x7F covering 0..127, with 0x80 as -128, going up to 0xFF for -1. Whilst I think this would be the expected behaviour in C, I can't remember if it's true, so I wouldn't like to rely on it for inter-working between Haskell and C.
For floating point numbers, representations are quite likely to diverge; one would hope any good protocol would pick a proper representation (e.g. IEEE-754).
Data.Binary is a "Standard Library", so any Haskell compiler should use the same code and produce the same binary layout.
Best wishes
Stephen
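The Int8 question can be checked directly. In two's complement, 0x00 to 0x7F hold 0..127, 0x80 is -128 and 0xFF is -1; `fromIntegral` between `Int8` and `Word8` keeps the bit pattern in GHC, and the ready-made `Int8` instance writes that single byte. The helper names are hypothetical.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary (decode, encode)
import Data.Int (Int8)
import Data.Word (Word8)

-- Reinterpret an Int8 as the unsigned byte with the same bit pattern.
int8ToByte :: Int8 -> Word8
int8ToByte = fromIntegral

-- And back again.
byteToInt8 :: Word8 -> Int8
byteToInt8 = fromIntegral

-- The single byte the ready-made Binary instance writes for an Int8.
encodedInt8 :: Int8 -> [Word8]
encodedInt8 = BL.unpack . encode
```

`map int8ToByte [0, 127, -128, -1]` gives `[0x00, 0x7F, 0x80, 0xFF]`, and `encodedInt8 (-1)` gives `[0xFF]`, which matches what a two's-complement C `int8_t` would hold.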

On Tuesday 11 May 2010 19:45:42, Stephen Tetley wrote:
An Int8 is a bit more problematic
Not really, the Binary instances for IntN just use fromIntegral to convert between IntN and WordN before serialising/after deserialising a WordN. Now, it's not written in the language definition that fromIntegral should be a reinterpretation of the bit-pattern for these conversions, but I think you can pretty much rely on that. What might be a problem for the unwary are the instances for Int and Word (guess what they do and why).
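For the Int/Word puzzle: as I understand the binary package, its instances convert `Int` to `Int64` and `Word` to `Word64` before writing, so they always occupy 8 big-endian bytes whatever the machine's native word size, which will not match a 32-bit C `int`. A quick check (helper names hypothetical):

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary (encode)
import Data.Word (Word8)

-- Bytes the ready-made Binary instance writes for an Int.
intBytes :: Int -> [Word8]
intBytes = BL.unpack . encode

-- Bytes the ready-made Binary instance writes for a Word.
wordBytes :: Word -> [Word8]
wordBytes = BL.unpack . encode
```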
I wouldn't like to rely on it for inter-working between Haskell and C.
Int8 and Word8 are unproblematic (unless you have a ones-complement machine), the problems appear only for larger types, as there's no guarantee whether your architecture is big-endian or little-endian. But if you know that, you know what to do on the C end.
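On the Haskell side, "knowing what to do" mostly means picking the matching `Get` function. A minimal sketch, assuming a hypothetical C program that wrote a `uint32_t` in its host byte order; `readCField` and its `Bool` flag are invented for illustration.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getWord32be, getWord32le, runGet)
import Data.Word (Word32)

-- Hypothetical: decode a uint32_t written in the C side's host order.
-- Pass True for a little-endian host (e.g. x86), False for big-endian.
readCField :: Bool -> BL.ByteString -> Word32
readCField littleEndianHost = runGet field
  where
    field :: Get Word32
    field = if littleEndianHost then getWord32le else getWord32be
```

The same four bytes `[1,0,0,0]` decode to 1 when read as little-endian but to 16777216 (0x01000000) when read as big-endian, which is exactly the mismatch to watch for.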

On 11 May 2010 19:16, Daniel Fischer
Int8 and Word8 are unproblematic (unless you have a ones-complement machine), the problems appear only for larger types, [SNIP]
Hi Daniel
Likely I was revealing a personal bias on that one; the only protocol I've implemented where Int8 has featured is MIDI. If MIDI isn't ones-complement, my code (I've written decoders / encoders several times) has been wrong for years...
Best wishes
Stephen

On Tuesday 11 May 2010 21:01:19, Stephen Tetley wrote:
Hi Daniel
Likely I was revealing a personal bias on that one, the only protocol I've implemented where Int8 has featured has been MIDI.
If MIDI isn't ones-complement, my codes - I've written decoders / encoders several times - have been wrong for years...
I've no idea how MIDI messages/files/streams are interpreted, I was thinking about the hardware, how the ALU interprets the bit-pattern for arithmetic.

get (0::Word8)
getWord8be 0
Make that
put (0 :: Word16)
putWord16be 0
?
Yes, thanks for the correction. Word8 was a bad choice since endianness is
not an issue.
Thanks for all the advice. It seems I should avoid making my types instances
of Binary. In fact, this relates to another design issue I've been grappling
with: supporting multiple versions of the protocol. Probably I'll need
something like putProtVersion1, putProtVersion2, etc., or something along
these lines.
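One way to organise versioned encoders is a version byte at the front and a single decoder that dispatches on it. This is only a sketch under assumptions: the `Msg` type, the leading version byte, and the rule that version 2 adds a priority field are all hypothetical.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getWord16be, getWord8, runGet)
import Data.Binary.Put (Put, putWord16be, putWord8, runPut)
import Data.Word (Word16, Word8)

-- Hypothetical message: version 1 has only an id,
-- version 2 adds a priority byte.
data Msg = Msg { msgId :: Word16, msgPriority :: Word8 }
  deriving (Eq, Show)

-- One encoder per protocol version; v1 cannot carry the priority.
putMsgV1, putMsgV2 :: Msg -> Put
putMsgV1 m = putWord8 1 >> putWord16be (msgId m)
putMsgV2 m = putWord8 2 >> putWord16be (msgId m) >> putWord8 (msgPriority m)

-- Read the (assumed) leading version byte, then dispatch.
getMsg :: Get Msg
getMsg = do
  version <- getWord8
  case version of
    1 -> Msg <$> getWord16be <*> pure 0   -- v1: default priority 0
    2 -> Msg <$> getWord16be <*> getWord8
    v -> fail ("unsupported version " ++ show v)
```

Keeping one Haskell type and pushing the differences into the encoders works while versions differ by added or dropped fields; larger structural changes may call for separate types per version.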

On Wednesday 12 May 2010 19:15:42, Ashish Agarwal wrote:
Thanks for all the advice. It seems I should avoid making my types instances of Binary.
I wouldn't go so far, just stop and think before you do.
In fact, this relates to another design issue I've been grappling with, supporting multiple versions of the protocol. Probably I'll need something like, putProtVersion1, putProtVersion2, etc. Or something along this line.
That, or
data Protocol a = Prot
    { putVal :: a -> Put
    , getVal :: Get a
    }
ieee754 :: Protocol Double
ieee754 = Prot { putVal = ..., getVal = ... }
json :: Protocol JSON
json = ...
work args = do
    stuff
    mapM_ (putVal prot) vals
    moreStuff
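Here is one way the `ieee754` case of Daniel's `Protocol` record might be filled in: big-endian IEEE-754 binary64, obtained by writing the raw bit pattern via `castDoubleToWord64`/`castWord64ToDouble` from `GHC.Float`. That encoding choice is mine, not Daniel's, and `putAll` is a hypothetical stand-in for the `mapM_ (putVal prot) vals` step in his `work` sketch.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getWord64be, runGet)
import Data.Binary.Put (Put, putWord64be, runPut)
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- A protocol is just a record of an encoder and a decoder.
data Protocol a = Prot
  { putVal :: a -> Put
  , getVal :: Get a
  }

-- IEEE-754 binary64, big-endian: write the Double's raw bit pattern.
ieee754 :: Protocol Double
ieee754 = Prot
  { putVal = putWord64be . castDoubleToWord64
  , getVal = castWord64ToDouble <$> getWord64be
  }

-- Works with any protocol value: write every element in sequence.
putAll :: Protocol a -> [a] -> Put
putAll prot = mapM_ (putVal prot)
```

With this, `1.0` is written as the eight bytes `3F F0 00 00 00 00 00 00`. Unlike a type class, the protocol is an ordinary value, so two protocols for the same type can coexist without newtype wrappers.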

Ashish,
I've been using XML picklers for serialization, which you might want to consider if you can handle a little extra CPU/data cost. I'm the author of hexpat-pickle, so this is a plug for my package. I lifted the idea wholesale from the HXT package.
Handling changes in protocol version is really easy. I've been doing a lot of
xpSomething = new `xpTryCatch` old
  where
    new = ...
    old = ...
If the new encoding fails to parse, it tries the old one, and on transmission it uses the new one. I find it very quick and convenient to bang out a new pickler now that I'm familiar with it. You write your pickler and your unpickler with the same code - it works well.
I am also working on hexpat-iteratee, which is a lot more socket-friendly since it doesn't use Haskell's lazy I/O. The learning curve increases a little bit with doing it that way, though.
Steve
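The same "try the new format, fall back to the old one" idea can be approximated on the binary side. This is not hexpat-pickle's API; it is a sketch using Data.Binary's `runGetOrFail`, and `tryNewThenOld`, `newFormat`, `oldFormat` and the 0xFF marker byte are all hypothetical.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getWord16be, getWord8, runGetOrFail)
import Data.Word (Word16)

-- Try the decoder for the new encoding first; if it fails, rerun the
-- whole input through the old decoder (similar in spirit to xpTryCatch).
tryNewThenOld :: Get a -> Get a -> BL.ByteString -> Either String a
tryNewThenOld new old bs =
  case runGetOrFail new bs of
    Right (_, _, a) -> Right a
    Left _ -> case runGetOrFail old bs of
      Right (_, _, a) -> Right a
      Left (_, _, err) -> Left err

-- Hypothetical formats: the new one starts with a marker byte 0xFF.
newFormat, oldFormat :: Get Word16
newFormat = do
  marker <- getWord8
  if marker == 0xFF then getWord16be else fail "no new-format marker"
oldFormat = getWord16be
```

One caveat with any fallback scheme: if old-format data can begin with the new format's marker, it will be misread, so the new encoding needs a distinguishing prefix the old one never produces.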

Thanks for the reference. Sounds like a neat idea so I'll definitely check it out!
Right now, there are lots of complications in designing this library, at least if I want a nice API. I have lots and lots of type definitions to encode the various constructs of the protocol I'm working with. New versions could make arbitrary changes: add or remove a field to a record, change the type of a record field, add or remove one of several possible cases (which I encode with sum types), and so on. It's not at all clear to me yet how I should factor my type definitions, and/or use other features such as polymorphic types, type classes, etc. But I guess this is a different thread.

On Tuesday 11 May 2010 19:00:43, Ashish Agarwal wrote:
Thanks for your responses.
Only it might give rise to confusion if somebody wants to transmit those types according to another protocol.
So are Binary instances perceived to be just for (de)serializing from/to Haskell?
Sort of. A Binary instance should be generic (for an unspecified value of 'generic'). Binary instances are for making the (de)serialisation of other types that use the type in question easy (and are expected to have as little overhead as possible - at least, that's what I expect).
Would it be better style for me to define a new type class with methods getProt and putProt, where Prot is whatever protocol I'm supporting?
You don't need to define a new class, you could just use putProt and getProt directly. However, if you are reasonably sure that (de)serialising your type according to a different protocol is exceptional, go ahead and make a Binary instance. If there are several standard protocols for a type, an instance using one could be confusing.
numbers are always big-endian
I have been wondering about the difference in put/get instances of Word types versus the getWord8be, getWord8le, etc. functions. Since the default instances use big-endian, is there any difference between the following:
get (0::Word8)
getWord8be 0
Make that
put (0 :: Word16)
putWord16be 0
?
-- Word8s are written as bytes
instance Binary Word8 where
    put = putWord8
    get = getWord8
-- Word16s are written as 2 bytes in big-endian (network) order
instance Binary Word16 where
    put = putWord16be
    get = getWord16be
[analogously for other WordN types]. With optimisations, there should be no difference; without, the difference would be one dictionary lookup, I think.
What exactly are the guarantees for Binary instances, only that get and put are inverses?
With hand-written instances, not even that :)
Are all other features possibly different between compiler versions or different implementations of Haskell?
No, serialised values should be portable, i.e. if you serialise a value using implementation x and deserialise it using implementation y, you ought to get the same value.
participants (5)
- Ashish Agarwal
- Daniel Fischer
- Stephen Blackheath [to Haskell-Beginners]
- Stephen Tetley
- Tom Hobbs