AltBinary library features: - interface compatibility with Binary library - compatibility both with Hugs and GHC, with GHC-specific speed optimizations - (de)serialization speed of 20-60 mb/sec on 1GHz CPU - support for byte-aligned and bit-aligned, low-endian and big-endian serialization using the same interface - free intermixing of text and binary i/o on the same handle - UTF8 encoding for strings, variable-length encoding for Int/Word/Integer (including encoding of list/array bounds) - fixed-size integral values use network byte order; any integral value can be saved with explicitly specified size using functions "putBits bh bits value" and "getBits bh bits" (putBit/putWord8...putWord64 is also supported for all integral types) - implementation of Binary interface for all Bounded Enum types - implementation of Binary interface for all array types - ability to serialize data to any Stream the following is the guide to using library, containing the following parts: 1. emulation of Binary interface 2. AltBinary interface 2.1 byte-aligned and bit-aligned streams 2.2 getBits/putBits; Binary instances for Bool, Maybe, Either 2.3 putWord8..putWord64; Binary instances for Int8..Word64; putWord##le 2.4 putBounded; Binary instances for Bounded Enum types 2.5 putUnsigned/putInteger/putLength; Binary instances for Int/Integer/Word 2.6 Binary instances for Char and String 2.7 lists support 2.8 arrays support 2.9 putGhcInteger and putByteArray - GHC-specific routines 2.10 defining Binary instances for custom serialization formats 3. Stream interface >1. emulation of Binary interface This library implements 2 interfaces: Binary and AltBinary. First interface allows to use this library as drop-in replacement for the well-known Binary and NewBinary libs. all you need to do is to replace "import Data.Binary" statement with either import Data.Binary.ByteAligned or import Data.Binary.BitAligned depending on what type of access you need. in the first case representation of any data value will be written/read as the whole number of bytes, in the second case data values may cross byte boundaries and, for example, Bools will be packed 8 values per byte. please draw attention that despite interface emulation this library and original Binary lib use different representations for most of the data types >2. AltBinary interface let s = encode ("11",123::Int,[1..10::Int]) print (decode s::(String,Int,[Int])) >2.1 4 types of binary streams AltBinary is "native" interface of this library to (de)serialize data. it provides the same operations `get` and `put_` to read/write data, but allows to use them directly on Handles and any other streams: import Data.AltBinary h <- openBinaryFile "test" WriteMode put_ h [1..100::Int] hClose h h <- openBinaryFile "test" ReadMode x <- get h :: IO [Int] print x if you need bit-aligned serialization, use the `openBitAligned` stream transformer: h <- openBinaryFile "test" WriteMode >>= openBitAligned put_ h "string" put_ h True vClose h of course, to read these data you also need to use `openBitAligned`: h <- openBinaryFile "test" ReadMode >>= openBitAligned x <- get h :: IO String y <- get h :: IO Bool print (x,y) The above code writes data in big-endian format, if you need to use low-endian formats, use the following transformers: h <- openBinaryFile "test" WriteMode >>= openByteAlignedLE and h <- openBinaryFile "test" WriteMode >>= openBitAlignedLE for the byte-aligned and bit-aligned access, respectively. You can also mix the binary and text i/o at the same stream, with only one requirement: use "flushBits h" after you used stream for some bit-aligned I/O: h <- openBinaryFile "test" WriteMode >>= openBitAligned put_ h True flushBits h vPutStr h "string" vClose h it's also possible to use different types of binary atreams on top of one Stream: h <- openBinaryFile "test" WriteMode bh <- openBitAligned h put_ bh True flushBits bh bh <- openByteAlignedLE h vPutStr bh "string" vClose h ... if you will ever need this :) >2.2 getBits/putBits; Binary instances for Bool, Maybe, Either `get` and `put_` operations are just enough if you need only to save some values in Stream and then restore them. but to assemble/parse data in some particular format, you will need some more low-level functions, such as `getBits` and `putBits`, which transfers just the specified number of bits: putBits 32 h (123::Int) x <- getBits 32 h :: IO Int if you call on byte-aligned stream putBits with number of bits, what is not divisible by 8, the whole number of bytes are occupied. in particular, putBit on byte-aligned streams occupies entire byte this makes possible to use the same (de)serialization code and in particular the same definitions of Binary instances both for byte-aligned and bit-aligned streams! for example, the following definition: instance Binary Bool where put_ h x = putBit h $! (fromEnum x) get h = do x <- getBit h; return $! (toEnum x) allows to encode Bool values with just one bit in bit-aligned streams, but uses the whole byte in byte-aligned ones. further, serialization code for Maybe types uses Bool values: instance Binary a => Binary (Maybe a) where put_ bh (Just a) = do put_ bh True; put_ bh a put_ bh Nothing = do put_ bh False get bh = do flag <- get bh if flag then do a <- get bh; return (Just a) else return Nothing as a result, representation of `Maybe a` uses just one more bit than representation of type `a` in bit-aligned streams, and whole extra byte otherwise. the same story is for Either types >2.3 getWord8..putWord64; Binary instances for Int8..Word64 most widespread uses of getBits/putBits is for 1/8/16/32/64 bits, and so there are specialized (and sometimes more efficient) versions of these functions, called putBit, putWord8...putWord64 (and of course their get... counterparts). please draw attention that all these functions accept arguments (or return values) of any Integral type (i.e. types what are instances of Integral class - Int, Integer, Word, Int8..Word64), so you don't need to convert types if you want, for example, encode Int as 8-bit value: putWord8 h (length "test") these fixed-bits routines used in definitions of Binary instances for types with fixed sizes - Int8...Word64. types Int, Word and Integer by default uses variable-sized representation, which would be described later. if you need to read or write values of these types using fixed-size representation, use appropriate fixed-bits procedures instead of get/put_: putWord16 h (1::Int) putWord32 h (2::Word) putWord64 h (3::Integer) the same rule applies if you need to write fixed-size value with non-default number of bits: putWord8 h (4::Int32) functions putWord16..putWord64 uses big-endian representation, also known as network byte order - it is the order of bytes, used natively on PowerPC/Sparc processors. in this format, representation of value started fom most significant bytes. if you use bit-aligned stream, high bits of each byte are also filled first. if you need little-endian (native for Intel processors) formats, putWord16le..putWord64le is at your service >2.4 putBounded; Binary instances for Bounded Enum types next pair of functions uses mininal possible number of bits to encode values in given range [min..max]: putBounded min max h x x <- getBounded min max h they also support values of any Integral type. These functions are used to provide default Binary instances for all Bounded Enum types (i.e. types which support both Bounded and Enum interfaces). for example, you can declare: data Color = Red | Green | Blue deriving (Bounded, Enum) and now you can use get/put_ on Colors; Color values would be encoded using 2 bits in bit-aligned streams (of course, whole byte would be used in byte-aligned streams) >2.5 putUnsigned/putInteger/putLength; Binary instances for Int/Integer/Word putUnsigned provides variable-sized encoding, what can be used to represent any non-negative Integral value using minimal possible number of bytes. it uses 7+1 encoding, i.e. 7 bits in each byte represents bits of actual value, and higher bit used to distinguish last byte in sequence. so, values in range 0..127 would be encoded using one byte, values in range 128..2^14-1 - using two bytes and so on putInteger is about the same, but allows to encode also negative values, so -64..63 encoded with one byte, -2^13..2^13-1 - with two bytes... putLength is synonym for putUnsigned, just used to represent lengths of various containers - strings, lists, arrays and so on put_ uses putInteger to encode Int and Integer, and putUnsigned to encode Word; i don't used fixed-size representation for Int and Word because that will produce data incompatible between 32-bit and 64-bit platforms. i also don't use internal GHC's representation of Integer to speed up (de)serialization because that will produce data incompatible with other Haskell compilers. but if you need to (de)serialize large number of Integers quickly, you should use putGhcInteger/getGhcInteger procedures, described later. of course, this way your program will become compatible only with the GHC compiler. >2.6 Binary instances for Char and String >2.7 lists support >2.8 arrays support This library supports (de)serialization for all array types, included in standard hierarchical libraries plus PArr arrays, supported only by GHC. Immutable array types can be (de)serialized to any Stream (just like lists); mutable arrays can be (de)serialized only in the corresponding monad (where this array can be read/modified), i.e. IOArray can be get/put only to Stream belonging to IO monad, STArray can be get/put only to Stream belonging to the same state monad. all that is done automatically, just use put_ or get operation on the corresponding array if you read an array, you may need (or don't need, depending on the surrounding code) to specify its type explicitly, say: arr <- get h :: IO (Array Int Int32) besides of automatic support for all array types in put_/get operations, there are also huge number of "low-level" array (de)serialization routines. first, there are routines putIArray h arr putMArray h arr what can be used to write to the Stream any array that is instance of IArray or MArray class, correspondingly (the first class contains all immutable arrays: Array, UArray, DiffArray, DiffUArray; the second - all other, mutable arrays - IOArray, IOUArray, STArray, STUArray, StorableArray). corresponding operations to read these arrays require to explicitly pass them bounds of array read: arr <- getIArray h bounds arr <- getMArray h bounds note that this operations is not full analogues of put_/get ones, which are write and read array bounds automatically. these operations are more low-level - they are read/write only the array elements. also note that just like `get` operation, you may need to specify type of the array read: arr <- getIArray h (0,9) :: IO (Array Int Int32) second, you can read/write array elements with explicitly pointed (de)serialization procedure for array elements isstead of default ones, provided by the Binary class. to achive this, add `With` suffix to routine name and specify procedure to read or write array elements as the first argument: putIArrayWith putUnsigned h arr putMArrayWith (putBits 15) h arr arr <- getIArrayWith getWord8 h bounds arr <- getMArrayWith (getBounded 1 5) h bounds of course, you can also provide your own read/write procedures, if they have the same types as standard get/put_ functions. there are also variants of all get operations, which uses `size` parameter instead of `bounds`, and creates arrays with bounds (0,size-1::Int). they have names with `N` at the end of of procedure name, but before `With`: arr <- getIArrayN h 10 :: IO (Array Int Int32) arr <- getMArrayNWith getWord32 h 10 :: IO (IOArray Int Int) these operations in some way dubs the similar list procedures at last, part of the `get` operations have versions, specialized to specific type constructors. for example, `getMArrayN` have `getIOArrayN` and `getIOUArrayN` variants which can read only the IOArray/IOUArray, accordingly. it's just a trick to avoid necessity to specify array types in `get` operations, say instead of: arr <- getIArrayN h 10 :: IO (Array Int Int32) one can write arr <- getArrayN h 10 it is nothing more than handy shortcuts. the only exclusion is operations to read `UArray`, what is not specializations of corresponding `IArray` operations, but use some faster algorithm and work only in IO monad. if you need to read `UArray` in any other monad - please use general operations on the `IArray` instead (anyway the compiler will ensure proper use via the typechecking) so far i don't say anything about specific operations for (de)serialization of parallel arrays (available only in GHC via the module GHC.PArr). 2.9 putGhcInteger and putGhcByteArray - GHC-specific routines 2.10 defining Binary instances for custom serialization formats