
Sorry, I know there was an ongoing discussion on this topic somewhere, but I can't find it, so I'll have to hope this list is the most appropriate.
The general problem to solve is "How do we convert things to and from a binary format, so they can be efficiently written to and from disk". I have solved this (incompletely) twice myself, and I expect Glasgow Haskell has solved it too; there really ought to be a standard solution.
From the deficiencies of my incomplete solutions I conclude:
(1) you don't want to force everything to go via single Haskell characters. This is horrendously inefficient when what you want is to blast in and out large quantities of binary data, and of course that's precisely where you probably want efficiency.
(2) you don't want a binary converter only to be able to write to Handles. I've found myself that it's useful to be able to convert to (for example) vectors of bytes.
Now my idea is that this be implemented using monads. For example, consider the problem of writing out data to be converted into binary. We might consider there to be two primitive operations, based on the types Byte (a single byte) and Bytes (an array of bytes). (I'm not sure exactly what these types should actually be in GHC.) The consumer of binary data should provide a record like
data WriteBinaryData m = WriteBinaryData {
    writeByte :: Byte -> m (),
    writeBytes :: Bytes -> m ()
    }
Then something which can be written out in a binary form would instance the class
class HasBinaryWrite a where
    writeA :: Monad m => WriteBinaryData m -> a -> m ()
The advantage of this approach is that an instance of HasBinaryWrite can be written both to a file (m = IO) and to a byte vector. Also this should be fairly efficient I think, and it should be easy to build up more complex instances of HasBinaryWrite.
The converse functions would likewise use monads
data ReadBinaryData m = ReadBinaryData {
    readByte :: m Byte,
    readBytes :: Int -> m Bytes
    }
class HasBinaryRead a where
    readA :: Monad m => ReadBinaryData m -> m a
and again we can instance this either for files or for byte vectors, and again building up more complex instances should be easy enough.
I wonder if this is the best way of doing this kind of thing, and if so should it be implemented and put in the standard libraries? Also, what should Byte and Bytes actually be?
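The proposal above can be tried out with a small sketch. Everything not named in the post is an assumption made for illustration: Byte is taken to be Word8, Bytes to be a strict ByteString, and toHandle, toBytes and fromBytes are hypothetical helper names. The one-byte length prefix used for ByteString is only a toy format.

```haskell
import Control.Monad.ST (runST)
import qualified Data.ByteString as B
import Data.STRef (modifySTRef', newSTRef, readSTRef, writeSTRef)
import Data.Word (Word8)
import System.IO (Handle)

-- Assumed concrete types; the post leaves Byte and Bytes open.
type Byte  = Word8
type Bytes = B.ByteString

data WriteBinaryData m = WriteBinaryData
    { writeByte  :: Byte  -> m ()
    , writeBytes :: Bytes -> m ()
    }

data ReadBinaryData m = ReadBinaryData
    { readByte  :: m Byte
    , readBytes :: Int -> m Bytes
    }

class HasBinaryWrite a where
    writeA :: Monad m => WriteBinaryData m -> a -> m ()

class HasBinaryRead a where
    readA :: Monad m => ReadBinaryData m -> m a

instance HasBinaryWrite Word8 where
    writeA = writeByte

instance HasBinaryRead Word8 where
    readA = readByte

-- Illustration only: a one-byte length prefix, so at most 255 bytes.
instance HasBinaryWrite B.ByteString where
    writeA w bs = do
        writeByte w (fromIntegral (B.length bs))
        writeBytes w bs

instance HasBinaryRead B.ByteString where
    readA r = readByte r >>= readBytes r . fromIntegral

-- Pairs show how larger instances compose from smaller ones.
instance (HasBinaryWrite a, HasBinaryWrite b) => HasBinaryWrite (a, b) where
    writeA w (x, y) = writeA w x >> writeA w y

instance (HasBinaryRead a, HasBinaryRead b) => HasBinaryRead (a, b) where
    readA r = do
        x <- readA r
        y <- readA r
        return (x, y)

-- Sink for files: m = IO.
toHandle :: Handle -> WriteBinaryData IO
toHandle h = WriteBinaryData
    { writeByte  = B.hPut h . B.singleton
    , writeBytes = B.hPut h
    }

-- Sink for memory: m = ST s, collecting chunks in reverse order.
toBytes :: HasBinaryWrite a => a -> Bytes
toBytes x = runST $ do
    ref <- newSTRef []
    let sink = WriteBinaryData
            { writeByte  = \b  -> modifySTRef' ref (B.singleton b :)
            , writeBytes = \bs -> modifySTRef' ref (bs :)
            }
    writeA sink x
    B.concat . reverse <$> readSTRef ref

-- Source from memory. B.head is partial: truncated input raises an error.
fromBytes :: HasBinaryRead a => Bytes -> a
fromBytes input = runST $ do
    ref <- newSTRef input
    let takeN n = do
            (chunk, rest) <- B.splitAt n <$> readSTRef ref
            writeSTRef ref rest
            return chunk
        source = ReadBinaryData
            { readByte  = B.head <$> takeN 1
            , readBytes = takeN
            }
    readA source
```

The same instances serve both sinks: writeA (toHandle h) value writes to a file, while toBytes value produces a byte vector without touching IO. A real version would want error handling on the read side rather than the partial B.head.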

George Russell wrote:
Sorry, I know there was an ongoing discussion on this topic somewhere, but I can't find it, so I'll have to hope this list is the most appropriate.
The general problem to solve is "How do we convert things to and from a binary format, so they can be efficiently written to and from disk". I have solved this (incompletely) twice myself, I expect Glasgow Haskell has solved it too, there really ought to be a standard solution.
From the deficiencies of my incomplete solutions I conclude: (1) you don't want to force everything to go via single Haskell characters. This is horrendously inefficient when what you want is to blast in and out large quantities of binary data, and of course that's precisely where you probably want efficiency. (2) you don't want a binary converter only to be able to write to Handles. I've found myself that it's useful to be able to convert to (for example) vectors of bytes.
Possibly the best way to handle this would be to extend the Storable
class, e.g.
class Storable a where
    ...
    hPutObj :: Handle -> a -> IO ()
    hGetObj :: Handle -> IO a
    toByteList :: a -> [Word8]
    fromByteList :: [Word8] -> a
    toByteArray :: a -> Array Int Word8
    fromByteArray :: Array Int Word8 -> a
At a minimum, it would be useful to have standard low-level
read/write functions, e.g.
hWrite :: Handle -> Ptr a -> IO ()
hRead :: Handle -> Ptr a -> IO ()
The rest could then be constructed using the existing Storable
methods.
That still leaves the effort of [un]marshalling boxed values, but that
may be unavoidable.
--
Glynn Clements

On Mon, May 19, 2003 at 07:21:00PM +0100, Glynn Clements wrote:
George Russell wrote:
The general problem to solve is "How do we convert things to and from a binary format, so they can be efficiently written to and from disk".
Possibly the best way to handle this would be to extend the Storable class, e.g.
class Storable a where
    ...
    hPutObj :: Handle -> a -> IO ()
    hGetObj :: Handle -> IO a
    toByteList :: a -> [Word8]
    fromByteList :: [Word8] -> a
    toByteArray :: a -> Array Int Word8
    fromByteArray :: Array Int Word8 -> a
Hmmm, I thought that Foreign.Storable.Storable is meant for different purposes, namely for accessing C data in memory and for marshalling between the Haskell and C worlds. Are you sure that these functionalities must overlap?
Extending Storable would require extending every existing instance. It might be better to introduce a subclass, but I'm not sure that every type that supports marshalling to and from a binary format can support the operations of Storable.
It's quite easy to write a value of type [[Int]] to disk, but how would one define the alignment and sizeOf methods for this type?
Best regards,
Tom
--
.signature: Too many levels of symbolic links
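The [[Int]] point can be made concrete with a minimal sketch. A nested list has no fixed sizeOf, yet a length-prefixed encoding to a byte list is straightforward. The format here (8-byte little-endian Ints, a length before each list) and all function names are assumptions chosen for illustration, not anything proposed in the thread.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8)

-- Fixed-width, 8-byte little-endian encoding of an Int (an assumed format).
putInt :: Int -> [Word8]
putInt n = [fromIntegral (n `shiftR` (8 * i)) | i <- [0 .. 7]]

-- Decode one Int, returning the unconsumed remainder of the input.
getInt :: [Word8] -> Maybe (Int, [Word8])
getInt bs
    | length w == 8 = Just (foldr step 0 w, rest)
    | otherwise     = Nothing
  where
    (w, rest)  = splitAt 8 bs
    step b acc = (acc `shiftL` 8) .|. fromIntegral b

-- A list is its length followed by its elements; the size is only known
-- at run time, which is why no sizeOf can be given for the type.
putList :: (a -> [Word8]) -> [a] -> [Word8]
putList put xs = putInt (length xs) ++ concatMap put xs

getList :: ([Word8] -> Maybe (a, [Word8])) -> [Word8] -> Maybe ([a], [Word8])
getList get bs = do
    (n, rest) <- getInt bs
    if n < 0 then Nothing else go n rest
  where
    go 0 r = Just ([], r)
    go k r = do
        (x, r')   <- get r
        (xs, r'') <- go (k - 1) r'
        Just (x : xs, r'')

encodeNested :: [[Int]] -> [Word8]
encodeNested = putList (putList putInt)

-- Succeeds only if the whole input is consumed.
decodeNested :: [Word8] -> Maybe [[Int]]
decodeNested bs = case getList (getList getInt) bs of
    Just (xss, []) -> Just xss
    _              -> Nothing
```

Lists of bytes are used only to keep the sketch short; they are far from an efficient representation.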

Tomasz Zielonka wrote:
The general problem to solve is "How do we convert things to and from a binary format, so they can be efficiently written to and from disk".
Possibly the best way to handle this would be to extend the Storable class, e.g.
class Storable a where
    ...
    hPutObj :: Handle -> a -> IO ()
    hGetObj :: Handle -> IO a
    toByteList :: a -> [Word8]
    fromByteList :: [Word8] -> a
    toByteArray :: a -> Array Int Word8
    fromByteArray :: Array Int Word8 -> a
Hmmm, I thought that Foreign.Storable.Storable is meant for different purposes, namely for accessing C data in memory and for marshalling between the Haskell and C worlds. Are you sure that these functionalities must overlap?
Extending Storable would require extending every existing instance. It might be better to introduce a subclass, but I'm not sure that every type that supports marshalling to and from a binary format can support the operations of Storable.
It's quite easy to write a value of type [[Int]] to disk, but how would one define the alignment and sizeOf methods for this type?
Sorry, I'd overlooked the fact that Storable was limited to fixed-size
types.
In any case, reading/writing data to memory and to disk aren't all
that different; the existence of mmap() relies upon that fact. And,
mmap() aside, on most OSes, all I/O comes down to reading and writing
blocks of memory (i.e. read() and write()); everything else is just
additional layers.
I still think that we need a superclass of Storable, as any instance
of Storable can be read and written with e.g.:
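import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (Storable, peek, poke, sizeOf)
import System.IO (Handle, hGetBuf, hPutBuf)
-- Write the in-memory (Storable) representation of a value to a Handle.
-- hPutBuf/hGetBuf from System.IO stand in for the handleToFd plus raw
-- write()/read() calls; they do the same block transfer through the Handle.
hPutObj :: (Storable a) => Handle -> a -> IO ()
hPutObj h x =
    alloca $ \ptr -> do
        poke ptr x
        hPutBuf h ptr (sizeOf x)
-- The dummy argument only fixes the type for sizeOf; it is never evaluated.
hGetObj' :: (Storable a) => Handle -> a -> IO a
hGetObj' h dummy =
    alloca $ \ptr -> do
        _ <- hGetBuf h ptr (sizeOf dummy)  -- short reads are not checked
        peek ptr
hGetObj :: (Storable a) => Handle -> IO a
hGetObj h = hGetObj' h undefined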
A similar process can be used to convert instances of Storable to or
from a list (or array) of bytes.
--
Glynn Clements
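The "similar process" for byte lists might look like the following minimal sketch. The names toByteList and fromByteList come from the earlier proposal, but the IO result types and the Maybe wrapper are assumptions added here so that a wrongly sized input is rejected instead of overrunning the buffer. The bytes are in host byte order, so the result is not portable between machines.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Word (Word8)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Marshal.Utils (with)
import Foreign.Ptr (castPtr)
import Foreign.Storable (Storable, peek, sizeOf)

-- Poke the value into temporary memory, then read it back byte by byte.
toByteList :: Storable a => a -> IO [Word8]
toByteList x =
    with x $ \ptr ->
        peekArray (sizeOf x) (castPtr ptr)

-- The reverse: poke the bytes, then peek the value. The length check
-- keeps pokeArray from writing past the end of the allocated buffer.
fromByteList :: forall a. Storable a => [Word8] -> IO (Maybe a)
fromByteList bytes
    | length bytes /= sizeOf (undefined :: a) = return Nothing
    | otherwise =
        alloca $ \ptr -> do
            pokeArray (castPtr ptr) bytes
            Just <$> peek ptr
```

For example, toByteList (1.5 :: Double) yields eight bytes, and feeding them back through fromByteList recovers the original value.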

Glynn Clements writes:
George Russell wrote:
The general problem to solve is "How do we convert things to and from a binary format, so they can be efficiently written to and from disk". I have solved this (incompletely) twice myself,
Possibly the best way to handle this would be to extend the Storable class, e.g.
class Storable a where
    :
    [snip]
    :
    toByteArray :: a -> Array Int Word8
    fromByteArray :: Array Int Word8 -> a
I wonder if this could be used to address a question (or perhaps "frustration" is a better term), namely that of efficient packing of data structures in memory.
Would it be possible to separate binary I/O in two layers, the binary layer converting to/from (U?)Arrays of Word8, and another, the I/O layer, reading and writing such arrays?
Something almost entirely, but not quite, unlike:
class Binary a where
    toBin :: a -> Array Int Word8
    toBinList :: [a] -> Array Int Word8
    fromBin :: Array Int Word8 -> a -- perhaps with offset?
    fromBinList :: Array Int Word8 -> [a]
hPutArray :: Handle -> Array Int Word8 -> IO ()
hGetArray :: ...
writeBinaryFile :: FilePath -> Array Int Word8 -> IO ()
:
-kzm
--
If I haven't seen further, it is by standing in the footprints of giants
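A rough sketch of the two-layer split, under some assumptions: the array type is UArray Int Word8 (the unboxed variant hinted at by "(U?)Arrays"), the Word16 instance and its big-endian layout are purely illustrative, and the list methods are omitted. The I/O layer uses hPutArray from Data.Array.IO, which works on an IOUArray plus a byte count and so differs from the signature suggested above.

```haskell
import Data.Array.IO (IOUArray, hPutArray, thaw)
import Data.Array.Unboxed (UArray, bounds, elems, listArray)
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word16, Word8)
import System.IO (IOMode (WriteMode), withBinaryFile)

-- Binary layer: pure conversion to and from byte arrays.
class Binary a where
    toBin   :: a -> UArray Int Word8
    fromBin :: UArray Int Word8 -> a

-- Illustrative instance: a Word16 as two bytes, most significant first.
instance Binary Word16 where
    toBin w = listArray (0, 1) [fromIntegral (w `shiftR` 8), fromIntegral w]
    fromBin arr = case elems arr of
        [hi, lo] -> (fromIntegral hi `shiftL` 8) .|. fromIntegral lo
        _        -> error "fromBin: expected exactly 2 bytes"

-- I/O layer: knows nothing about the encoded type, only about bytes.
writeBinaryFile :: FilePath -> UArray Int Word8 -> IO ()
writeBinaryFile path arr =
    withBinaryFile path WriteMode $ \h -> do
        marr <- thaw arr :: IO (IOUArray Int Word8)
        let (lo, hi) = bounds arr
        hPutArray h marr (hi - lo + 1)
```

With this split, writeBinaryFile path (toBin value) writes any Binary instance, and the same toBin output can be kept in memory as a packed representation.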

On 20 May 2003 09:07:24 +0200 ketil@ii.uib.no (Ketil Z. Malde) wrote:
Glynn Clements writes:
George Russell wrote:
The general problem to solve is "How do we convert things to and from a binary format, so they can be efficiently written to and from disk". I have solved this (incompletely) twice myself,
Possibly the best way to handle this would be to extend the Storable class, e.g.
class Storable a where
    :
    [snip]
    :
    toByteArray :: a -> Array Int Word8
    fromByteArray :: Array Int Word8 -> a
I wonder if this could be used to address a question (or perhaps "frustration" is a better term), namely that of efficient packing of data structures in memory.
Would it be possible to separate binary I/O in two layers, the binary layer converting to/from (U?)Arrays of Word8, and another, the I/O layer, reading and writing such arrays?
Something almost entirely, but not quite, unlike:
class Binary a where
    toBin :: a -> Array Int Word8
    toBinList :: [a] -> Array Int Word8
    fromBin :: Array Int Word8 -> a -- perhaps with offset?
    fromBinList :: Array Int Word8 -> [a]
hPutArray :: Handle -> Array Int Word8 -> IO ()
hGetArray :: ...
writeBinaryFile :: FilePath -> Array Int Word8 -> IO ()
:
-kzm
What's wrong with NHC's Binary/BinArray library? There seems to be a GHC port on the GreenCard page.

Ketil Z. Malde wrote (snipped)
Would it be possible to separate binary I/O in two layers, the binary layer converting to/from (U?)Arrays of Word8, and another, the I/O layer, reading and writing such arrays?
I think the proposal I posted is better because it allows you to mix single Word8s and arrays. In my experience, structures to be exported typically contain both. For example, I pack integers as a sequence of bytes (so that integers between -64 and 63 can be represented by one byte, for example). Constructing singleton arrays for each of these bytes would be tiresome, and presumably not as efficient as writing them directly into the target memory area.
Here are two examples of cases where I want to write binary data to memory, not to a Handle:
(1) Suppose I maintain a dictionary indexed by Strings (say) whose values belong to different types with binary representations, and I want this dictionary too to have a binary representation. Whenever I look something up in the dictionary I know what type I'm expecting, but I don't know what types to expect when decoding the dictionary. Then the most practical encoding seems to be as [(String,Bytes)], where each Bytes contains an encoded entry in the dictionary.
(2) If I want to process binary data in some way before sending it to the outside world, for example by compressing it using zlib (the library implementing the compression used by gzip), or by encrypting it.
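The post does not say how the integers are packed. One scheme with the stated property, that -64 to 63 fit in a single byte, is a zig-zag mapping followed by 7-bit groups with a continuation bit. The sketch below assumes that scheme; it is not necessarily the one actually used, and the function names are hypothetical.

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Word (Word8)

-- Map signed to unsigned so small magnitudes stay small:
-- 0, -1, 1, -2, 2, ... become 0, 1, 2, 3, 4, ...
zigZag :: Integer -> Integer
zigZag n
    | n >= 0    = 2 * n
    | otherwise = -2 * n - 1

unZigZag :: Integer -> Integer
unZigZag z
    | even z    = z `div` 2
    | otherwise = negate ((z + 1) `div` 2)

-- Emit 7 bits per byte, least significant group first; the top bit of
-- each byte says whether another byte follows.
packInteger :: Integer -> [Word8]
packInteger = go . zigZag
  where
    go z
        | z < 128   = [fromIntegral z]
        | otherwise = (fromIntegral (z .&. 127) .|. 128) : go (z `shiftR` 7)

-- Decode one integer, returning the unconsumed remainder of the input.
unpackInteger :: [Word8] -> Maybe (Integer, [Word8])
unpackInteger = go 0 0
  where
    go _ _ [] = Nothing
    go shift acc (b : bs)
        | testBit b 7 = go (shift + 7) acc' bs
        | otherwise   = Just (unZigZag acc', bs)
      where
        acc' = acc .|. (fromIntegral (b .&. 127) `shiftL` shift)
```

Each output byte would go straight to a writeByte-style primitive, which is the case where a byte-at-a-time operation is more convenient than building singleton arrays.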
participants (5)
- Derek Elkins
- George Russell
- Glynn Clements
- ketil@ii.uib.no
- Tomasz Zielonka