Re: Converting things to and from binary

I'm going to chime in here for the first time.

There was a *long* discussion of this on the libraries list a few months back. See http://haskell.org/pipermail/libraries/2002-November/000691.html and the discussion that followed.

I think that we should (a) move this discussion to the libraries list and (b) work from the point reached by those discussions.

- Hal

--
Hal Daume III | hdaume@isi.edu
"Arrest this man, he talks in maths." | www.isi.edu/~hdaume

On Tue, 20 May 2003, George Russell wrote:
Derek wrote (snipped)
What's wrong with NHC's Binary/BinArray library? There seems to be a GHC port on the GreenCard page.
The home page of the York Binary library seems to be http://www.cs.york.ac.uk/fp/nhc13/libs/Binary.html

Malcolm will doubtless have something to say here. However I would say not that there's something "wrong" with the Binary/BinArray library, but that it seems to be addressed towards a different class of problems (specifically data-compression) to those I am interested in (blasting data rapidly in and out). The main differences are

(1) the framework I proposed in the original message: http://haskell.org/pipermail/glasgow-haskell-users/2003-May/005166.html is byte-based, while the York Binary framework is bit-based. I would imagine that this means the York Binary framework would be very much less efficient at handling long sequences of bytes, since they will presumably have to be shifted before being written to the destination.

(2) the York Binary library uses the IO monad, and presumably various variables within a BinHandle, to keep track of state. I think this is unnecessary, for example I don't think the process of converting a value to a byte array should really have to go through IO. We are supposed to be functional programmers after all.

(3) the York binary library provides two things you can write bits to (a Handle, and a fixed area in memory) and a large set of operations (seek and co), but it would be difficult for a normal programmer to extend this. (For example, what about someone in GHC wanting to write to a Posix Fd?) On the other hand the framework I propose has only two basic operations for writing, and two for reading, which means it should be much easier to define alternative consumers and sources of binary data.

I'll respond to this post specifically now.
library, but that it seems to be addressed towards a different class of problems (specifically data-compression) to those I am interested in (blasting data rapidly in and out). The main differences are
I don't think that's necessarily true...read on...
(1) the framework I proposed in the original message: http://haskell.org/pipermail/glasgow-haskell-users/2003-May/005166.html is byte-based, while the York Binary framework is bit-based. I would imagine that this means the York Binary framework would be very much less efficient at handling long sequences of bytes, since they will presumably have to be shifted before being written to the destination.
Indeed, the new "general" version of the binary module supports both byte and bit operations. Several tests by SimonM and me have shown that the bit style is about 20% slower than the byte style.
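To make the shifting cost concrete, here is a minimal sketch of a bit-oriented writer. It is not the York library's code or the new binary module's code; the names BitW, putBit, putByte and finish are made up for illustration, and the state is a plain list rather than a real buffer. When the writer is byte-aligned a whole byte is stored as-is; otherwise every byte has to be split across two output bytes with two shifts.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8)

-- Hypothetical bit-writer state: finished bytes (most recent first),
-- the partially filled byte, and how many of its bits are used (0..7).
-- Bits are filled from the most significant end.
data BitW = BitW [Word8] Word8 Int

emptyW :: BitW
emptyW = BitW [] 0 0

-- Append a single bit.
putBit :: Bool -> BitW -> BitW
putBit bit (BitW out part n)
  | n == 7    = BitW (part' : out) 0 0
  | otherwise = BitW out part' (n + 1)
  where
    part' = if bit then part .|. (0x80 `shiftR` n) else part

-- Append a whole byte.
putByte :: Word8 -> BitW -> BitW
-- Aligned case: no shifting, the byte is stored directly.
putByte b (BitW out _ 0) = BitW (b : out) 0 0
-- Unaligned case: the top (8 - n) bits complete the pending byte and
-- the remaining n bits start the next one.
putByte b (BitW out part n) =
  BitW ((part .|. (b `shiftR` n)) : out) (b `shiftL` (8 - n)) n

-- Flush, padding the last partial byte with zero bits.
finish :: BitW -> [Word8]
finish (BitW out part n) = reverse (if n == 0 then out else part : out)
```

For example, finish (putByte 0xAB emptyW) is just [0xAB], whereas writing one bit first forces every later byte through the unaligned case.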
(2) the York Binary library uses the IO monad, and presumably various variables within a BinHandle, to keep track of state. I think this is unnecessary, for example I don't think the process of converting a value to a byte array should really have to go through IO. We are supposed to be functional programmers after all.
What's wrong with the IO monad? :)

More seriously, I think the idea of going to and from lists of Word8 is going to kill performance, especially if you are going to write to a file in the end. You're going to have to go:

  data structure -> [Word8] -> File

and likely the intermediate list won't be deforested by any compiler. Whether you use functional arrays or lists is somewhat irrelevant -- functional arrays are also exceedingly slow.
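The two routes can be sketched as follows. This is only an illustration using the FFI modules from base, not code from either library; the function names and the choice of big-endian Word32 values as the "data structure" are assumptions. The first route materialises the intermediate [Word8]; the second pokes bytes straight into a buffer that can be handed to hPutBuf in one call.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word32)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)
import System.IO (Handle, hPutBuf)

-- Route 1: data structure -> [Word8]. Every byte becomes a list cell
-- unless the compiler manages to fuse the list away.
word32ToBytes :: Word32 -> [Word8]
word32ToBytes w = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

encodeViaList :: [Word32] -> [Word8]
encodeViaList = concatMap word32ToBytes

-- Route 2: poke each byte directly into a raw buffer (big-endian).
pokeWord32BE :: Ptr Word8 -> Int -> Word32 -> IO ()
pokeWord32BE buf off w =
  mapM_ (\(i, s) -> pokeByteOff buf (off + i) (fromIntegral (w `shiftR` s) :: Word8))
        (zip [0 ..] [24, 16, 8, 0])

-- Fill a temporary buffer and pass it, with its length, to a continuation.
withEncoded :: [Word32] -> (Ptr Word8 -> Int -> IO a) -> IO a
withEncoded ws k =
  allocaBytes n $ \buf -> do
    mapM_ (\(i, w) -> pokeWord32BE buf (4 * i) w) (zip [0 ..] ws)
    k buf n
  where
    n = 4 * length ws

-- Write the whole buffer to a Handle in a single call.
hPutWord32s :: Handle -> [Word32] -> IO ()
hPutWord32s h ws = withEncoded ws (hPutBuf h)

-- Read the buffer back as a list; only useful for checking that the
-- two routes produce the same bytes.
readBack :: [Word32] -> IO [Word8]
readBack ws = withEncoded ws (\buf n -> peekArray n buf)
```

Both routes produce the same bytes; the difference is only in how much intermediate structure is allocated on the way to the file.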
(3) the York binary library provides two things you can write bits to (a Handle, and a fixed area in memory) and a large set of operations (seek and co), but it would be difficult for a normal programmer to extend this. (For example, what about someone in GHC wanting to write to a Posix Fd?) On the other hand the framework I propose has only two basic operations for writing, and two for reading, which means it should be much easier to define alternative consumers and sources of binary data.
This is true and I would say that this is the primary drawback of the design. Of course (SimonM, please chime in here), it's probably possible to extend the library to support writing to Fds (I'm not sure though, due to the stateful stuff), but I agree that this is a fairly obtuse solution.

That said, my understanding of what GHC uses the binary module for internally is to create large BinMems (things in memory, essentially large arrays) and then write those to Handles at the end. There's no reason we couldn't provide a function of type 'BinMem -> IO (IOUArray Int Word8)', or something similar, that would allow you to peek at the data and write it to a Fd or do something else you wanted. I think this solves the problem of giving it to different consumers. A similar function could provide access for arbitrary producers.

- Hal
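A rough sketch of what such an accessor might look like. The BinMem record here is a hypothetical stand-in (a fixed-capacity array plus a write offset), not the real type from GHC's binary module, and newBinMem, putByte, binMemBytes and drainTo are illustrative names. The consumer is any 'Word8 -> IO ()' action, which is where a Fd writer or anything else could be plugged in.

```haskell
import Data.Array.IO (IOUArray, getElems, newArray, newListArray, readArray, writeArray)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- Hypothetical stand-in for BinMem: a fixed-capacity byte array and
-- the number of bytes written so far. No growth is implemented, so
-- writing past the capacity raises an array index error.
data BinMem = BinMem
  { bmArray :: IOUArray Int Word8
  , bmUsed  :: IORef Int
  }

newBinMem :: Int -> IO BinMem
newBinMem cap = BinMem <$> newArray (0, cap - 1) 0 <*> newIORef 0

putByte :: BinMem -> Word8 -> IO ()
putByte bm b = do
  i <- readIORef (bmUsed bm)
  writeArray (bmArray bm) i b
  writeIORef (bmUsed bm) (i + 1)

-- The proposed accessor: copy out only the bytes written so far, so
-- the caller cannot disturb the buffer's internal state.
binMemBytes :: BinMem -> IO (IOUArray Int Word8)
binMemBytes bm = do
  n  <- readIORef (bmUsed bm)
  bs <- mapM (readArray (bmArray bm)) [0 .. n - 1]
  newListArray (0, n - 1) bs

-- Hand every byte to an arbitrary consumer, in order.
drainTo :: (Word8 -> IO ()) -> IOUArray Int Word8 -> IO ()
drainTo sink arr = getElems arr >>= mapM_ sink
```

Copying is the simplest safe choice for a sketch; a real version would more likely expose the underlying storage without a copy.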

On Tue, 20 May 2003 07:32:40 -0700 (PDT), Hal Daume III wrote:
I'm going to chime in here for the first time.
There was a *long* discussion of this on the libraries list a few months back. See
http://haskell.org/pipermail/libraries/2002-November/000691.html
and the discussion that followed.
It also got picked up again a bit in March: http://haskell.org/pipermail/libraries/2003-March/000782.html
participants (2)
-
Derek Elkins
-
Hal Daume III