RE: Persistant (as in on disk) data

| (c) how do we derive instances of Binary?

If you guys can agree an interface that GHC, nhc and Hugs can all support, I'll gladly do the 'deriving' stuff to make 'deriving Binary' work for GHC. What's always inhibited me is that there isn't a single agreed interface.

For most users, having a library that works across all Haskell implementations and platforms is much more important than having the most efficient possible library. But (the possibility of an) efficient implementation has to be a goal, just not the only goal. If GHC can't use it directly for interface files, so be it.

Simon

hello,

thanks for your replies. i browsed through the discussion on the libraries list, but it mainly seems to discuss if one should use bits or bytes in the binary representation. not that this is not important (my personal preference is to be fast rather than small, within reason), but i was more interested in what these functions should do. unfortunately i couldn't quite figure that out from the discussion there.

in particular, i was thinking that this dumping facility should preserve sharing and support cyclic data. as such, i don't think one can write it in Haskell, as presumably sharing is not observable from within the language. this is why the "deriving" bit seems essential - the compiler can perform some magic.

bye
iavor

Simon Peyton-Jones wrote:
> | (c) how do we derive instances of Binary?
>
> If you guys can agree an interface that GHC, nhc and Hugs can all support, I'll gladly do the 'deriving' stuff to make 'deriving Binary' work for GHC. What's always inhibited me is that there isn't a single agreed interface.
>
> For most users, having a library that works across all Haskell implementations and platforms is much more important than having the most efficient possible library. But (the possibility of an) efficient implementation has to be a goal, just not the only goal. If GHC can't use it directly for interface files, so be it.
>
> Simon
-- 
==================================================
| Iavor S. Diatchki, Ph.D. student               |
| Department of Computer Science and Engineering |
| School of OGI at OHSU                          |
| http://www.cse.ogi.edu/~diatchki               |
==================================================

I'd not been following this discussion, but now it seems it's gotten to instances of the Binary module. I figured I'd chime in briefly:
> thanks for your replies. i browsed through the discussion on the libraries list, but it mainly seems to discuss if one should use bits or bytes in the binary representation. not that this is not important (my
The bits/bytes argument was largely because the NHC library supported Bits and the GHC library supported Bytes. In order to have a common library, we wanted to support both.
> personal preference is to be fast rather than small, within reason), but i was more interested in what these functions should do. unfortunately i couldn't quite figure that out from the discussion there.
Basically, write arbitrary data to a file in a binary fashion, or to a memory location (as in BinMem).
> in particular, i was thinking that this dumping facility should preserve sharing and support cyclic data. as such, i don't think one can write
I'm not convinced that the binary library should "natively" support cyclic data. I think that if saying:

    print x

would not terminate, then there's no reason that

    puts bh x

should terminate. I like to think of "puts" as a binary version of print. (That is, of course, unless the instance writer for the Binary/Show instances of the type of x is smart enough to not keep writing the same thing over and over again.) I would challenge the interested party to write a Show instance of String which wouldn't loop indefinitely on "repeat 'x'".

If the user has some cyclic data structure and they want to be able to write it in binary form, it should be on their shoulders to do it correctly, not on the library's.

So essentially, I believe 'deriving Binary' should work identically to 'deriving Show', except using a binary rep instead of a string rep.
> it in Haskell, as presumably sharing is not observable from within the language. this is why the "deriving" bit seems essential - the compiler can perform some magic.
I assume you mean something like:

    let x = ...some really large structure...
        y = [x,x]
    in puts bh y

then the size of what is written is |x| + c, not 2|x|, for some small c? If so, then I don't believe this can be implemented in the language; it would have to be in the compiler.

I see this as unlikely to happen because it would mean that all compilers would have to implement this identically, and some might not handle sharing in the same manner. It might be nice, but again, I see this as something you could do yourself if you really want it (i.e., replace this function with:

    let x = ... in puts bh 2 >> puts bh x

or something like that, when you can -- and obviously you won't always be able to.)

 - Hal
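The "binary version of print" reading can be sketched in a few lines. This is only an illustration under stated assumptions: the `Binary` class below, its single `put` method, the `Shape` type and the `putRepeated` helper are all hypothetical, not the GHC or nhc interface discussed in this thread, and the output is a lazy list of bytes rather than a write through a handle such as `bh`.

```haskell
import Data.Word (Word8)

-- Hypothetical stand-in for the interface under discussion: turn a value
-- into bytes, much as 'show' turns a value into characters.
class Binary a where
  put :: a -> [Word8]

instance Binary Word8 where
  put w = [w]

instance Binary Bool where
  put False = [0]
  put True  = [1]

-- One tag byte per constructor, followed by the fields in order.
instance Binary a => Binary [a] where
  put []       = [0]
  put (x : xs) = 1 : put x ++ put xs

-- A made-up example type.
data Shape = Circle Word8 | Rect Word8 Word8

-- Roughly what a derived instance might expand to (an assumption).
instance Binary Shape where
  put (Circle r) = 0 : put r
  put (Rect w h) = 1 : put w ++ put h

-- The manual workaround: write a count and one copy instead of n copies.
putRepeated :: Binary a => Word8 -> a -> [Word8]
putRepeated n x = put n ++ put x
```

As with `show`, nothing in this sketch notices sharing or cycles: `put (repeat True)` keeps producing bytes forever, and `put [x, x]` writes `x` twice.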

hello,

> I'm not convinced that the binary library should "natively" support cyclic data. I think that if saying:
>
>     print x
>
> would not terminate, then there's no reason that
>
>     puts bh x
>
> should terminate. I like to think of "puts" as a binary version of print. (That is, of course, unless the instance writer for the Binary/Show instances of the type of x is smart enough to not keep writing the same thing over and over again.) I would challenge the interested party to write a Show instance of String which wouldn't loop indefinitely on "repeat 'x'".

well, it is your choice to think of it as you like, but this is not what my original mail was about. i think the ability to make data persistent is a useful one and it should be as transparent to the programmer as possible. when i write something like:

    ones = 1 : ones

i don't think of "printing infinitely many ones in memory" and i don't see why i should start thinking of it that way just because i want to make the object persistent. after all, one can think of the disk as a very slow memory.
> If the user has some cyclic data structure and they want to be able to write it in binary form, it should be on their shoulders to do it correctly, not on the library's.

why is that? i thought the whole point of having nice tools is that you don't need to write mindless stuff and can concentrate on the important bits of your program. i don't have to worry much about sharing and cyclic data when i program in Haskell (i.e. it just happens), so why should i suddenly start to worry about that if i want to make something persistent across executions of my program?
> So essentially, I believe 'deriving Binary' should work identically to 'deriving Show', except using a binary rep instead of a string rep.

something like that could be useful, but with DrIFT and the aterm library one can already do some of that. and the aterm library is a reasonably portable way to represent terms. this is definitely not what i had in mind in my original post.
>> it in Haskell, as presumably sharing is not observable from within the language. this is why the "deriving" bit seems essential - the compiler can perform some magic.
>
> I assume you mean something like:
>
>     let x = ...some really large structure...
>         y = [x,x]
>     in puts bh y
>
> then the size of what is written is |x| + c, not 2|x|, for some small c? If so, then I don't believe this can be implemented in the language; it would have to be in the compiler.

this is what i meant by compiler magic.
> I see this as unlikely to happen because it would mean that all compilers would have to implement this identically, and some might not handle sharing in the same manner.

different implementations do not need to implement sharing in the same way. they need to understand a common format. i am not saying designing such a format is easy, in fact things like:

    nats = 0 : map (+1) nats

seem tricky as they involve functions. but persistence is useful.
in fact as a beginning i was hoping for something that works in say GHC, and won't be too hard to implement. actually i thought it might already exist, but i guess not.

bye
iavor
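For the GHC-only starting point mentioned above, one known approach is to observe sharing in the IO monad with `System.Mem.StableName` from GHC's base library. The sketch below handles only lists of `Int`. The `Flat` type and the `flatten` and `unflatten` functions are hypothetical names, and the sketch assumes GHC hands back the same stable name when an already evaluated cons cell is visited a second time, which the documentation does not strictly promise (only that equal names imply the same object).

```haskell
import System.Mem.StableName (StableName, makeStableName)

-- Hypothetical on-disk shape for a list: the cell values in order, plus an
-- optional back-reference saying which cell the last tail points to.
data Flat = Flat [Int] (Maybe Int)
  deriving (Eq, Show)

-- Walk the spine, remembering the stable name of every cons cell seen.
-- Meeting a known cell again means the list is cyclic, so we stop there.
flatten :: [Int] -> IO Flat
flatten = go [] []
  where
    go :: [(StableName [Int], Int)] -> [Int] -> [Int] -> IO Flat
    go seen acc xs = do
      -- Force to weak head normal form first: a thunk and its value may
      -- receive different stable names.
      name <- xs `seq` makeStableName xs
      case (lookup name seen, xs) of
        (Just i, _)       -> pure (Flat (reverse acc) (Just i))
        (Nothing, [])     -> pure (Flat (reverse acc) Nothing)
        (Nothing, y : ys) -> go ((name, length acc) : seen) (y : acc) ys

-- Rebuild the list, tying the knot again when there is a back-reference.
-- Assumes the back-reference index is smaller than the number of cells.
unflatten :: Flat -> [Int]
unflatten (Flat vs Nothing)  = vs
unflatten (Flat vs (Just i)) = result
  where
    result = vs ++ drop i result
```

Under the assumption above, `flatten ones` for `ones = 1 : ones` would be expected to stop after one cell with a back-reference to cell 0 rather than loop. An infinite but non-cyclic list such as `nats` would still never finish, and functions stored inside a structure are outside what this sketch can handle.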
participants (4)
- diatchki@cse.ogi.edu
- Hal Daume III
- Iavor S. Diatchki
- Simon Peyton-Jones