
Hello John,

Wednesday, April 19, 2006, 3:27:49 AM, you wrote:
if that is due to the time spent reading .hi files, my alternative Binary library should help at some point in the future
Interesting. A big bottleneck
big bottleneck? ;)
in jhc right now is reading the (quite large) binary ho and hl files on startup. A few things I have wanted out of a binary library are:
I was going to get around to writing this sometime, but perhaps there is room for a collaborative project in there. Is your code available somewhere, Bulat?
http://freearc.narod.ru/Streams.tar.gz
http://haskell.org/haskellwiki/Library/Streams

but this documentation doesn't cover the Binary part that is being discussed now. i have attached to this letter my unfinished docs about that part of the library.

now, about your requirements:
* mmap based reading.
my Streams library mainly consists of two parts: Streams and AltBinary. The Streams part implements a Handle-like interface (including such functions as vGetChar, vGetByte, vPutBuf, vSeek and so on) for various data sources: files, memory buffers, pipes, strings. support for memory-mapped files is planned, but for now it has only a preliminary implementation.

the AltBinary part works via the Streams part. basically, it just implements various ways to convert a data structure to a sequence of vPutByte operations (and vice versa), with support for lists, arrays and all the other "simpler" datatypes that Haskell/GHC provides. Binary instances for other datatypes can be autogenerated via DrIFT or TH.
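to make the "sequence of byte operations" idea concrete, here is a minimal sketch. it is not the AltBinary API: the class name Serial, its methods put and get, and the little-endian, length-prefixed layout are all assumptions made for illustration, and a plain list of bytes stands in for a stream.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word32)

-- Hypothetical class (not part of Streams/AltBinary): a value is flattened
-- to a sequence of bytes and parsed back from one.
class Serial a where
  put :: a -> [Word8]
  get :: [Word8] -> Maybe (a, [Word8])   -- decoded value plus unconsumed input

instance Serial Word8 where
  put b = [b]
  get (b:rest) = Just (b, rest)
  get []       = Nothing

-- Fixed-width 32-bit integer, least significant byte first (an assumption).
instance Serial Word32 where
  put w = [fromIntegral (w `shiftR` s) | s <- [0, 8, 16, 24]]
  get (a:b:c:d:rest) =
    Just (foldr (\x acc -> (acc `shiftL` 8) .|. fromIntegral x) 0 [a, b, c, d], rest)
  get _ = Nothing

instance Serial Bool where
  put b = [if b then 1 else 0]
  get (0:rest) = Just (False, rest)
  get (1:rest) = Just (True, rest)
  get _        = Nothing

-- A pair is simply its two components, one after the other.
instance (Serial a, Serial b) => Serial (a, b) where
  put (x, y) = put x ++ put y
  get bytes = do
    (x, rest)  <- get bytes
    (y, rest') <- get rest
    Just ((x, y), rest')

-- A list is a 32-bit element count followed by the elements.
instance Serial a => Serial [a] where
  put xs = put (fromIntegral (length xs) :: Word32) ++ concatMap put xs
  get bytes = do
    (n, rest) <- get bytes
    getMany (fromIntegral (n :: Word32)) rest

-- Decode exactly n consecutive elements.
getMany :: Serial a => Int -> [Word8] -> Maybe ([a], [Word8])
getMany 0 bytes = Just ([], bytes)
getMany n bytes = do
  (x, rest)   <- get bytes
  (xs, rest') <- getMany (n - 1) rest
  Just (x : xs, rest')
```

in GHCi, put (258 :: Word32) gives [2,1,0,0], and get (put xs) returns Just (xs, []) for any value xs built from these instances. instances like the pair one are the kind of code a generator such as DrIFT or TH would write for user datatypes.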
* being able to jump over unneeded data, as in go directly to the 112th record, or the third field in a data structure without having to slurp through everything that came before it.
what should the user interface be? the lib (its Streams part) supports vSeek/vTell operations. skipping to the 112th record without knowing its exact location will be impossible if each record can have a different size.

the following things imho should not be a part of the Binary library itself, but of higher-level client code:
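returning to the record-skipping point for a moment: one common way around variable record sizes is to store a table of offsets next to the data, so the reader can seek straight to record N. the sketch below only models this on an in-memory byte list. Entry, buildIndex and fetch are hypothetical names, not functions of the Streams library, and on a real stream the drop would be a vSeek-style operation.

```haskell
import Data.Word (Word8)

-- One index entry: where a record starts in the blob and how long it is.
data Entry = Entry { offset :: Int, size :: Int } deriving (Eq, Show)

-- Concatenate variable-size records and remember where each one begins.
buildIndex :: [[Word8]] -> ([Entry], [Word8])
buildIndex records = (entries, concat records)
  where
    sizes   = map length records
    offsets = scanl (+) 0 sizes            -- running total of earlier sizes
    entries = zipWith Entry offsets sizes  -- zipWith drops the extra final offset

-- Fetch record n (0-based) using only its index entry; the records before
-- it are skipped over, never decoded.
fetch :: [Entry] -> [Word8] -> Int -> Maybe [Word8]
fetch entries blob n
  | n < 0 || n >= length entries = Nothing
  | otherwise = Just (take (size e) (drop (offset e) blob))
  where e = entries !! n
```

the cost is paid at write time (the writer has to know every record's size before the index can be emitted), which fits the "spend extra time when writing" trade-off.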
* the ability to create a hash of the structure of the underlying data type, to verify you are reading data in the right format.
you mean that using a signature is not enough, or, to be exact, that the library should generate this signature itself? interesting. i think that for jhc (and potentially ghc) this should be implemented via DrIFT?
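as a rough illustration of a library-generated signature, here is a sketch that uses GHC's Data.Data reflection rather than DrIFT, purely because it fits in a few lines. describe, structureHash and the two example types are hypothetical, the hash is an FNV-1a-style fold over the characters of the description, and the approach assumes the types have no strict fields (it builds dummy constructor applications whose fields are never evaluated).

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Bits (xor)
import Data.Char (ord)
import Data.Data (Data, dataTypeConstrs, dataTypeName, dataTypeOf,
                  fromConstr, gmapQ, showConstr, typeOf)
import Data.List (foldl', intercalate)
import Data.Word (Word32)

-- Textual description of a type: its name, then every constructor with the
-- types of its fields. The argument is used only for its type, never forced.
describe :: Data a => a -> String
describe x = dataTypeName dt ++ "{" ++ concatMap con (dataTypeConstrs dt) ++ "}"
  where
    dt       = dataTypeOf x
    con c    = showConstr c ++ "(" ++ intercalate "," (fields c) ++ ");"
    -- Build a dummy value of constructor c and ask each field for its type.
    fields c = gmapQ (show . typeOf) (fromConstr c `asTypeOf` x)

-- FNV-1a-style 32-bit hash of the description (type names assumed ASCII).
structureHash :: Data a => a -> Word32
structureHash = foldl' step 2166136261 . describe
  where step h c = (h `xor` fromIntegral (ord c)) * 16777619

-- Two example layouts that must not be confused when reading a file.
data Point  = Point Int Int      deriving Data
data Point3 = Point3 Int Int Int deriving Data
```

a writer could store structureHash (undefined :: Point) in the file header and the reader could compare it with its own value before decoding. this only looks one level deep: a change inside a nested type would not alter the hash unless describe were made recursive.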
* extensible type-indexed sets (implemented hackily in Info.Binary in jhc)
by creating a hash of the structure, can we reduce this task to just an ordinary hash-like database?
* VSDB[1] style ACID updates as an option.
* VSDB style write-time optimized constant hash table. I don't mind spending extra time when writing library files to speed up their usage.
i don't understand the second item. but anyway, you have already implemented the VSDB database, and you already have a way to autogenerate Binary instances. my lib can help by making serialization faster and by providing uniform access to various media (files, buffers, memory-mapped files). i can also work on a hash-of-structure implementation using DrIFT or TH.

--
Best regards,
 Bulat mailto:Bulat.Ziganshin@gmail.com