Re: [Haskell-cafe] Information on B-Tree IO implemenations

7 Nov 2005

      Scott Weeks  writes:
...
For a project that I'm working on I need to implement a B-Tree or 
similar disk based data structure for storing large amounts of data. I 
feel like I'm getting it and then I get hung up on disk IO operations 
and trying to figure out how they should fit into the mix.
Here is a first-cut specification of a B-Tree:

    data BTree k d = BTree Int [(k,[d])] [BTree k d]

where k = key and d = stored data.  Obviously this does not actually
store the B-Tree on file, it just gives a flavour of the intended
structure.

In order to store the data on file, we need to introduce an indirection
corresponding to a file pointer, on each of the child nodes.

    data BTree k d = BTree Int [(k,[d])] [FilePtr (BTree k d)]

Then you need a serialisation routine to write any one node of the
tree out and read it back in.  And your tree-traversal routines
(lookup, insert, etc) need to be embedded in the IO monad, so they
can do the necessary file access.

Attached is an implementation of B-Trees (by Colin Runciman), which
uses nhc98's Binary class
    http://haskell.org/nhc98/libs/Binary.html
for serialisation.  The type BinHandle corresponds to a file handle,
and Bin corresponds to a file pointer.  In this code, in fact we
managed to keep the lookup function pure, because the Binary interface
allows referentially-transparent I/O (getFAt), which may not be
possible in other serialisation libraries.  This facility however
depends on an invariant that the B-Tree can only be extended - there
is no operation to remove nodes or data.  However the insertion
routine clearly must use I/O to extend the tree.  I hope you can see
the pattern of how this I/O works.

Note also that the B-Tree structure is stored in BPages, separate
from the data values hanging off the tree, which are in DBlocks.

Hope this helps,
Regards,
    Malcolm

Re: [Haskell-cafe] Information on B-Tree IO implemenations

Malcolm Wallace