
Hi All, I'm hacking some (external) B-Tree code, and amongst the numerous interesting problems I've come up against[*], is to do with managing which pages/nodes are in memory and which are not. We use TVars to point from one (in-memory version of a) page to another, so we have a type like the following: data Node = Leaf [(Key,Value)] | Node NodePtr [(Key,NodePtr)] type NodePtr = TVar (Maybe Address, Maybe Node) type Address = Integer Addresses are just page numbers or file offsets, or similar. NodePtr is the interesting bit - the following table summarizes what the combination of values mean: explain (Just addr, Nothing) = "An external node we have not read in yet" explain (Nothing, Just node) = "A new node for which we have not yet allocated a page" explain (Just addr, Just node) = "A node we've read in" explain (Nothing, Nothing) = "Emit nasal daemons" Now, we want a cache to hang on to nodes that we've read in to hold them in memory, to save repeatedly reading them off disk unnecessarily. But more than that, we want to make sure that when we create a NodePtr pointing to a particular Address, that all references to this particular address go through the same NodePtr, otherwise in a concurrent situation we could well end up with two copies of the page in memory, both recieving updates leading to havoc. At the same time, we'd like to be able to evict pages according to some policy (the Oracle hot-list cold-list cache is attractive, if implemented carefully). So one scheme that comes to mind is to keep a table mapping addresses to (Weak NodePtr). When we ask for a node, first we consult the cache (if we have one) and get the NodePtr right back. If the addressed node is not in the cache then we check the table to see if there is/was a known NodePtr for that address. If there is none then we can allocate a new one and stick it in the table, and if it is there then we return it. This is also nice because we can attach finalizers to the (Weak NodePtr)s to write them out if dirty. So the question is, is there a simpler scheme (though this one isn't tooooo bad), and more importantly, is this likely to be horribly slow? How efficient is dereferencing a weak reference, and what is the likely impact on performance of having 10^3-10^5 weak references around? cheers, Tom [*]Film at 11. I'm planning to write it all up, maybe for a conference, maybe just for the Monad Reader, or something.

So I just coded up the approach. It compiles, so I assume it works as intended. ;-) Comments and clever obervations extremely welcome. cheers, T.
participants (1)
-
Thomas Conway