
On 8/21/07, Ben
for sake of argument, suppose an enterprising haskell newbie wanted to code up concurrent b-trees (really b-link trees) in haskell. if i am understanding STM correctly, it will NOT help in any way with the implementation, because of the IO-intensive nature of the algorithms? so i will have to resort to the usual games with locks and latches?
I have produced exactly such an implementation in my day-job (so I can't, at this stage, give you the code, I'm afraid), but I'll happily give you some tips: 1. Investigate relaxed balance. BTrees with relaxed balance enable you to break up operations into much smaller transactions, which will reduce the amount of rerunning on transactions (big transactions are more likely to contain conflicts). Also, getting all the edge cases right is hard with strict balance. Especially in the presence of deletions. It is VASTLY simpler with relaxed balance, though there are a few little tricks. If it was too easy, it wouldn't be any fun (see 3, below). Hint: Although the on-disk version doesn't need or want parent pointers, you might want them for your in-memory version of pages. 2. Separate the IO from the BTree-stuff. Conceptually keep a <code>TVar (Map Address ByteString)</code>. In the transaction, use this to find pages. If the page is not there, throw an exception containing the desired address. In a wrapper, catch the exception, read the page, add it to the map as a separate transaction then retry the original transaction. I say "conceptually" because something like <code>TArray Address (Maybe ByteString)</code>, or similar will yield much better concurrency. In general, you want to push the TVars down as far as possible. 3. Have Fun STM is very cool, so make sure you enjoy making it all hang together. :-)