
for sake of argument, suppose an enterprising haskell newbie wanted to code up concurrent b-trees (really b-link trees) in haskell. if i am understanding STM correctly, it will NOT help in any way with the implementation, because of the IO-intensive nature of the algorithms? so i will have to resort to the usual games with locks and latches? (using Ibrahim Jaluta, Seppo Sippu and Eljas Soisalon-Soininen. Concurrency control and recovery for balanced B-link trees. The VLDB Journal, Volume 14, Issue 2 (April 2005), Pages: 257 - 277, ISSN:1066-8888. as a source for b-tree algorithms.) take care, B

On 8/21/07, Ben
for sake of argument, suppose an enterprising haskell newbie wanted to code up concurrent b-trees (really b-link trees) in haskell. if i am understanding STM correctly, it will NOT help in any way with the implementation, because of the IO-intensive nature of the algorithms? so i will have to resort to the usual games with locks and latches?
I have produced exactly such an implementation in my day-job (so I can't, at this stage, give you the code, I'm afraid), but I'll happily give you some tips: 1. Investigate relaxed balance. BTrees with relaxed balance enable you to break up operations into much smaller transactions, which will reduce the amount of rerunning on transactions (big transactions are more likely to contain conflicts). Also, getting all the edge cases right is hard with strict balance. Especially in the presence of deletions. It is VASTLY simpler with relaxed balance, though there are a few little tricks. If it was too easy, it wouldn't be any fun (see 3, below). Hint: Although the on-disk version doesn't need or want parent pointers, you might want them for your in-memory version of pages. 2. Separate the IO from the BTree-stuff. Conceptually keep a <code>TVar (Map Address ByteString)</code>. In the transaction, use this to find pages. If the page is not there, throw an exception containing the desired address. In a wrapper, catch the exception, read the page, add it to the map as a separate transaction then retry the original transaction. I say "conceptually" because something like <code>TArray Address (Maybe ByteString)</code>, or similar will yield much better concurrency. In general, you want to push the TVars down as far as possible. 3. Have Fun STM is very cool, so make sure you enjoy making it all hang together. :-)

Thomas Conway wrote:
2. Separate the IO from the BTree-stuff.
Conceptually keep a <code>TVar (Map Address ByteString)</code>. In the transaction, use this to find pages. If the page is not there, throw an exception containing the desired address. In a wrapper, catch the exception, read the page, add it to the map as a separate transaction then retry the original transaction. I say "conceptually" because something like <code>TArray Address (Maybe ByteString)</code>, or similar will yield much better concurrency. In general, you want to push the TVars down as far as possible.
This type of idea has been discussed before, see [1]. This lead to the AdvSTM monad at [2]. The "Helper Thread Code" on that page provides
newtype AdvSTM a = AdvSTM (ReaderT Env STM a) deriving (Functor,Monad,MonadPlus,Typeable) type Env = (CommitVar,RetryVar) type CommitVar = TVar (IO ()->IO ()) type RetryVar = MVar (IO ()->IO ())
and MonadAdvancedSTM which gives onCommit and onRetry.
class MonadAdvSTM m where onCommit :: IO a -> m () onRetry :: IO a -> m ()
Any IO action passed to onCommit is scheduled to run if and and only if the current atomic attempt succeeds. This is the easy part and if that is all you need then the "Just onCommit" section on [2] is much simpler. Thus (orElseAdv fails works >> alsoWorks) will run all onCommit actions posted by works and alsoWorks but will not run any onCommit actions posted by the 'fails' piece (because when fails retries it will be rolled back and the change to the TVar will not be seen). The onRetry is sneakier and uses unsafeIOToSTM to put the scheduled action into a private MVar. The runAdvSTM uses a final orElse to detect when the whole action will retry and again uses unsafeIOToSTM to send the scheduled onRetry actions to the (perhaps newly spawned) helper thread before the retry. Thus (orElseAdv fails works >> alsoFails) will run all onRetry action posted by any of the three pieces fails, works, and alsoFails. The private MVar cannot be replaced with a private TVar since the 'fails' piece gets rolled back before running 'works'. Exception handling needs to be improved (e.g. bracket or finally in the retry helper thread and the ensure writeChan chan Nothing gets called if needed, etc.).
3. Have Fun
STM is very cool, so make sure you enjoy making it all hang together. :-)
[1] http://www.haskell.org/pipermail/haskell-cafe/2006-November/019771.html [2] http://haskell.org/haskellwiki/New_monads/MonadAdvSTM
participants (3)
-
Ben
-
ChrisK
-
Thomas Conway