
On Sun, Dec 28, 2008 at 11:08 PM, Evan Laforge wrote:
> But every STM operation has to modify a transaction log, which seems like it
> should be even more expensive than frobbing a lock bit. So if the per-operation
> STM overhead is higher, and blocking contention is the same (assuming the lock
> implementation uses shared locks for reads), I don't see how the STM
> implementation could be faster.
Absolutely, although in theory, for lots of repeated operations your transaction log stays in cache, so it shouldn't be too bad.

When I talked to Simon Peyton-Jones at MSR this fall, he described locks as "the assembly language of concurrency"; STM is then a high-level language. You pay some amount in performance to get abstraction and the ability to write composable programs that remain correct, a problem that locking hasn't been able to solve despite decades of research. It's always possible to hand-translate the final program into one based on locks, but this often comes at the cost of maintainability: the code becomes one mess of spaghetti locking in order to uphold whatever invariants are needed to prevent deadlock while still composing code together.
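To make the composability point concrete, here's the usual toy sketch (the account type and helper names are just for illustration, nothing beyond Control.Concurrent.STM is assumed): two small STM actions compose into a larger transfer that is still atomic, which two lock-protected operations can't do without exposing their locks.

import Control.Concurrent.STM

type Account = TVar Int

-- Each of these is an atomic action in its own right...
withdraw :: Account -> Int -> STM ()
withdraw acc n = do
  bal <- readTVar acc
  if bal < n
    then retry                    -- block until some other transaction adds funds
    else writeTVar acc (bal - n)

deposit :: Account -> Int -> STM ()
deposit acc n = do
  bal <- readTVar acc
  writeTVar acc (bal + n)

-- ...and they compose into one atomic transfer: no other thread can ever
-- observe the money missing from both accounts at once.
transfer :: Account -> Account -> Int -> IO ()
transfer from to n = atomically (withdraw from n >> deposit to n)

With locks, withdraw and deposit would have to expose their locks, and every caller would have to agree on a lock ordering, to get the same guarantee.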
> I'm sure it's all more complicated than this... but where exactly is the STM
> performance coming from in those papers? Also, they disclaim in the testing
> section that the STM implementation was immature, but now that there's parallel
> GC, and perhaps STM doesn't do so much dynamic allocation (?), shouldn't the
> STM numbers be even better?
The paper probably got its performance from the "optimistic concurrency" that STM allows; a transaction that reads many shared parts of a data structure but only modifies a small piece, likely far away from what other transactions touch, as in a binary tree (there's a small sketch after the reference below), is ideal for STM performance. You are correct that multiple-reader locks could win back some of this benefit, but those systems have serious drawbacks when a read-only lock needs to be upgraded to a read-write lock in the middle of an operation, and they share the deadlock/lock-ordering problems of any lock-based solution.

STM cuts this Gordian knot by forcing all lock-taking actions to be otherwise pure, aside from the mutation of the data structures protected by the locks. That way, a failing transaction can always be killed and restarted when something goes wrong. It might be possible to enforce the same sort of purity on top of a lock-based system.

I don't think the STM runtime has gotten a lot of love since that paper; given that TVar operations use a linear search over the transaction log, I suspect it could go a long way with a capable volunteer. This is part of the motivation behind trying to port the concurrency substrate from C into Haskell [1]; getting STM out of the GHC RTS and into the hands of library writers will put a lot more eyeballs on the performance problems.

  -- ryan

[1] Li, Peng. Programmable Concurrency in a Pure and Lazy Language. http://www.seas.upenn.edu/~lipeng/homepage/
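Here's the promised sketch of what "far away" means, assuming a TVar per link (this is purely illustrative, not the paper's benchmark code): every transaction reads the TVars along its search path but writes at most one of them, so operations that stay in different subtrees commit without ever blocking each other, and a genuine conflict just causes one transaction to be re-run.

import Control.Concurrent.STM

-- A binary search tree whose links are TVars, so independent subtrees
-- can be modified by independent transactions.
data Tree = Leaf | Node (TVar Tree) Int (TVar Tree)

insert :: TVar Tree -> Int -> STM ()
insert ref x = do
  t <- readTVar ref
  case t of
    Leaf -> do
      l <- newTVar Leaf
      r <- newTVar Leaf
      writeTVar ref (Node l x r)        -- the only write in the whole transaction
    Node l y r
      | x < y     -> insert l x
      | x > y     -> insert r x
      | otherwise -> return ()          -- already present; read-only commit

member :: TVar Tree -> Int -> STM Bool
member ref x = do
  t <- readTVar ref
  case t of
    Leaf -> return False
    Node l y r
      | x < y     -> member l x
      | x > y     -> member r x
      | otherwise -> return True

Transactions only conflict when one writes a TVar that another has read, so lookups and inserts in different parts of the tree validate and commit in parallel; a reader-writer lock scheme would need hand-over-hand locking and upgrade logic to approach the same behaviour.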