
On Mon, Dec 29, 2008 at 1:15 PM, Ryan Ingram wrote:
> Both readTVar and writeTVar are worse than O(1); they have to look up the
> TVar in the transaction log to see if you have made local changes to it.
> Right now it looks like that operation is O(n), where n is the number of
> TVars accessed by a transaction, so your big transaction which is just
> accessing a ton of TVars is likely O(n^2).
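For intuition, here is a toy model of that cost (this is not GHC's actual STM internals, just an illustrative sketch): if the transaction log were a flat list of the TVars touched so far, and every access had to scan it for a prior local entry, then n first-time accesses would cost roughly n^2/2 scans in total.

```haskell
import Data.List (foldl')

-- Toy model: the "log" is an association list of touched vars, and each
-- fresh access scans the whole log (a miss) before appending its entry.
-- Returns the total number of log entries inspected for n fresh accesses.
lookupsForTransaction :: Int -> Int
lookupsForTransaction n = snd (foldl' step ([], 0) [1 .. n])
  where
    step (logEntries, cost) v =
      (v : logEntries, cost + length logEntries)

main :: IO ()
main = do
  -- cost grows quadratically: n*(n-1)/2 scans for n fresh TVars
  print (lookupsForTransaction 10)   -- 45
  print (lookupsForTransaction 100)  -- 4950
```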
So this actually brings up a tangential question I've wondered about for a while. The lock-free data-structure paper at http://research.microsoft.com/users/Cambridge/simonpj/papers/stm/lock-free-f... shows locks and STM with very similar performance on a single processor, with STM quickly winning on multiple processors.

I understand the basic idea as being that the STM versions are optimistic: they only pay a cost on a transaction collision, while locks have to go frob some bit of memory on every read or write. I'm also guessing that STM would show less contention on a read-heavy load, since you don't need to block anyone if you are just reading (though I suppose you can get the same effect with shared-read locks in a locking style?).

But every STM operation has to modify a transaction log, which seems like it should be even more expensive than frobbing a lock bit. So if the per-operation STM overhead is higher, and blocking contention is the same (assuming the lock implementation uses shared locks for reads), I don't see how the STM implementation could be faster. I'm sure it's all more complicated than this... but where exactly is the STM performance in those papers coming from?

Also, they disclaim in the testing section that the STM implementation was immature; now that there's parallel GC, and perhaps STM doesn't do so much dynamic allocation (?), shouldn't the STM numbers be even better?
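To make the read-heavy case concrete, here is a minimal sketch using the standard Control.Concurrent.STM API (the function name sumBalances is my own, just for illustration): a read-only transaction builds no write-set, so committing it only validates that the reads were mutually consistent, and it never forces another reader to wait the way an exclusive lock would.

```haskell
import Control.Concurrent.STM

-- A read-only transaction over two shared TVars. With no writeTVar calls
-- there is nothing to publish at commit time; the runtime just checks
-- that both reads saw a consistent snapshot.
sumBalances :: TVar Int -> TVar Int -> STM Int
sumBalances a b = (+) <$> readTVar a <*> readTVar b

main :: IO ()
main = do
  a <- newTVarIO 40
  b <- newTVarIO 2
  total <- atomically (sumBalances a b)
  print total  -- 42
```

Many threads running atomically (sumBalances a b) concurrently would all proceed without blocking one another, which is the shared-read-lock-like behavior described above.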