
On 18 Nov 2008, at 10:04, Ketil Malde wrote:
> Neil Davies writes:
>
>> You may not be asking the right question here. Your final system's
>> performance is going to be influenced far more by your algorithm for
>> updating than by STM's (or any other concurrency primitive's)
>> performance.
> I may not be asking the right question, but I am happy about the
> answers, including yours :-)
> I think STM is semantically the right tool for the (my) job, but for
> it to be useful, the implementation must not be the limiting factor.
> I.e., running on n+1 CPUs should be measurably faster than running
> on n, at least up to n=8, and preferably more.
>> More detailed questions: how complex is the mutual exclusion block?
>> If it is well understood now and not likely to change, you can
>> implement it several ways and measure any extra overhead (run it a
>> lot) against the other approaches. Nothing like a quick benchmark.
>> Otherwise stick with STM (it's composable, after all).
> Assuming I can get enough parallelism and avoid too many rollbacks,
> of course.
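
(On the composability point above: a minimal sketch, with names
invented here rather than taken from any real code, of how two
single-variable STM operations compose into one atomic transaction
with no extra locking protocol.)

    import Control.Concurrent.STM

    -- Two independent single-variable operations ...
    withdraw, deposit :: TVar Int -> Int -> STM ()
    withdraw acc n = do b <- readTVar acc; writeTVar acc (b - n)
    deposit  acc n = do b <- readTVar acc; writeTVar acc (b + n)

    -- ... compose into one atomic action: both updates commit
    -- together or neither does.
    transfer :: TVar Int -> TVar Int -> Int -> IO ()
    transfer from to n = atomically (withdraw from n >> deposit to n)

Lock-based versions of withdraw and deposit would not compose like
this without every caller also agreeing on a lock order.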
It's not the parallelism that is the issue here, it is the locality of
reference. If your data is likely to be spread widely across all the
possible mutual exclusion blocks, you are on to a winner. If your data
is clustered and likely to keep hitting the same 'block' then,
whatever you do, your scalability is scuppered.

Also, consider how much concurrency you've got, not just the
parallelism. You need enough concurrency to exploit the parallelism,
but not too much more - beyond that you start creating contention for
the mutual exclusion blocks that would not have existed at lower
concurrency.
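
To make the locality point concrete (a rough sketch; the names here
are illustrative, not from your code): with state split across an
array of TVars, transactions that land on distinct slots commit in
parallel, while transactions whose keys cluster on the same slot
conflict and retry.

    import Control.Concurrent.STM
    import Data.Array

    type Blocks = Array Int (TVar Int)

    -- Split the shared state across n independent TVars.
    newBlocks :: Int -> IO Blocks
    newBlocks n = do
      tvs <- mapM (const (newTVarIO 0)) [1 .. n]
      return (listArray (0, n - 1) tvs)

    -- Scalability depends entirely on how keys spread over slots:
    -- distinct slots proceed in parallel; a shared slot forces
    -- retries.
    bumpKey :: Blocks -> Int -> IO ()
    bumpKey blocks key = atomically $ do
      let (lo, hi) = bounds blocks
          tv       = blocks ! (lo + key `mod` (hi - lo + 1))
      v <- readTVar tv
      writeTVar tv (v + 1)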
>> Others have already mentioned the granularity of locking - but that
>> is one of the performance design decisions that you need to
>> quantify.
> Yes. Fine-grained - I'm thinking a large Array of TVars. (If you
> only have a single TVar, it might as well be an MVar, no?)
What do those TVars contain? How many of them are being updated
atomically?
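
(Why I ask, as a hypothetical sketch: a transaction that writes two
slots of such an array conflicts with any other transaction touching
either slot, so the number of TVars read and written per atomic block
sets its conflict footprint.)

    import Control.Concurrent.STM
    import Data.Array

    -- Move one unit between two slots; this transaction conflicts
    -- with anything else reading or writing either TVar.
    moveUnit :: Array Int (TVar Int) -> Int -> Int -> IO ()
    moveUnit blocks i j = atomically $ do
      a <- readTVar (blocks ! i)
      b <- readTVar (blocks ! j)
      writeTVar (blocks ! i) (a - 1)
      writeTVar (blocks ! j) (b + 1)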
> -k
> --
> If I haven't seen further, it is by standing in the footprints of
> giants
Neil