
Could you post the code please?
On Sun, Jan 24, 2016 at 12:46 AM, Thomas Koster
Hi friends,
Using Criterion, I have been running benchmarks to measure the relative performance of STM and MVars for some simple transactions that I expect will be typical in my application. I am using GHC 7.10.2 and libraries as at Stackage LTS 3.2.
I have found that STM is faster than MVars in all my benchmarks, without exception. This seems to go against accepted wisdom [1][2][3]. I have not included my source code here to save space, but if you suspect that I am using MVars incorrectly, just say so and I will post my source code separately.
I have two questions:
1. When are MVars faster than STM? If the answer is "never", then when are MVars "better" than STM? (Choose your own definition of "better".)
2. When given two capabilities (+RTS -N2), MVars are suddenly an order of magnitude slower than with just one capability. Why?
For those who want details:
My benchmark forks four Haskell threads. Each thread repeats a transaction that increments a shared counter many, many times. These transactions must be serialized. The counter is therefore highly contended. One version uses an MVar to store the counter in the obvious way. The other version uses a TVar instead.
By the way, simply using "atomic-primops" to increment the counter won't do because the increment operation is actually a mock substitute for a more complex operation. I use the counter for my benchmarks because the real operation needs much more memory and I don't want the additional, unpredictable GC cost to affect my measurements.
Typical measurements are:
1 capability, using MVar: 37.30 ms 1 capability, using TVar: 24.88 ms 2 capabilities, using MVar: 1.564 s 2 capabilities, using TVar: 80.09 ms 4 capabilities, using MVar: 2.890 s 4 capabilities, using TVar: 207.8 ms
Notice that the MVar version suddenly slows by an order of magnitude when run with more than one capability. Why is this so? (This is question 2.)
Despite the absolute time elapsed, I realize that the CPU usage characteristics of the two versions are also quite different. I realize that the MVar version interlocks the four threads so that only one capability is ever busy at a time, irrespective of the number of capabilities available, whereas the STM version allows up to four capabilities to be busy at once. However, I believe that the additional parallel transactions in the STM version would be mostly wasted, destined to be retried. Unless I am mistaken, this assumption appears to be consistent with the observation that the STM version with -N1 is the fastest of all. Despite all this wasted work by the thundering herd, the total CPU time (i.e. my power bill) for the STM version is still less than for the MVar version, because the MVar version so dramatically slow.
Paradoxically, MVars seem to be the wrong tool for this job. So when are MVars faster than STM? (This is question 1.)
[1] https://stackoverflow.com/questions/15439966/when-why-use-an-mvar-over-a-tva... [2] https://www.reddit.com/r/haskell/comments/39ef3y/ioref_vs_mvar_vs_tvar_vs_tm... [3] https://mail.haskell.org/pipermail/haskell-cafe/2014-January/112158.html
Thanks, Thomas Koster _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
-- Chris Allen Currently working on http://haskellbook.com