>IIRC, MVars (in Haskell anyway) also have some fairness guarantees, which I don't see otherwise mentioned in the discussion so far.
Looking at the given mutex implementation, I'd say fairness is controlled by the primitive itself. More particularly:
1. Each thread trying to obtain the mutex waits on its own I-Var exclusively (no competing threads wait on a shared I-Var, so there is no room for an “unfair” schedule).
2. The I-Vars are swapped in with an atomic ‘getAndSet’, which can likely be implemented over CAS or LL/SC and will be as fair as the hardware can be.
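The two points above can be sketched in Haskell (a one-shot MVar stands in for an I-Var; all names here are mine, not from the implementation under discussion): each locker swaps a fresh empty cell into a shared reference (the ‘getAndSet’) and then blocks on its predecessor's cell alone.

```haskell
import Control.Concurrent.MVar
import Data.IORef

-- Hypothetical fair mutex: a chain of one-shot cells. Lockers are
-- served in the order their getAndSet (atomicModifyIORef') lands.
newtype FairLock = FairLock (IORef (MVar ()))

newFairLock :: IO FairLock
newFairLock = do
  free <- newMVar ()            -- initially filled: the lock is free
  FairLock <$> newIORef free

-- Returns the cell our successor will wait on.
lock :: FairLock -> IO (MVar ())
lock (FairLock ref) = do
  next <- newEmptyMVar                               -- my own cell
  prev <- atomicModifyIORef' ref (\old -> (next, old))  -- getAndSet
  takeMVar prev              -- wait exclusively on my predecessor
  pure next

unlock :: MVar () -> IO ()
unlock next = putMVar next ()  -- wake exactly the next thread in line
```

No two threads ever block on the same MVar, which is exactly why there is no room for an unfair wakeup.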
> If the goal is to have a simpler base primitive on which to build, everything has to boil down to compare-and-swap, load-linked/store-conditional, etc
I had this discussion with the Cats guys, but I still can’t understand how it’s possible to implement generic non-busy waiting with CAS/LL-SC only. OK, somewhere at a deeper level it’s actually CAS/LL-SC, but one of the threads should be able to HLT (a privileged instruction); otherwise non-busy waiting would not be possible. In other words, a generic locking strategy should allow you to deadlock. How would you implement a non-busy deadlock without HLT? Probably I’m missing something, and I’d be grateful if anyone clarified, because I’ve been told multiple times that CAS/LL-SC alone is enough, but I can’t figure it out myself.
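For contrast, here is roughly the only lock CAS alone buys you without some parking/HLT-like primitive underneath: a busy-waiting spinlock (a Haskell sketch, with `atomicModifyIORef'` standing in for CAS).

```haskell
import Data.IORef
import Control.Monad (unless)

-- CAS-only spinlock sketch: acquire loops until the flag flips.
-- There is no way here to put the thread to sleep; blocking needs
-- an OS/runtime facility beyond CAS itself.
newtype SpinLock = SpinLock (IORef Bool)

newSpinLock :: IO SpinLock
newSpinLock = SpinLock <$> newIORef False

acquire :: SpinLock -> IO ()
acquire l@(SpinLock ref) = do
  -- CAS attempt: succeed only if the flag was False.
  won <- atomicModifyIORef' ref (\held -> (True, not held))
  unless won (acquire l)    -- spin: burns CPU instead of blocking

release :: SpinLock -> IO ()
release (SpinLock ref) = atomicWriteIORef ref False
```

A “deadlock” with this lock is two threads spinning at 100% CPU forever, which is precisely the non-busy-waiting gap the question is about.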
> If what you're after is *composable* concurrency, simpler locks won't get you that.
Composability is not the point here at all. I’m not after composability, I’m after completeness. Let me speculate a bit and presume that any sufficiently complex composable system (i.e., a system that holds some “correctness” properties over its allowed composition operations) will necessarily be incomplete (i.e., some “correct” expressions will be inexpressible due to the lack of composition rules, and there is no way to supplement the system without making it inconsistent).
This is my very personal interpretation of Gödel’s theorem and pure speculation, but look: every non-Turing-complete language will prohibit some correct programs; on the other hand, every Turing-complete one will allow you to write non-terminating programs (contradictions). I believe the same property holds for concurrency. If a system prevents you from deadlocking, it will also prohibit some correct synchronization strategies (once again, this is my speculation; maybe there is a proof of the contrary, but I haven’t found one yet).
From this standpoint, it’s worth having a complete but “inconsistent” (uncomposable) API at the low level, with several composable implementations built on top of it, each with its own restrictions.
>async exceptions, you're in the same situation with MVars as you are with all resource-acquiring IO actions in Haskell.
In my opinion they are completely different. In one case we should provide guaranteed resource release, so we want to make resource acquisition atomic. In the other case we want the wait operation to be interruptible (hence non-atomic, unless acquisition consists of only one operation). Consider `mask $ \restore -> do { x <- resourceAcquisition; interruptibleOperation; onException (restore $ f x) (resourceRelease<1> x); resourceRelease<2> x }`. In case of interruption during `interruptibleOperation` we reach neither resourceRelease<1> nor resourceRelease<2> (that is why `uninterruptibleMask` was introduced). It’s tempting to consider an MVar a resource, but resources don’t like interruptions; the two should be kept separate.
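A runnable sketch of this distinction (function names such as `resourceAcquisition` are hypothetical, mirroring the snippet above):

```haskell
import Control.Exception

-- Under plain 'mask', an async exception delivered while we are
-- blocked inside the interruptible operation skips both releases:
leaky :: IO r -> (r -> IO ()) -> IO () -> (r -> IO a) -> IO a
leaky resourceAcquisition resourceRelease interruptibleOperation f =
  mask $ \restore -> do
    x <- resourceAcquisition
    interruptibleOperation        -- interruption here leaks 'x'
    r <- restore (f x) `onException` resourceRelease x
    resourceRelease x
    pure r

-- The standard packaging when acquisition is atomic: 'bracket' puts
-- nothing interruptible between acquire and the protected body, and
-- runs the release masked. If an interruptible step is unavoidable,
-- 'uninterruptibleMask' is the remaining escape hatch.
safe :: IO r -> (r -> IO ()) -> (r -> IO a) -> IO a
safe acquire release = bracket acquire release
```

The point being: `bracket`-style code treats acquisition as one atomic step, while an MVar wait is valuable precisely because it can be interrupted mid-wait.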