Hi Carter & others,
Carter, yes, this is CAS on pointers and in my next mail I'll try to come up with some hypotheses as to why we may have (remaining) problems there.
But first, I have been assured that on x86 comparing the value read by CAS against the expected value always correctly diagnoses success or failure (the same as reading the Zero Flag directly) [1].
And yet, there's this discrepancy, where the modified casMutVar that I linked to does not have the failure. As for reproducing the failure, either of the two following tests will currently show problems:
- Two threads try to casIORef False->True, and both report success
- 120 threads try to read, increment, CAS until they succeed. The total is often not 120 because multiple threads think they successfully incremented, say, 33->34.
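For anyone who wants the shape of the second test without reading the Haskell, here is a minimal C sketch of the same read/increment/CAS retry loop. It is not the code from the repository: it assumes GCC or Clang builtins and POSIX threads, and the names run_counter_test and increment_once are made up for illustration. With a correct CAS the final total should equal the number of threads.

```c
#include <pthread.h>
#include <stddef.h>

#define N_THREADS 120   /* same thread count as the test described above */

static long counter;    /* shared cell; stands in for the IORef contents */

/* Hypothetical worker: read, increment, CAS; retry until the CAS succeeds. */
static void *increment_once(void *arg)
{
    (void)arg;
    for (;;) {
        long seen = __atomic_load_n(&counter, __ATOMIC_SEQ_CST);
        /* Value-returning CAS: success is diagnosed by comparing the value
           read back against the value we expected to find. */
        long old = __sync_val_compare_and_swap(&counter, seen, seen + 1);
        if (old == seen)
            return NULL;
    }
}

/* Hypothetical driver: runs the test once and returns the final total,
   or -1 if a thread could not be created. */
long run_counter_test(void)
{
    pthread_t threads[N_THREADS];
    int started = 0;
    int failed = 0;

    counter = 0;
    for (int i = 0; i < N_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, increment_once, NULL) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return failed ? -1 : __atomic_load_n(&counter, __ATOMIC_SEQ_CST);
}
```

Calling run_counter_test() in a loop and comparing the result against N_THREADS mirrors the check in the Haskell test; a lost update would show up as a smaller total.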
Here's a specific recipe for the latter test with GHC 7.6.3 on Mac or Linux:
git clone git@github.com:rrnewton/haskell-lockfree-queue.git
cd haskell-lockfree-queue/AtomicPrimops/
git checkout 1a1e7e55f6706f9e5754
cabal sandbox init
cabal install -f-withTH -fforeign ./ ./testing --enable-tests
./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops -t n_threads
You may have to run the last line several times to see the failure.
Best,
-Ryan
[1] I guess the __sync_bool_compare_and_swap intrinsic which reads ZF is there just to avoid the extra comparison.
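To make the two ways of diagnosing success concrete, here is a small C sketch using the GCC builtins. The helper names cas_by_value and cas_by_flag are hypothetical, and whether the boolean form actually compiles down to a ZF read depends on the compiler, so treat that part as my assumption. The two helpers should always agree on success or failure.

```c
#include <stdbool.h>

/* Hypothetical helper: CAS via the value-returning builtin. Success is
   diagnosed by comparing the value read back against the expected value. */
bool cas_by_value(long *cell, long expected, long desired)
{
    long old = __sync_val_compare_and_swap(cell, expected, desired);
    return old == expected;
}

/* Hypothetical helper: CAS via the boolean builtin, which reports success
   directly (on x86 this is presumably derived from ZF after cmpxchg). */
bool cas_by_flag(long *cell, long expected, long desired)
{
    return __sync_bool_compare_and_swap(cell, expected, desired);
}
```

Either helper should return true for the first False->True style swap on a cell and false for a second identical attempt, since the cell no longer holds the expected value.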
[2] P.S. I'd like to try this on GHC head, but the RHEL 6 machine I usually use to build it is currently not validating (below error, commit 65d05d7334). After I debug this gmp problem I'll confirm that the bug under discussion applies on the 7.8 branch.
./sync-all checkout ghc-7.8
sh validate
...
/usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o: relocation R_X86_64_32 against `__gmpz_sub' can not be used when making a shared object; recompile with -fPIC
libraries/integer-gmp/gmp/objs/aors.o: could not read symbols: Bad value