I commented on the commit here:

   https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b

The problem is that our "cas" routine in SMP.h behaves like the C compiler intrinsic __sync_val_compare_and_swap: it returns the old value.  But it seems we cannot compare that returned value against the expected old value to determine whether or not the CAS succeeded.  (I believe the CAS may fail due to contention even though the value it returns happens to look like our old value.)
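
To make the pattern concrete, here is a minimal sketch of the value-returning style.  It is written against the GCC intrinsic rather than copied from SMP.h, and the names word_t, cas_val and cas_succeeded_by_compare are made up for illustration:

```c
#include <stdint.h>

/* Hypothetical stand-in for the RTS word type; not the real definition. */
typedef uintptr_t word_t;

/* Value-returning CAS: hands back whatever was stored in *p beforehand. */
static inline word_t cas_val(volatile word_t *p, word_t expected, word_t desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* The pattern in question: success is inferred by comparing the returned
 * old value against the value we expected to find. */
static inline int cas_succeeded_by_compare(volatile word_t *p,
                                           word_t expected, word_t desired)
{
    return cas_val(p, expected, desired) == expected;
}
```

The second function is the step I am worried about: the caller never gets an explicit success flag, only an old value to compare.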

Unfortunately, this didn't occur to me until it started causing bugs [1] [2].  Fixing casMutVar# fixes these bugs.  However, the way I'm currently fixing CAS in the "atomic-primops" package is by using __sync_bool_compare_and_swap:

   https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
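
Roughly, the boolean style looks like the sketch below.  This is not the code from that commit, just an illustration; word_t, cas_bool and atomic_incr are hypothetical names:

```c
#include <stdint.h>

/* Hypothetical stand-in for the RTS word type; not the real definition. */
typedef uintptr_t word_t;

/* Boolean CAS: returns nonzero only if *p held `expected` and was
 * replaced by `desired`.  The success flag comes from the operation
 * itself, not from comparing values afterwards. */
static inline int cas_bool(volatile word_t *p, word_t expected, word_t desired)
{
    return __sync_bool_compare_and_swap(p, expected, desired);
}

/* Example caller: a retry loop that increments *p atomically and
 * returns the new value. */
static inline word_t atomic_incr(volatile word_t *p)
{
    word_t old;
    do {
        old = *p;                        /* snapshot the current value */
    } while (!cas_bool(p, old, old + 1)); /* retry if someone else won  */
    return old + 1;
}
```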

What is the best fix for GHC itself?  Would it be ok for GHC to include a C compiler intrinsic like __sync_val_compare_and_swap?  Otherwise we need another big #ifdef'd function like "cas" in SMP.h that carries the architecture-specific inline asm for every supported architecture.  I can write the x86 one, but I'm not eager to try the others.
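
For the x86 case, something along these lines is what I have in mind.  It is only a sketch, assuming GCC-style extended asm; cas_flag and word_t are hypothetical names, and the non-x86 branch simply falls back to the compiler intrinsic instead of providing per-architecture asm:

```c
#include <stdint.h>

/* Hypothetical stand-in for the RTS word type; not the real definition. */
typedef uintptr_t word_t;

/* Boolean CAS that reads the success flag directly from the hardware. */
static inline int cas_flag(volatile word_t *p, word_t expected, word_t desired)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned char ok;
    __asm__ __volatile__ (
        "lock; cmpxchg %3, %1\n\t"  /* compare accumulator with *p, swap if equal */
        "sete %0"                   /* ZF is set exactly when the swap happened   */
        : "=q" (ok), "+m" (*p), "+a" (expected)
        : "r" (desired)
        : "cc", "memory");
    return ok;
#else
    /* Placeholder for other architectures: use the compiler intrinsic. */
    return __sync_bool_compare_and_swap(p, expected, desired);
#endif
}
```

The point of the sete is that the result is taken from the zero flag that cmpxchg sets, so no comparison of old values is involved.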

Best,
   -Ryan

[1] https://github.com/iu-parfunc/lvars/issues/70
[2] https://github.com/rrnewton/haskell-lockfree/issues/15