New subject: [GHC] #8157: Add a broader set of (C/CMM-based) atomic memory operations

22 Aug 2013

      #8157: Add a broader set of (C/CMM-based) atomic memory operations
------------------------------------+-------------------------------------
       Reporter:  rrnewton          |             Owner:
           Type:  bug               |            Status:  new
       Priority:  normal            |         Milestone:
      Component:  Compiler          |           Version:  7.6.3
       Keywords:                    |  Operating System:  Unknown/Multiple
   Architecture:  Unknown/Multiple  |   Type of failure:  None/Unknown
     Difficulty:  Unknown           |         Test Case:
     Blocked By:                    |          Blocking:
Related Tickets:                    |
------------------------------------+-------------------------------------
 This task is a precursor to ticket #7883.  The goal is to standardize
 a broader set of atomic ops -- mainly CAS and fetch-and-add -- and
 then in #7883 we will optimize those primops to have native support in
 the LLVM backend and thus avoid the extra C calls.

 Currently, the goal is only to support word-sized (4 or 8 byte)
 compare-and-swap and fetch-and-add operations.  In the future, this
 goal could be expanded to support 16-byte CAS, implemented by CMPXCHG16B
 on
 x86.

 Even supporting word sized CAS requires three different primops for
 operating on `MutVar`, `MutableArray`, and `MutableByteArray`,
 respectively.  Here are signatures for those primOps:

 {{{
 #!C
 gcptr stg_casMutVarzh ( gcptr mv, gcptr old, gcptr new )
  /* MutVar# s a -> a -> a -> State# s -> (# State#, Int#, Any a #) */

 gcptr stg_casArrayzh ( gcptr arr, W_ ind, gcptr old, gcptr new );
 /* MutableArray# s a -> Int# -> a -> a -> State# s -> (# State# s, Int#,
 Any a #) */

 W_ stg_casIntArrayzh( gcptr arr, W_ ind, W_ old, W_ new );
 /* MutableByteArray# s -> Int# -> Int# -> Int# -> State# s -> (# State# s,
 Int# #) */
 }}}

 Fetch-and-add, on the other hand, only makes sense on a
 `MutableByteArray`:

 {{{
 #!C
 W_ stg_fetchAddIntArrayzh( gcptr arr, W_ ind, W_ incr );
 /* MutableByteArray# s -> Int# -> Int# -> State# s -> (# State# s, Int# #)
 */
 }}}

 Some systems, such as gcc and LLVM, provide other variants, such as
 "fetch-and-multiply".  However, these do not have direct support on
 x86 and are usually just simulated with CAS.  For example, see the
 [http://llvm.org/docs/Atomics.html#atomics-and-codegen LLVM documentation
 here].
 User-exposed Haskell libraries for atomic operations may employ the
 same approach using the CAS primOps above to simulate other
 fetch-and-X operations.

 Finally, a design question which bears thinking about is how to
 implement opaque "ticket" approach to CAS.  The basic idea is that a
 user does NOT present a regular pointer-based data value (e.g. `Int`)
 to CAS as the "old" value.  Rather, observations of prior values are
 kept in an opaque form (`Any` type) to prevent the compiler changing
 the pointer identity by, for example, unboxing and reboxing the value.

 The way this affects the **primOp** design is as follows.  The current
 implementation of casMutVar# and casArray# returns the NEW value
 rather than the old one if the CAS is successful.  The Haskell code
 that imports the primOps treats this new value as the opaque ticket.

 The user can in fact safely retrieve the real value from the `Any`-based
 "ticket" with a `peekTicket :: Ticket a -> a` operation.  But
 [http://hackage.haskell.org/packages/archive/atomic-primops/0.4/doc/html
 /Data-Atomics.html#g:1 peekTicket] MUST [https://github.com/rrnewton
 /haskell-lockfree-
 queue/blob/880fb11de354d84f6c9a8ee06d5d4a887bd449a4/AtomicPrimops/Data/Atomics.hs#L114
 be marked NOINLINE] to hide the `unsafeCoerce`
 operation inside it from compiler optimizations.  Otherwise, the
 code path that carries the ticket from one CAS operation to the
 next is subject to optimization [1].  In particular, `unsafeCoerce` simply
 becomes a type-level operation, not a function call with the usual
 constraints on control and dataflow.

 Returning to the issue of primOp design, if we wished to avoid the
 somewhat non-standard design of returning the new value from CAS, we
 //could// allow users to create `Ticket` values themselves, rather
 retrieving them as the result of previous read and CAS IO operations.
 Yet this would entail an additional `NOINLINE` (unsafeCoerce) function
 call, and I believe it would be prone to error.

 Further, if you look at
 [https://github.com/ghc/ghc/blob/25ad01533aaaf1ccf69e1b7216e66de025b8131b/rts...
 the primOp implementation for stg_casMutVarzh],
 you will see that there is //already// a branch, which is necessary for
 updating GC-related meta-data.  Thus there is no additional overhead
 to return the new rather than the old value from CAS operations in the
 successful case. Haskell-level CAS is already a compound operation,
 not merely a single machine code instruction.

 By contrast, the C intrinsic `__sync_val_compare_and_swap` can
 become only a single machine operation, and it returns the "old"
 value irrespective of whether the CAS succeeded or failed.  In fact,
 an equality comparison on that old value is necessary to determine if
 the operation succeeded.  In contrast, the Haskell CAS primOps above
 return an unboxed tuple with an explicit success bit.  Thus returning
 the old value in the success case is completely redundant, and
 returning the new value instead allows us to acheive two goals
 (the CAS and the opacifying coercion) with one
 operation at no additional cost.

 [1] P.S. As a concrete example of what happens when `unsafeCoerce` is not
 hidden, consider the following code as a starting point:

 {{{
 x = unsafeCoerce y
 z = f y
 }}}

 `x` would seem to have nothing to do with `f`, and yet
 later in the compiler, one can end up with Core like this:
 {{{
 z = f x
 }}}

 This exposes the type information and activates optimizations, which
 was the source of [https://github.com/rrnewton/haskell-lockfree-
 queue/issues/5 one bug] with the
 `lockfree-queue` package that was fixed by adding NOINLINE to
 `peekTicket`.

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8157
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

[GHC] #8157: Add a broader set of (C/CMM-based) atomic memory operations

GHC

GHC

GHC

GHC

tags

participants (1)