[GHC] #8157: Add a broader set of (C/CMM-based) atomic memory operations

#8157: Add a broader set of (C/CMM-based) atomic memory operations ------------------------------------+------------------------------------- Reporter: rrnewton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Keywords: | Operating System: Unknown/Multiple Architecture: Unknown/Multiple | Type of failure: None/Unknown Difficulty: Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | ------------------------------------+------------------------------------- This task is a precursor to ticket #7883. The goal is to standardize a broader set of atomic ops -- mainly CAS and fetch-and-add -- and then in #7883 we will optimize those primops to have native support in the LLVM backend and thus avoid the extra C calls. Currently, the goal is only to support word-sized (4 or 8 byte) compare-and-swap and fetch-and-add operations. In the future, this goal could be expanded to support 16-byte CAS, implemented by CMPXCHG16B on x86. Even supporting word sized CAS requires three different primops for operating on `MutVar`, `MutableArray`, and `MutableByteArray`, respectively. Here are signatures for those primOps: {{{ #!C gcptr stg_casMutVarzh ( gcptr mv, gcptr old, gcptr new ) /* MutVar# s a -> a -> a -> State# s -> (# State#, Int#, Any a #) */ gcptr stg_casArrayzh ( gcptr arr, W_ ind, gcptr old, gcptr new ); /* MutableArray# s a -> Int# -> a -> a -> State# s -> (# State# s, Int#, Any a #) */ W_ stg_casIntArrayzh( gcptr arr, W_ ind, W_ old, W_ new ); /* MutableByteArray# s -> Int# -> Int# -> Int# -> State# s -> (# State# s, Int# #) */ }}} Fetch-and-add, on the other hand, only makes sense on a `MutableByteArray`: {{{ #!C W_ stg_fetchAddIntArrayzh( gcptr arr, W_ ind, W_ incr ); /* MutableByteArray# s -> Int# -> Int# -> State# s -> (# State# s, Int# #) */ }}} Some systems, such as gcc and LLVM, provide other variants, such as "fetch-and-multiply". However, these do not have direct support on x86 and are usually just simulated with CAS. For example, see the [http://llvm.org/docs/Atomics.html#atomics-and-codegen LLVM documentation here]. User-exposed Haskell libraries for atomic operations may employ the same approach using the CAS primOps above to simulate other fetch-and-X operations. Finally, a design question which bears thinking about is how to implement opaque "ticket" approach to CAS. The basic idea is that a user does NOT present a regular pointer-based data value (e.g. `Int`) to CAS as the "old" value. Rather, observations of prior values are kept in an opaque form (`Any` type) to prevent the compiler changing the pointer identity by, for example, unboxing and reboxing the value. The way this affects the **primOp** design is as follows. The current implementation of casMutVar# and casArray# returns the NEW value rather than the old one if the CAS is successful. The Haskell code that imports the primOps treats this new value as the opaque ticket. The user can in fact safely retrieve the real value from the `Any`-based "ticket" with a `peekTicket :: Ticket a -> a` operation. But [http://hackage.haskell.org/packages/archive/atomic-primops/0.4/doc/html /Data-Atomics.html#g:1 peekTicket] MUST [https://github.com/rrnewton /haskell-lockfree- queue/blob/880fb11de354d84f6c9a8ee06d5d4a887bd449a4/AtomicPrimops/Data/Atomics.hs#L114 be marked NOINLINE] to hide the `unsafeCoerce` operation inside it from compiler optimizations. Otherwise, the code path that carries the ticket from one CAS operation to the next is subject to optimization [1]. In particular, `unsafeCoerce` simply becomes a type-level operation, not a function call with the usual constraints on control and dataflow. Returning to the issue of primOp design, if we wished to avoid the somewhat non-standard design of returning the new value from CAS, we //could// allow users to create `Ticket` values themselves, rather retrieving them as the result of previous read and CAS IO operations. Yet this would entail an additional `NOINLINE` (unsafeCoerce) function call, and I believe it would be prone to error. Further, if you look at [https://github.com/ghc/ghc/blob/25ad01533aaaf1ccf69e1b7216e66de025b8131b/rts... the primOp implementation for stg_casMutVarzh], you will see that there is //already// a branch, which is necessary for updating GC-related meta-data. Thus there is no additional overhead to return the new rather than the old value from CAS operations in the successful case. Haskell-level CAS is already a compound operation, not merely a single machine code instruction. By contrast, the C intrinsic `__sync_val_compare_and_swap` can become only a single machine operation, and it returns the "old" value irrespective of whether the CAS succeeded or failed. In fact, an equality comparison on that old value is necessary to determine if the operation succeeded. In contrast, the Haskell CAS primOps above return an unboxed tuple with an explicit success bit. Thus returning the old value in the success case is completely redundant, and returning the new value instead allows us to acheive two goals (the CAS and the opacifying coercion) with one operation at no additional cost. [1] P.S. As a concrete example of what happens when `unsafeCoerce` is not hidden, consider the following code as a starting point: {{{ x = unsafeCoerce y z = f y }}} `x` would seem to have nothing to do with `f`, and yet later in the compiler, one can end up with Core like this: {{{ z = f x }}} This exposes the type information and activates optimizations, which was the source of [https://github.com/rrnewton/haskell-lockfree- queue/issues/5 one bug] with the `lockfree-queue` package that was fixed by adding NOINLINE to `peekTicket`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8157 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8157: Add a broader set of (C/CMM-based) atomic memory operations -------------------------------------+------------------------------------ Reporter: rrnewton | Owner: rrnewton Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Changes (by rrnewton): * owner: => rrnewton -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8157#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8157: Add a broader set of (C/CMM-based) atomic memory operations -------------------------------------+------------------------------------ Reporter: rrnewton | Owner: rrnewton Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by rrnewton): UPDATE: Before this task is complete we should also convert the following C functions to primops: {{{ EXTERN_INLINE void write_barrier(void); EXTERN_INLINE void store_load_barrier(void); EXTERN_INLINE void load_load_barrier(void); }}} Otherwise Carter does not have a path to replace these with their LLVM alternatives (and thus get rid of the C calls). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8157#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8157: Add a broader set of (C/CMM-based) atomic memory operations -------------------------------------+------------------------------------ Reporter: rrnewton | Owner: rrnewton Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.6.3 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Changes (by rrnewton): * status: new => closed * resolution: => fixed Comment: The basic functionality described above has now been implemented in GHC 7.8. This ticket is closed, but please refer to #7883, which discusses the next round of improvements for atomic ops. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8157#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC