One last comment -- none of the above is to suggest that I don't think we should eventually have a memory model (a la Java or C++11). But I (and Johan) don't think the addition of the primops Johan listed should wait on it. Further, I don't think these primops make the state of affairs any worse, given that we've already had the combination of IORef operations & parallel IO Threads for a long time, without a memory model.

I think the informal agreement we've been muddling along with is something like this:

IORef operations have the same behavior as the analogous C operations -- no implied synchronization
all IORef ops are "volatile" wrt GHC (GHC won't reorder them)
atomicModifyIORef does what its name implies
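
To make the third point concrete, here is a minimal sketch of the kind of code that relies on it. The helper names (bump, countConcurrently) are made up for illustration; only the Data.IORef and Control.Concurrent calls are real. It assumes a base library recent enough to export atomicModifyIORef'.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical helper: increment a shared counter as one atomic
-- read-modify-write, so concurrent increments cannot be lost.
bump :: IORef Int -> IO ()
bump ref = atomicModifyIORef' ref (\n -> (n + 1, ()))

-- Hypothetical helper: fork `workers` threads that each bump the
-- counter `perWorker` times, wait for all of them, then read the total.
countConcurrently :: Int -> Int -> IO Int
countConcurrently workers perWorker = do
  ref  <- newIORef 0
  done <- newEmptyMVar
  forM_ [1 .. workers] $ \_ -> forkIO $ do
    replicateM_ perWorker (bump ref)
    putMVar done ()
  -- Wait for every worker to signal completion before the final read.
  replicateM_ workers (takeMVar done)
  readIORef ref

main :: IO ()
main = countConcurrently 4 10000 >>= print
```

With plain readIORef/writeIORef in place of bump, increments could be lost when built with -threaded and run on several cores, since nothing in the informal agreement makes that pair atomic.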

Though I confess, I'm personally unclear on what the agreement is in at least two places:

What Haskell operations constitute grabbing a "lock" to protect IORef reads and writes? (We often use MVar based strategies for locking, but do they give a guarantee that they provide the necessary memory fences for the previous/subsequent IORef operations?)
Is the de-facto "volatile" status I implied before extended to the backends (C / LLVM)? I don't know but assume not. Note that even if not, this doesn't cause a problem for the proposed atomic primops, all of which are themselves
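
For reference, the MVar-based locking pattern that the first question is about looks roughly like the sketch below. The names (lockedIncrement, runLocked) are hypothetical. The sketch only shows the pattern; it does not settle whether taking and releasing the MVar is guaranteed to fence the IORef accesses in between, which is exactly the open question.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Monad (forM_, replicateM_)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical helper: a non-atomic read-modify-write on an IORef,
-- guarded by an MVar () that plays the role of a lock.
lockedIncrement :: MVar () -> IORef Int -> IO ()
lockedIncrement lock ref = withMVar lock $ \_ -> do
  n <- readIORef ref
  writeIORef ref $! n + 1

-- Hypothetical driver: several threads share one lock and one IORef.
runLocked :: Int -> Int -> IO Int
runLocked workers perWorker = do
  lock <- newMVar ()
  ref  <- newIORef 0
  done <- newEmptyMVar
  forM_ [1 .. workers] $ \_ -> forkIO $ do
    replicateM_ perWorker (lockedIncrement lock ref)
    putMVar done ()
  replicateM_ workers (takeMVar done)
  -- Take the lock for the final read too, so every IORef access
  -- in this sketch happens under the same lock.
  withMVar lock $ \_ -> readIORef ref

main :: IO ()
main = runLocked 4 10000 >>= print
```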

Perhaps I and others get away with this level of murkiness because we depend on IORefs so little, with so much happening in the pure code ;-).

Ah, and last of all -- while we do need to sort out all this stuff -- I want to point out that adding Johan's proposed primops isn't the key decision point. That ship sailed with 7.2 ;-). This is just about fleshing out what's already there (e.g. fetch and Xor in addition to fetch and Add) and improving the implementations by going to in-line primops.
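
As a rough illustration of what "fetch and Xor in addition to fetch and Add" means at the primop level, here is a sketch over a one-word MutableByteArray#. It assumes a GHC whose GHC.Exts exports fetchAddIntArray# and fetchXorIntArray# with old-value-returning semantics; the Cell wrapper and the function names around the primops are invented for this example.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Foreign.Storable (sizeOf)
import GHC.Exts
import GHC.IO (IO (..))

-- Hypothetical wrapper: one machine word of mutable, unboxed storage.
data Cell = Cell (MutableByteArray# RealWorld)

-- Allocate a one-word array and store the initial value at index 0.
newCell :: Int -> IO Cell
newCell (I# v) = case sizeOf (0 :: Int) of
  I# bytes -> IO $ \s0 ->
    case newByteArray# bytes s0 of
      (# s1, mba #) -> case writeIntArray# mba 0# v s1 of
        s2 -> (# s2, Cell mba #)

-- Atomically add to the word; the result is the value before the add.
fetchAdd :: Cell -> Int -> IO Int
fetchAdd (Cell mba) (I# d) = IO $ \s0 ->
  case fetchAddIntArray# mba 0# d s0 of
    (# s1, old #) -> (# s1, I# old #)

-- Atomically xor into the word; the result is the value before the xor.
fetchXor :: Cell -> Int -> IO Int
fetchXor (Cell mba) (I# m) = IO $ \s0 ->
  case fetchXorIntArray# mba 0# m s0 of
    (# s1, old #) -> (# s1, I# old #)

-- Plain (non-atomic) read of the current value.
readCell :: Cell -> IO Int
readCell (Cell mba) = IO $ \s0 ->
  case readIntArray# mba 0# s0 of
    (# s1, v #) -> (# s1, I# v #)

main :: IO ()
main = do
  c    <- newCell 5
  old1 <- fetchAdd c 3   -- cell goes from 5 to 8
  old2 <- fetchXor c 1   -- cell goes from 8 to 9
  now  <- readCell c
  print (old1, old2, now)
```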

Best,

-Ryan

On Mon, May 5, 2014 at 12:25 AM, Ryan Newton <rrnewton@gmail.com> wrote:

For Johan's primops to work, each primop must represent a full memory fence that is respected by the architecture and by both compilers (GHC & LLVM). Since I don't think GHC is a problem, let's talk about LLVM. We need to verify that LLVM understands not to float regular loads and stores past one of its own atomic instructions. If that is the case (even without anything being marked "volatile"), then I think we are in OK shape, right?

Clarification -- this is assuming we're using the "SequentiallyConsistent" setting in the LLVM backend to get full fences on each op, which correspond to the gcc-compatible __sync_* builtins:

http://llvm.org/docs/Atomics.html#sequentiallyconsistent