One last comment -- none of the above is meant to suggest that we shouldn't eventually have a memory model (a la Java or C++11).  But Johan and I don't think adding the primops he listed should wait on it.  Further, I don't think these primops make the state of affairs any worse, given that we've had the combination of IORef operations and parallel IO threads for a long time without a memory model.

I think the informal agreement we've been muddling along with is something like this:
Though I confess, I'm personally unclear on what the agreement is in at least two places:
Perhaps I and others get away with this level of murkiness because we depend on IORefs so little, with so much happening in the pure code ;-).

Ah, and last of all -- while we do need to sort out all this stuff -- I want to point out that adding Johan's proposed primops isn't the key decision point.  That ship sailed with 7.2 ;-).  This is just about fleshing out what's already there (e.g. fetch-and-xor in addition to fetch-and-add) and improving the implementations by moving to inline primops.
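
For anyone less familiar with the fetch-and-op family, here is a rough C sketch of what these operations do, assuming a gcc- or clang-compatible compiler.  The wrapper names fetch_add and fetch_xor are hypothetical and purely for illustration; only the __sync_* builtins are real, and this is not how GHC implements the primops.  Each builtin applies the operation atomically, returns the value the cell held beforehand, and is documented by gcc as a full barrier in most cases.

```c
#include <stdint.h>

/* Hypothetical wrapper names, for illustration only. The __sync_*
 * builtins are the real gcc/clang intrinsics. */

/* Atomically add delta to *cell; return the value held before the add. */
static inline int64_t fetch_add(volatile int64_t *cell, int64_t delta) {
    return __sync_fetch_and_add(cell, delta);
}

/* Atomically xor mask into *cell; return the value held before the xor. */
static inline int64_t fetch_xor(volatile int64_t *cell, int64_t mask) {
    return __sync_fetch_and_xor(cell, mask);
}
```

Starting from a cell holding 5, fetch_add(&cell, 3) returns 5 and leaves 8 behind; a subsequent fetch_xor(&cell, 0xF) returns 8 and leaves 7.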

Best,
  -Ryan


On Mon, May 5, 2014 at 12:25 AM, Ryan Newton <rrnewton@gmail.com> wrote:
For Johan's primops to work, each primop must act as a full memory fence that is respected by the architecture and by both compilers (GHC and LLVM).  Since I don't think GHC is a problem, let's talk about LLVM.  We need to verify that LLVM understands not to float regular loads and stores past one of its own atomic instructions.  If that is the case (even without anything being marked "volatile"), then I think we are in OK shape, right?

Clarification -- this is assuming we're using the "SequentiallyConsistent" setting in the LLVM backend to get full fences on each op, which correspond to the gcc-compatible __sync_* builtins: