Re: Adding atomic primops

5 May 2014

One last comment -- none of the above is to suggest that I don't think we
should eventually have a memory model (a la Java or C++11).  But I (and
Johan) don't think the addition of the primops Johan listed should wait on
it.  Further, I don't think these primops make the state of affairs any
worse, given that we've *already* had the combination of IORef operations &
parallel IO threads for a long time, without a memory model.

I think the informal agreement we've been muddling along with is something
like this:

   - IORef operations have the same behavior as the analogous C operations
   -- no implied synchronization
   - all IORef ops are "volatile" wrt GHC (GHC won't reorder them)
   - atomicModifyIORef does what its name implies
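As a rough illustration of the last bullet, here is a minimal sketch of several
threads bumping one shared counter through atomicModifyIORef'.  The function
name countAtomically and the thread/iteration counts are made up for this
example; only the Data.IORef and Control.Concurrent calls are real library
functions.  The final read also goes through atomicModifyIORef', so the sketch
does not lean on any assumption about plain readIORef ordering.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import Data.IORef (atomicModifyIORef', newIORef)

-- | Hypothetical helper: bump a shared counter from several threads using
-- atomicModifyIORef', then return the final count.
countAtomically :: Int -> Int -> IO Int
countAtomically nThreads perThread = do
  ref  <- newIORef (0 :: Int)
  done <- newEmptyMVar
  forM_ [1 .. nThreads] $ \_ -> forkIO $ do
    -- Each increment is a single atomic read-modify-write.
    replicateM_ perThread $ atomicModifyIORef' ref (\n -> (n + 1, ()))
    putMVar done ()
  -- Wait for every worker to signal completion.
  replicateM_ nThreads (takeMVar done)
  -- Read the counter atomically as well, leaving its value unchanged.
  atomicModifyIORef' ref (\n -> (n, n))

main :: IO ()
main = countAtomically 4 10000 >>= print
```

If the increments were written as a separate readIORef followed by writeIORef,
updates could be lost under contention, which is the gap atomicModifyIORef
is there to close.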

Though I confess, I'm personally unclear on what the agreement is in at
least two places:

   - What Haskell operations constitute grabbing a "lock" to protect IORef
   reads and writes?  (We often use MVar based strategies for locking, but do
   they give a *guarantee* that they provide the necessary memory fences
   for the previous/subsequent IORef operations?)
   - Is the de-facto "volatile" status I implied before extended to the
   backends (C / LLVM)?  I don't know but assume not.  Note that even if not,
   this doesn't cause a problem for the proposed atomic primops, all of which
   are themselves
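For concreteness, the MVar-based locking pattern from the first bullet might
look like the sketch below.  The names lockedIncr and countWithLock are
hypothetical.  The code shows the pattern only; whether the MVar operations
formally *guarantee* the fences needed around the plain readIORef/writeIORef
calls is exactly the open question raised above, and the sketch does not
settle it.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Monad (forM_, replicateM_)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- | Hypothetical helper: a non-atomic read-modify-write on an IORef,
-- guarded by an MVar that is used purely as a lock.
lockedIncr :: MVar () -> IORef Int -> IO ()
lockedIncr lock ref = withMVar lock $ \_ -> do
  n <- readIORef ref
  writeIORef ref $! n + 1

-- | Hypothetical driver: run lockedIncr from several threads and return
-- the final count, read while holding the same lock.
countWithLock :: Int -> Int -> IO Int
countWithLock nThreads perThread = do
  lock <- newMVar ()
  ref  <- newIORef 0
  done <- newEmptyMVar
  forM_ [1 .. nThreads] $ \_ -> forkIO $ do
    replicateM_ perThread (lockedIncr lock ref)
    putMVar done ()
  replicateM_ nThreads (takeMVar done)
  withMVar lock $ \_ -> readIORef ref

main :: IO ()
main = countWithLock 4 10000 >>= print
```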

Perhaps I and others get away with this level of murkiness because we
depend on IORefs so little, with so much happening in the pure code ;-).

Ah, and last of all -- while we do need to sort out all this stuff -- I
want to point out that adding Johan's proposed primops isn't the key
decision point.  That ship sailed with 7.2 ;-).  This is just about
fleshing out what's already there (e.g. fetch and Xor in addition to fetch
and Add) and improving the implementations by going to in-line primops.
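To make "fetch and Add" / "fetch and Xor" concrete, here is a small sketch
that calls the corresponding primops on a one-word mutable byte array.  It
assumes a GHC recent enough to export both fetchAddIntArray# and
fetchXorIntArray# from GHC.Exts; the Cell type and the helper names are
invented for the example.  The primops' return values are deliberately
ignored here, so the sketch does not depend on whether a given GHC version
hands back the old or the new value.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO (..))

-- | Hypothetical wrapper: one machine word of mutable, unboxed storage.
data Cell = Cell (MutableByteArray# RealWorld)

-- | Allocate 8 bytes (enough for one word on 32- and 64-bit targets)
-- and store the initial value at word index 0.
newCell :: Int -> IO Cell
newCell (I# v) = IO (\s0 ->
  case newByteArray# 8# s0 of
    (# s1, arr #) -> case writeIntArray# arr 0# v s1 of
      s2 -> (# s2, Cell arr #))

-- | Plain (non-atomic) read of the stored word.
readCell :: Cell -> IO Int
readCell (Cell arr) = IO (\s0 ->
  case readIntArray# arr 0# s0 of
    (# s1, v #) -> (# s1, I# v #))

-- | Atomically add x to the stored word.
fetchAdd :: Cell -> Int -> IO ()
fetchAdd (Cell arr) (I# x) = IO (\s0 ->
  case fetchAddIntArray# arr 0# x s0 of
    (# s1, _ #) -> (# s1, () #))

-- | Atomically xor x into the stored word.
fetchXor :: Cell -> Int -> IO ()
fetchXor (Cell arr) (I# x) = IO (\s0 ->
  case fetchXorIntArray# arr 0# x s0 of
    (# s1, _ #) -> (# s1, () #))

main :: IO ()
main = do
  cell <- newCell 0
  fetchAdd cell 5          -- 0 + 5     = 5
  fetchXor cell 3          -- 5 `xor` 3 = 6
  readCell cell >>= print  -- prints 6
```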

Best,
  -Ryan

On Mon, May 5, 2014 at 12:25 AM, Ryan Newton  wrote:
...
For Johan's primops to work, each primop must represent a full memory
fence that is respected both by the architecture, and by *both*
compilers (GHC & LLVM).  Since I don't think GHC is a problem, let's talk
about LLVM.  We need to verify that LLVM understands not to float regular
loads and stores past one of its own atomic instructions.  If that is the
case (even without anything being marked "volatile"), then I think we are
in ok shape, right?

Clarification -- this is assuming we're using the "SequentiallyConsistent"
setting in the LLVM backend to get full fences on each op, which correspond
to the gcc-compatible __sync_* builtins:
http://llvm.org/docs/Atomics.html#sequentiallyconsistent