
One last comment -- none of the above is to suggest that I don't think we
should eventually have a memory model (a la Java or C++11). But I (and
Johan) don't think the addition of the primops Johan listed should wait on
it. Further, I don't think these primops make the state of affairs any
worse, given that we've *already* had the combination of IORef operations &
parallel IO Threads for a long time, without a memory model.
I think the informal agreement we've been muddling along with is something
like this:
- IORef operations have the same behavior as the analogous C operations
-- no implied synchronization
- all IORef ops are "volatile" wrt GHC (GHC won't reorder them)
- atomicModifyIORef does what its name implies
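
To make the last bullet concrete, here is a minimal sketch of the one operation we all rely on being atomic. The helper name countWithThreads is made up for illustration. Note that the final readIORef leans on the MVar handshake to make the workers' writes visible, which is how this is usually written in practice rather than something a memory model currently promises.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM, replicateM_)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- Hypothetical helper: fork nThreads workers that each bump a shared
-- counter nIncrements times, then return the final count.
countWithThreads :: Int -> Int -> IO Int
countWithThreads nThreads nIncrements = do
  counter <- newIORef (0 :: Int)
  dones <- replicateM nThreads newEmptyMVar
  forM_ dones $ \done -> forkIO $ do
    -- Each increment is a single atomic read-modify-write, so no
    -- updates are lost even when workers run in parallel.
    replicateM_ nIncrements $
      atomicModifyIORef' counter (\n -> (n + 1, ()))
    putMVar done ()
  -- Wait for every worker to signal completion before reading.
  mapM_ takeMVar dones
  readIORef counter

main :: IO ()
main = countWithThreads 4 10000 >>= print  -- 4 threads x 10000 increments
```

Replacing the atomicModifyIORef' call with a separate readIORef followed by writeIORef would be the "same as C, no implied synchronization" case from the first bullet, and could lose increments under -threaded.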
Though I confess, I'm personally unclear on what the agreement is in at
least two places:
- What Haskell operations constitute grabbing a "lock" to protect IORef
reads and writes? (We often use MVar based strategies for locking, but do
they give a *guarantee* that they provide the necessary memory fences
for the previous/subsequent IORef operations?)
- Is the de-facto "volatile" status I implied before extended to the
backends (C / LLVM)? I don't know but assume not. Note that even if not,
this doesn't cause a problem for the proposed atomic primops, all of which
are themselves
Perhaps I and others get away with this level of murkiness because we
depend on IORefs so little, with so much happening in the pure code ;-).
Ah, and last of all -- while we do need to sort out all this stuff -- I
want to point out that adding Johan's proposed primops isn't the key
decision point. That ship sailed with 7.2 ;-). This is just about
fleshing out what's already there (e.g. fetch and Xor in addition to fetch
and Add) and improving the implementations by going to in-line primops.
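
For a rough idea of what using the in-line primops looks like, here is a sketch assuming a GHC recent enough that GHC.Exts exports fetchAddIntArray# and fetchXorIntArray#. The Cell wrapper and the helpers newCell, fetchAdd, fetchXor and readCell are hypothetical names for this example, not part of any library.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM, replicateM_)
import GHC.Exts
import GHC.IO (IO (..))

-- Hypothetical wrapper: one machine word stored in a mutable byte array.
data Cell = Cell (MutableByteArray# RealWorld)

-- Allocate 8 bytes (enough for one Int on 32- or 64-bit) and store v.
newCell :: Int -> IO Cell
newCell (I# v) = IO (\s0 ->
  case newByteArray# 8# s0 of
    (# s1, arr #) -> case writeIntArray# arr 0# v s1 of
      s2 -> (# s2, Cell arr #))

-- Atomically add d to the word at index 0; the primop's result is passed through.
fetchAdd :: Cell -> Int -> IO Int
fetchAdd (Cell arr) (I# d) = IO (\s ->
  case fetchAddIntArray# arr 0# d s of
    (# s', r #) -> (# s', I# r #))

-- Atomically xor the word at index 0 with mask m.
fetchXor :: Cell -> Int -> IO Int
fetchXor (Cell arr) (I# m) = IO (\s ->
  case fetchXorIntArray# arr 0# m s of
    (# s', r #) -> (# s', I# r #))

-- Atomic read of the word at index 0.
readCell :: Cell -> IO Int
readCell (Cell arr) = IO (\s ->
  case atomicReadIntArray# arr 0# s of
    (# s', r #) -> (# s', I# r #))

main :: IO ()
main = do
  cell <- newCell 0
  dones <- replicateM 4 newEmptyMVar
  forM_ dones $ \done -> forkIO $ do
    replicateM_ 10000 (fetchAdd cell 1)
    putMVar done ()
  mapM_ takeMVar dones
  readCell cell >>= print      -- total of all increments
  _ <- fetchXor cell 0xFF      -- flip the low byte...
  _ <- fetchXor cell 0xFF      -- ...and flip it back
  readCell cell >>= print      -- same value as before the two xors
```

The example reads the final value back with readCell rather than relying on what the fetch primops return, since that is the part most likely to differ between GHC versions.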
Best,
-Ryan
On Mon, May 5, 2014 at 12:25 AM, Ryan Newton
For Johan's primops to work, each primop must represent a full memory
fence that is respected both by the architecture and by *both* compilers (GHC & LLVM). Since I don't think GHC is a problem, let's talk about LLVM. We need to verify that LLVM understands not to float regular loads and stores past one of its own atomic instructions. If that is the case (even without anything being marked "volatile"), then I think we are in ok shape, right?
Clarification -- this is assuming we're using the "SequentiallyConsistent" setting in the LLVM backend to get full fences on each op, which correspond to the gcc-compatible __sync_* builtins: