
Hi, I found myself needing atomic operations on basic, mutable numeric values. I wrote a short design doc that I'd like feedback on: https://ghc.haskell.org/trac/ghc/wiki/AtomicPrimops I will try to implement these for 7.10. -- Johan
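[Editor's sketch: for concreteness, a boxed wrapper over a fetch-and-add primop of the kind the design doc proposes might look as follows. `fetchAddIntArray#` and its old-value-returning type are as they eventually appeared in GHC.Exts, so treat those details, and the 64-bit word-size literal, as assumptions of this sketch rather than part of the proposal.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Sketch of a boxed wrapper over a fetch-and-add primop on a mutable
-- Int cell, in the style the design doc proposes.
import GHC.Exts
import GHC.IO (IO (..))

data Counter = Counter (MutableByteArray# RealWorld)

-- Allocate a one-element Int array (8# bytes assumes a 64-bit word).
newCounter :: Int -> IO Counter
newCounter (I# n) = IO $ \s0 ->
  case newByteArray# 8# s0 of
    (# s1, mba #) ->
      case writeIntArray# mba 0# n s1 of
        s2 -> (# s2, Counter mba #)

-- Atomically add n, returning the value the cell held before the add.
fetchAdd :: Counter -> Int -> IO Int
fetchAdd (Counter mba) (I# n) = IO $ \s0 ->
  case fetchAddIntArray# mba 0# n s0 of
    (# s1, old #) -> (# s1, I# old #)

readCounter :: Counter -> IO Int
readCounter (Counter mba) = IO $ \s0 ->
  case readIntArray# mba 0# s0 of
    (# s1, v #) -> (# s1, I# v #)
```

The point of the primop is that `fetchAdd` compiles to a single locked instruction (e.g. `lock xadd` on x86) rather than a CAS retry loop.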

Hey Johan,
Superficial comments for now. Are you planning on supporting word variants? Callish mach-ops, as generated from primops, are not sunk across, but this behavior today is accidental (see Note [Foreign calls clobber heap]), so be sure to adjust the notes.
Edward
Excerpts from Johan Tibell's message of 2014-05-04 03:10:12 -0700:

Hi all,
I'm not sure how to continue from here. The current backend story feels a bit shaky. Do we think we need to accurately reflect the memory model in Cmm before we venture into this area at all, or are we comfortable relying on the current implementation of CallishMachOps, with the knowledge that if we ever want to move loads/stores past those calls, we need to address the memory model issue then?
I'm less concerned about actually picking a memory model. I think we should borrow what C/C++ did and just pick a couple (maybe even just 2) of the consistency levels on their "ladder" and support those. We can always add more relaxed ordering guarantees later.
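[Editor's sketch: the kind of reordering a memory model has to rule in or out can be made concrete with the classic store-buffering litmus test. This sketch uses plain IORef operations; all names are illustrative.]

```haskell
-- Store-buffering litmus test. With plain (non-atomic) IORef reads and
-- writes, weakly ordered hardware or a reordering compiler could in
-- principle let BOTH threads observe 0; sequentially consistent
-- operations would forbid the (0, 0) outcome.
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import Data.IORef

trial :: IO (Int, Int)
trial = do
  x <- newIORef 0
  y <- newIORef 0
  done1 <- newEmptyMVar
  done2 <- newEmptyMVar
  _ <- forkIO $ do writeIORef x 1; readIORef y >>= putMVar done1
  _ <- forkIO $ do writeIORef y 1; readIORef x >>= putMVar done2
  r1 <- takeMVar done1
  r2 <- takeMVar done2
  -- (0,1), (1,0), (1,1) are explainable by interleaving; (0,0) is not.
  return (r1, r2)

main :: IO ()
main = trial >>= print
```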

I'd be pretty comfortable relying on the existing implementation for now, as long as we know it is the sort of thing we should shore up before too long. Ultimately, "Threads Cannot Be Implemented As a Library" (http://www.cs.nmsu.edu/~scooper/CS574/papers/threads-library.pdf), but you can fake it for a long time. The user base of such features is pretty savvy, and if we're forced to go a bit overboard adding fences prophylactically, it doesn't cost nearly as much today as it did even 2 years ago.
-Edward
On Mon, May 12, 2014 at 11:18 PM, Johan Tibell wrote:
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

On 12/05/2014 14:18, Johan Tibell wrote:
Just document what you expect the semantics to be (e.g. what kind of barrier each operation is), and any future optimisations in CmmSink and elsewhere will need to respect that. I'd put that documentation in CmmSink, which is where we already have similar kinds of documentation. It would probably be better to move it into a wiki page at some point, though.
Cheers, Simon

I will update the wiki page (and later CmmSink) with the guarantees we expect CallishMachOps to provide. We do need to pick what kind of guarantee we want to provide. Options are here: http://en.cppreference.com/w/cpp/atomic/memory_order
Do we want these CallishMachOps to imply a full memory fence CPU instruction, or some more relaxed ordering (e.g. only atomicity)?

Hey Johan,
On https://ghc.haskell.org/trac/ghc/ticket/7883 Ryan helped articulate what he'd want wrt memory ordering semantics.
One point is that it might be totally valid and reasonable to provide *both* variants. Though if we were only to do one, the strong ordering guarantees might be a better default, albeit your use case and others do benefit from using the weakest/simplest primops possible.
On Thu, May 15, 2014 at 6:01 AM, Johan Tibell wrote:

After some discussion with Simon, I will go ahead and try to implement fully sequentially consistent versions of these primops. We can add other versions that take the consistency model as an argument later, if needed.
On Thu, May 15, 2014 at 9:03 PM, Carter Schonwald <carter.schonwald@gmail.com> wrote:

Sequentially consistent also corresponds to, according to the LLVM docs:
"the C++0x/C1x memory_order_seq_cst, Java volatile, and the gcc-compatible
__sync_* builtins", so we'll be in good company.
On Tue, May 20, 2014 at 5:01 PM, Johan Tibell wrote:

Yes, that's exactly what I'd like to see as well. Also, we must confess that most worldwide GHC cycles are currently spent on x86. ARM will surely become more important over time, but what's right for x86 is probably right for us for the near/medium term.
On Tue, May 20, 2014 at 11:04 AM, Johan Tibell
Sequentially consistent also corresponds to, according to the LLVM docs: "the C++0x/C1x memory_order_seq_cst, Java volatile, and the gcc-compatible __sync_* builtins", so we'll be in good company.
On Tue, May 20, 2014 at 5:01 PM, Johan Tibell
wrote: After some discussion with Simon, I will go ahead and try to implement fully sequentially consistent versions of these primops. We can add other versions that take the consistency model as an argument later, if needed.
On Thu, May 15, 2014 at 9:03 PM, Carter Schonwald < carter.schonwald@gmail.com> wrote:
Hey Johan, on https://ghc.haskell.org/trac/ghc/ticket/7883 Ryan helped articulate what he'd want wrt memory ordering semantics.
One point is that It might be totally valid and reasonable to provide *both* variants, though if we only were to do one, the strong ordering guarantees might be a better default, albeit your use case and others does benefit from using the weakest / simplest primops possible,
On Thu, May 15, 2014 at 6:01 AM, Johan Tibell
wrote: I will update the wiki page (and later CmmSink) with the guarantees we expect CallishMachOps to provide. We do need to pick what kind of guarantee we want to provide. Options are here: http://en.cppreference.com/w/cpp/atomic/memory_order
Do we want to have these CallishMachOps to imply a full memory fence CPU instruction or some more relaxed ordering (e.g. only atomicity)?
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

I now have a working implementation: https://phabricator.haskell.org/D12
The code that's generated is pretty good, except for 1-2 extra moves that I
think are due to the thing being a CallishMachOp:
https://gist.github.com/tibbe/02e6dfdb5a63a9793651
The remaining TODO is to get things working in the LLVM codegen. It shouldn't be difficult (we just need to output the corresponding LLVM intrinsics), but I'm not very familiar with that code.
On Tue, May 20, 2014 at 6:12 PM, Ryan Newton wrote:

The code review was moved to https://phabricator.haskell.org/D15
On Mon, Jun 9, 2014 at 1:16 PM, Johan Tibell wrote:

Hello,
IMHO, the desire to include these features is leading to a slightly cavalier attitude towards reordering concerns. Even assuming that the Cmm code generator doesn't reorder reads/writes around a `CallishMachOp` today, I don't see why this behavior should always remain true, leading to possible future pain. Also, it's possible that LLVM may decide to reorder memory accesses, AIUI because the underlying LLVM variables won't be synchronized.
In a nutshell, even though everything may work now, I suspect it'll become an ill-specified mess in a few years' time. I don't have a ton of experience implementing these sorts of features in compilers, though, so probably only worth a half cent.
John L.
John L.
On May 4, 2014 3:10 AM, "Johan Tibell" wrote:

Ryan, any thoughts on the (valid!) concern John Lato raises?
I know you mentioned fighting GHC optimizations when writing concurrent code a while ago, and I believe I asked if you could document the issues some time so we can all think about how to make the sitch easier. Could you share your thoughts?
AFAIK, there isn't a clear "these are the memory ordering rules", even in an informal sense, for GHC Core / STG / Cmm, and maybe we need to pin down some "informal" guarantees we want to hold?
-Carter
On Sun, May 4, 2014 at 6:32 PM, John Lato wrote:

Hi John & Carter,
Deferring for a moment issues with CAS on lazy/pointer values, which pertain to Carter's question, I want to understand the ordering concern WRT only the "IntArray" primops that Johan listed.
First, to my knowledge GHC doesn't do any reordering of effectful memory operations (e.g. readMutVar/writeMutVar). That is, I thought that threaded "State#" param was ironclad from the compiler's point of view. (In fact, if GHC does any optimization of these imperative bits, could someone point me to it?)
For Johan's primops to work, each primop must represent a full memory fence that is respected both by the architecture and by *both* compilers (GHC & LLVM). Since I don't think GHC is a problem, let's talk about LLVM. We need to verify that LLVM understands not to float regular loads and stores past one of its own atomic instructions. If that is the case (even without anything being marked "volatile"), then I think we are in OK shape, right?
In the grand scheme of things, isn't this a place where purity has done us good? All the memory traffic related to pure computation is and should be separate from these primops on mutable vars and arrays, and there shouldn't need to be any special ordering constraints *between* those two classes, should there?
(Part 2) *CAS on lazy/pointer values* -- this bit *was* an ill-specified mess to begin with, for sure ;-). In part, the problem was that it was somewhat unusual and sussed out other cabal/GHC issues (https://github.com/rrnewton/haskell-lockfree/issues/10), including a regression in 7.8 (https://github.com/rrnewton/haskell-lockfree/issues/28) that I think is a GHC bug, but haven't filed yet [where a spurious "!" on a function arg of "Any" type changes behavior]. Even once the commitment was made to use the "ticketed" approach. (Carter, it's described in this trac ticket: https://ghc.haskell.org/trac/ghc/ticket/8157.)
Johan, FYI, I think we should actually backpedal on casMutVar# and casArray# and not expose a type like the one we have in 7.8:
MutableArray# s a -> Int# -> a -> a -> State# s -> (# State# s, Int#, a #)
I can't think of any good reason to use it, since we should never trust that "a" return value -- it's poison, and covering it up successfully requires careful attention to NOINLINEs (https://github.com/rrnewton/haskell-lockfree/issues/5). Rather, for 7.10, better to commit directly IMHO to the "Any" type to beat back the compiler:
casMutVarTicketed# :: MutVar# RealWorld a -> Any a -> Any a -> State# RealWorld -> (# State# RealWorld, Int#, Any a #)
Cheers, -Ryan
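[Editor's sketch: to make the ticketed discipline concrete, here is a toy model of such an interface built on IORef, with an explicit version counter standing in for the pointer-identity `Any` tickets the real primop would use. Every name in it is illustrative, not proposed API; the real thing is lock-free at the primop level, whereas this model just leans on atomicModifyIORef'.]

```haskell
-- Toy model of a "ticketed" CAS interface. A Ticket pairs the value read
-- with a version number; a CAS succeeds only if the version is unchanged.
-- The real primop would use pointer identity on `Any` instead of a counter,
-- precisely so the client never has to trust a returned boxed value.
import Data.IORef

newtype TicketedVar a = TicketedVar (IORef (Int, a))

data Ticket a = Ticket !Int a

newTicketedVar :: a -> IO (TicketedVar a)
newTicketedVar x = TicketedVar <$> newIORef (0, x)

readForCAS' :: TicketedVar a -> IO (Ticket a)
readForCAS' (TicketedVar ref) = do
  (v, x) <- readIORef ref
  return (Ticket v x)

peekTicket' :: Ticket a -> a
peekTicket' (Ticket _ x) = x

-- Compare-and-swap against the ticket; returns success and a fresh ticket
-- (on failure, a ticket for the value actually found).
casTicketed :: TicketedVar a -> Ticket a -> a -> IO (Bool, Ticket a)
casTicketed (TicketedVar ref) (Ticket v _) new =
  atomicModifyIORef' ref $ \cur@(v', old) ->
    if v' == v
      then ((v + 1, new), (True, Ticket (v + 1) new))
      else (cur, (False, Ticket v' old))

-- A typical client: a retry loop built on the ticketed interface.
incr :: TicketedVar Int -> IO ()
incr var = readForCAS' var >>= loop
  where
    loop t = do
      (ok, t') <- casTicketed var t (peekTicket' t + 1)
      if ok then return () else loop t'
```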

For Johan's primops to work, each primop must represent a full memory fence that is respected both by the architecture, and by *both* compilers (GHC & LLVM). Since I don't think GHC is a problem, let's talk about LLVM. We need to verify that LLVM understands not to float regular loads and stores past one of its own atomic instructions. If that is the case (even without anything being marked "volatile"), then I think we are in ok shape, right?
Clarification -- this is assuming we're using the "SequentiallyConsistent" setting in the LLVM backend to get full fences on each op, which correspond to the gcc-compatible __sync_* builtins: http://llvm.org/docs/Atomics.html#sequentiallyconsistent

One last comment -- none of the above is to suggest that I don't think we
should eventually have a memory model (a la Java or C++11). But I (and
Johan) don't think the addition of the primops Johan listed should wait on
it. Further, I don't think these primops make the state of affairs any
worse, given that we've *already* had the combination of IORef operations &
parallel IO Threads for a long time, without a memory model.
I think the informal agreement we've been muddling along with is something like this:
- IORef operations have the same behavior as the analogous C operations -- no implied synchronization
- all IORef ops are "volatile" wrt GHC (GHC won't reorder them)
- atomicModifyIORef does what its name implies
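[Editor's sketch: for instance, atomicModifyIORef' can already express a fetch-and-add today -- correct under the informal agreement above, just heavier than a dedicated primop since it allocates a closure and spins via CAS on a boxed value. `fetchAddIORef` is an illustrative name, not an existing function.]

```haskell
import Data.IORef

-- Emulate fetch-and-add with atomicModifyIORef': atomically replace the
-- contents with (old + n) and return the old value.
fetchAddIORef :: IORef Int -> Int -> IO Int
fetchAddIORef ref n = atomicModifyIORef' ref (\old -> (old + n, old))
```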
Though I confess, I'm personally unclear on what the agreement is in at least two places:
- What Haskell operations constitute grabbing a "lock" to protect IORef reads and writes? (We often use MVar-based strategies for locking, but do they give a *guarantee* that they provide the necessary memory fences for the previous/subsequent IORef operations?)
- Is the de-facto "volatile" status I implied before extended to the backends (C / LLVM)? I don't know, but assume not. Note that even if not, this doesn't cause a problem for the proposed atomic primops, all of which are themselves full fences.
Perhaps I and others get away with this level of murkiness because we depend on IORefs so little, with so much happening in the pure code ;-).
Ah, and last of all -- while we do need to sort out all this stuff -- I want to point out that adding Johan's proposed primops isn't the key decision point. That ship sailed with 7.2 ;-). This is just about fleshing out what's already there (e.g. fetch-and-Xor in addition to fetch-and-Add) and improving the implementations by going to inline primops.
Best,
-Ryan
On Mon, May 5, 2014 at 12:25 AM, Ryan Newton wrote:

what sailed in ghc 7.2?
On Mon, May 5, 2014 at 12:46 AM, Ryan Newton wrote:

Oh, just the first CAS primop -- the initial decision to include these
kinds of ops.
Sent from my phone.
On May 5, 2014 12:59 AM, "Carter Schonwald" wrote:

Hi Ryan,
It's a little late here, but I've tried to put all my thoughts in order on
this topic.
On Sun, May 4, 2014 at 9:18 PM, Ryan Newton
Hi John & Carter,
Deferring for a moment issues with CAS on lazy/pointer values, which pertain to Carter's question, I want to understand the ordering concern WRT only the "IntArray" primops that Johan listed. First, to my knowledge GHC doesn't do any reordering of effectful memory operations (e.g. readMutVar/writeMutVar). That is, I thought that threaded "State#" param was ironclad from the compiler's point of view. (In fact, if GHC does any optimization of these imperative bits, could someone point me to it?)
My understanding is that GHC does do reordering of memory operations, in the CmmSink pass ( http://www.haskell.org/ghc/docs/latest/html/libraries/ghc/src/CmmSink.html, http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitiv...). In particular ghc may currently reorder loads, although it doesn't seem to reorder stores (at this time). And ghc *currently* only performs these reorderings in limited cases.
For Johan's primops to work, each primop must represent a full memory fence that is respected both by the architecture, and by *both* compilers (GHC & LLVM). Since I don't think GHC is a problem, let's talk about LLVM. We need to verify that LLVM understands not to float regular loads and stores past one of its own atomic instructions. If that is the case (even without anything being marked "volatile"), then I think we are in ok shape, right?
Provided the operations are SequentiallyConsistent (could probably use Acquire/Release for reads/writes) I think LLVM will handle this appropriately.
In the grand scheme of things isn't this a place where purity has done us good? All the memory traffic related to pure computation is and should be separate from these primops on mutable vars and arrays, and there shouldn't need to be any special ordering constraints *between* those two classes, should there?
Perhaps. But it's still important to get ordering right WRT multiple mutable ops, so I'm not sure that purity actually helps that much, except that we can get away with underspecified operations more so than in C++. (Also, x86 has a stronger model than other archs; it's possible that we currently get away with stuff that's unsafe on other platforms.)
I have to confess that I don't understand what you mean by suggesting we should implement these primops without having an underlying memory model. What does atomicReadIntArray# mean if there's no memory model in which it's defined? How can you be sure that other GHC hackers won't break some assumptions that the implementations rely upon, if they don't have a memory model to describe how operations should behave? This is strongly related to the TODO: in Johan's proposal: just because that may be true now (I don't know if it is!) doesn't mean it will remain so in the future.
Of course, given the limited number of GHC hackers, everything may work out fine in the end. It just seems to me that it would be more sensible to define the semantics first and build the primops around them, rather than primops first and semantics later.
Best, John
(Part 2) *CAS on lazy/pointer values* -- this bit *was* an ill-specified mess to begin with, for sure ;-). In part, the problem was that it was somewhat unusual and sussed out other cabal/GHC issues (https://github.com/rrnewton/haskell-lockfree/issues/10), including a regression in 7.8 (https://github.com/rrnewton/haskell-lockfree/issues/28) that I think is a GHC bug, but haven't filed yet [where a spurious "!" on a function arg of "Any" type changes behavior] -- even once the commitment was made to use the "ticketed" approach. (Carter, it's described in this trac ticket: https://ghc.haskell.org/trac/ghc/ticket/8157.)
Johan, FYI, I think we should actually backpedal on casMutVar# and casArray# and not expose a type like the one we have in 7.8:
MutableArray# s a -> Int# -> a -> a -> State# s -> (# State# s, Int#, a #)
I can't think of any good reason to use it, since we should never trust that "a" return value -- it's poison, and covering it up successfully requires careful attention to NOINLINEs (https://github.com/rrnewton/haskell-lockfree/issues/5). Rather, for 7.10, better to commit directly IMHO to the "Any" type to beat back the compiler:
casMutVarTicketed# :: MutVar# RealWorld a -> Any a -> Any a -> State# RealWorld -> (# State# RealWorld, Int#, Any a #)
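As a concrete illustration of the problem, here is a hedged sketch (the wrapper name `casIORefSketch` is mine, not from any proposal) of today's casMutVar# from GHC.Exts. The comparison is pointer equality, so the `expected` argument must be a value previously read from the same MutVar#, and the returned witness should only ever be fed back in as the next ticket, never inspected by value -- which is exactly why the "Any"/ticketed typing is attractive.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (casMutVar#, Int#)
import GHC.IORef (IORef(..))
import GHC.STRef (STRef(..))
import GHC.IO (IO(..))
import Data.IORef (newIORef, readIORef)

-- Wrap casMutVar# over an IORef. The primop compares *pointers*, and
-- returns 0# when the swap succeeded. The witness value is only safe
-- to use as the next expected ticket.
casIORefSketch :: IORef a -> a -> a -> IO (Bool, a)
casIORefSketch (IORef (STRef mv)) expected new = IO $ \s ->
  case casMutVar# mv expected new s of
    (# s', flag, witness #) -> (# s', (isZero flag, witness) #)
  where
    isZero :: Int# -> Bool
    isZero 0# = True
    isZero _  = False
```

Under optimization, GHC is free to rebox the `expected` value on the way in, silently turning a should-succeed CAS into a failure; that is the fragility the NOINLINE workarounds and the "Any" type are trying to contain.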
Cheers, -Ryan
IMHO I think the desire to include these features is leading to a slightly cavalier attitude towards reordering concerns. Even assuming that the Cmm code generator doesn't reorder reads/writes around a `CallishMachOp` today, I don't see why this behavior should always remain true, which could lead to pain in the future. It's also possible, AIUI, that LLVM may decide to reorder memory accesses, because the underlying LLVM variables won't be marked for synchronization.
In a nutshell, even though everything may work now I suspect it'll become an ill-specified mess in a few years time. I don't have a ton of experience implementing these sorts of features in compilers though, so probably only worth a half cent.
John L. On May 4, 2014 3:10 AM, "Johan Tibell"
wrote: Hi,
I found myself needing atomic operations on basic, mutable numeric values. I wrote a short design doc that I'd like feedback on:
https://ghc.haskell.org/trac/ghc/wiki/AtomicPrimops
I will try to implement these for 7.10.
-- Johan
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

Hi John,

My understanding is that GHC does do reordering of memory operations, in the CmmSink pass (http://www.haskell.org/ghc/docs/latest/html/libraries/ghc/src/CmmSink.html, http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitiv...). In particular, GHC may currently reorder loads, although it doesn't seem to reorder stores (at this time). And GHC *currently* only performs these reorderings in limited cases.
Thanks! That's a great summary by ezyang and I hadn't yet read it.

> I have to confess that I don't understand what you mean by suggesting we should implement these primops without having an underlying memory model.
Oh, no, I think we should work on getting as close as possible to said memory model. But given how much work it's been for other languages, I hope we can pipeline some of the implementation tasks while that is going on. For example, we've got one GSoC student working on a concurrent data structure, and it would be great if they could have a GHC branch to test inlined primops, even if there is much more work to do to verify the safety of that prototype before it makes it to the master branch. (There's also the social question of getting someone with enough time to do this memory model -- or port it from other languages. Suggestions?)
> What does atomicReadIntArray# mean if there's no memory model in which it's defined? How can you be sure that other ghc hackers won't break some assumptions that the implementations rely upon, if they don't have a memory model to describe how operations should behave? This is strongly related to the TODO in Johan's proposal: just because that may be true now (I don't know if it is!) doesn't mean it will remain so in the future.
I totally agree. I'm just not sure where we should go with it. I think as a practical matter we need some kind of memory model fuzz tester for GHC, since we certainly won't have formal verification throughout the whole compiler. There's been some neat work on this in the memory models community, it seems. It sounds like Jan-Willem also suggests that there should be invasive changes made to the compiler to reify the memory model as a DAG of dependencies on the individual operations. I guess the concrete choice for Johan's proposed changes is to figure out if it's possible to quickly hack the relevant Cmm passes that might reorder (to make them primop aware), rather than make the more general improvement that Jan-Willem suggested in the comments on ezyang's post.

> Of course, given the limited number of GHC hackers, everything may work out fine in the end. It just seems to me that it would be more sensible to define the semantics first and build the primops around them, rather than primops first and semantics later.
I think it's a bit of a chicken-and-egg problem, because we need interesting enough data structures and libraries to make someone care enough to do this work ;-). I think we need a collaboration with someone who does memory model stuff for a living, if we can get them interested or make this into "research" somehow.
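A memory-model fuzz tester of the kind mentioned above could start from classic litmus tests. Below is a minimal sketch (names `storeBuffering` and `litmus` are mine) of the store-buffering test: two threads each store to one location and load from the other, and we collect the set of observed outcomes over many runs. With plain IORefs the (0,0) outcome is permitted; for sequentially consistent atomic primops it must never appear, so a real tester would flag it.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (replicateM)
import Data.IORef
import Data.List (nub)

-- One run of the store-buffering litmus test:
--   thread 1: x := 1; r1 := y        thread 2: y := 1; r2 := x
storeBuffering :: IO (Int, Int)
storeBuffering = do
  x <- newIORef 0
  y <- newIORef 0
  a <- newEmptyMVar
  b <- newEmptyMVar
  _ <- forkIO $ do writeIORef x 1; readIORef y >>= putMVar a
  _ <- forkIO $ do writeIORef y 1; readIORef x >>= putMVar b
  (,) <$> takeMVar a <*> takeMVar b

-- Run the test n times and collect the distinct outcomes observed.
litmus :: Int -> IO [(Int, Int)]
litmus n = nub <$> replicateM n storeBuffering
```

Under sequential consistency the legal outcomes are (0,1), (1,0), and (1,1); a tester for the proposed atomic reads/writes would run this loop and report any (0,0) it sees.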

This would be fantastic, and I'd be happy to work with you to expose the extra ops through the atomic-primops package. This would continue our trend of supporting gcc intrinsic style atomics (full fence, no configurability of the memory model a la LLVM). The word-sized ops are the most important, but ultimately there are the other sizes to support as well -- even 128-bit.
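Assuming a word-sized fetch-and-add primop with the signature from Johan's proposal (it later shipped in GHC.Exts as fetchAddIntArray#), a gcc __sync_fetch_and_add-style counter could be wrapped like this. The `Counter` type and helper names are mine, and the 8-byte allocation assumes a 64-bit word.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO(..))

-- A one-word atomic counter backed by a MutableByteArray#.
data Counter = Counter (MutableByteArray# RealWorld)

newCounter :: IO Counter
newCounter = IO $ \s0 ->
  case newByteArray# 8# s0 of            -- 8 bytes: one Int, assuming 64-bit
    (# s1, mba #) ->
      case writeIntArray# mba 0# 0# s1 of
        s2 -> (# s2, Counter mba #)

-- Atomically add 1 with a full barrier; returns the value *before*
-- the increment, gcc intrinsic style.
incCounter :: Counter -> IO Int
incCounter (Counter mba) = IO $ \s ->
  case fetchAddIntArray# mba 0# 1# s of
    (# s', old #) -> (# s', I# old #)

readCounter :: Counter -> IO Int
readCounter (Counter mba) = IO $ \s ->
  case readIntArray# mba 0# s of
    (# s', n #) -> (# s', I# n #)
```

The same wrapping pattern extends to fetchSubIntArray#, fetchAndIntArray#, and the other proposed word-sized ops; smaller and 128-bit variants would need additional primops.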
On Sunday, May 4, 2014, Johan Tibell

On 04/05/14 11:10, Johan Tibell wrote:
I found myself needing atomic operations on basic, mutable numeric values. I wrote a short design doc that I'd like feedback on:
https://ghc.haskell.org/trac/ghc/wiki/AtomicPrimops
I will try to implement these for 7.10.
Right now, all CmmUnsafeForeignCalls (this includes CallishMachOps) are assumed to read and clobber arbitrary memory, so we won't commute reads or writes past them. We could relax this in the future, so the best thing to do would be to write down what semantics you expect (hah, I realise how difficult this is! An informal description will have to suffice.) Cheers, Simon
participants (7)
- Carter Schonwald
- Edward Kmett
- Edward Z. Yang
- Johan Tibell
- John Lato
- Ryan Newton
- Simon Marlow