
Hi, I found myself needing atomic operations on basic, mutable numeric values. I wrote a short design doc that I'd like feedback on: https://ghc.haskell.org/trac/ghc/wiki/AtomicPrimops I will try to implement these for 7.10. -- Johan
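[Editor's sketch: for concreteness, a boxed wrapper over a fetch-and-add primop of the kind the design doc proposes might look as follows. `fetchAddIntArray#` and its old-value-returning type are as they eventually appeared in GHC.Exts, so treat those details, and the 64-bit word-size literal, as assumptions of this sketch rather than part of the proposal.]

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Sketch of a boxed wrapper over a fetch-and-add primop on a mutable
-- Int cell, in the style the design doc proposes.
import GHC.Exts
import GHC.IO (IO (..))

data Counter = Counter (MutableByteArray# RealWorld)

-- Allocate a one-element Int array (8# bytes assumes a 64-bit word).
newCounter :: Int -> IO Counter
newCounter (I# n) = IO $ \s0 ->
  case newByteArray# 8# s0 of
    (# s1, mba #) ->
      case writeIntArray# mba 0# n s1 of
        s2 -> (# s2, Counter mba #)

-- Atomically add n, returning the value the cell held before the add.
fetchAdd :: Counter -> Int -> IO Int
fetchAdd (Counter mba) (I# n) = IO $ \s0 ->
  case fetchAddIntArray# mba 0# n s0 of
    (# s1, old #) -> (# s1, I# old #)

readCounter :: Counter -> IO Int
readCounter (Counter mba) = IO $ \s0 ->
  case readIntArray# mba 0# s0 of
    (# s1, v #) -> (# s1, I# v #)
```

The point of the primop is that `fetchAdd` compiles to a single locked instruction (e.g. `lock xadd` on x86) rather than a CAS retry loop.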

Hey Johan,
Superficial comments for now. Are you planning on supporting word variants? Callish mach-ops, as generated from primops, are not sunk across, but this behavior today is accidental (see Note [Foreign calls clobber heap]), so be sure to adjust the notes.
Edward
Excerpts from Johan Tibell's message of 2014-05-04 03:10:12 -0700:

Hi all,
I'm not sure how to continue from here. The current backend story feels a bit shaky. Do we think we need to accurately reflect the memory model in Cmm before we venture into this area at all, or are we comfortable relying on the current implementation of CallishMachOps, with the knowledge that if we ever want to move loads/stores past those calls, we need to address the memory model issue then?
I'm less concerned about actually picking a memory model. I think we should borrow what C/C++ did and just pick a couple (maybe even just 2) of the consistency levels on their "ladder" and support those. We can always add more relaxed ordering guarantees later.
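[Editor's sketch: the kind of reordering a memory model has to rule in or out can be made concrete with the classic store-buffering litmus test. This sketch uses plain IORef operations; all names are illustrative.]

```haskell
-- Store-buffering litmus test. With plain (non-atomic) IORef reads and
-- writes, weakly ordered hardware or a reordering compiler could in
-- principle let BOTH threads observe 0; sequentially consistent
-- operations would forbid the (0, 0) outcome.
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import Data.IORef

trial :: IO (Int, Int)
trial = do
  x <- newIORef 0
  y <- newIORef 0
  done1 <- newEmptyMVar
  done2 <- newEmptyMVar
  _ <- forkIO $ do writeIORef x 1; readIORef y >>= putMVar done1
  _ <- forkIO $ do writeIORef y 1; readIORef x >>= putMVar done2
  r1 <- takeMVar done1
  r2 <- takeMVar done2
  -- (0,1), (1,0), (1,1) are explainable by interleaving; (0,0) is not.
  return (r1, r2)

main :: IO ()
main = trial >>= print
```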

I'd be pretty comfortable relying on the existing implementation for now, as long as we know it is the sort of thing we should shore up before too long. Ultimately, "Threads Cannot Be Implemented As a Library" (http://www.cs.nmsu.edu/~scooper/CS574/papers/threads-library.pdf), but you can fake it for a long time. The user base of such features is pretty savvy, and if we're forced to go a bit overboard adding fences prophylactically, it doesn't cost nearly as much today as it did even 2 years ago.
-Edward
On Mon, May 12, 2014 at 11:18 PM, Johan Tibell wrote:
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

On 12/05/2014 14:18, Johan Tibell wrote:
Just document what you expect the semantics to be (e.g. what kind of barrier each operation is), and any future optimisations in CmmSink and elsewhere will need to respect that. I'd put that documentation in CmmSink, which is where we already have similar kinds of documentation. It would probably be better to move it into a wiki page at some point, though.
Cheers, Simon

I will update the wiki page (and later CmmSink) with the guarantees we expect CallishMachOps to provide. We do need to pick what kind of guarantee we want to provide. Options are here: http://en.cppreference.com/w/cpp/atomic/memory_order
Do we want these CallishMachOps to imply a full memory fence CPU instruction, or some more relaxed ordering (e.g. only atomicity)?

Hey Johan,
On https://ghc.haskell.org/trac/ghc/ticket/7883 Ryan helped articulate what he'd want wrt memory ordering semantics.
One point is that it might be totally valid and reasonable to provide *both* variants. Though if we were only to do one, the strong ordering guarantees might be a better default, albeit your use case and others do benefit from using the weakest/simplest primops possible.
On Thu, May 15, 2014 at 6:01 AM, Johan Tibell wrote:

After some discussion with Simon, I will go ahead and try to implement fully sequentially consistent versions of these primops. We can add other versions that take the consistency model as an argument later, if needed.
On Thu, May 15, 2014 at 9:03 PM, Carter Schonwald <carter.schonwald@gmail.com> wrote:

Sequentially consistent also corresponds to, according to the LLVM docs:
"the C++0x/C1x memory_order_seq_cst, Java volatile, and the gcc-compatible
__sync_* builtins", so we'll be in good company.
On Tue, May 20, 2014 at 5:01 PM, Johan Tibell wrote:

Yes, that's exactly what I'd like to see as well. Also, we must confess that most worldwide GHC cycles are currently spent on x86. ARM will surely become more important over time, but what's right for x86 is probably right for us for the near/medium term.
On Tue, May 20, 2014 at 11:04 AM, Johan Tibell
Sequentially consistent also corresponds to, according to the LLVM docs: "the C++0x/C1x memory_order_seq_cst, Java volatile, and the gcc-compatible __sync_* builtins", so we'll be in good company.
On Tue, May 20, 2014 at 5:01 PM, Johan Tibell
wrote: After some discussion with Simon, I will go ahead and try to implement fully sequentially consistent versions of these primops. We can add other versions that take the consistency model as an argument later, if needed.
On Thu, May 15, 2014 at 9:03 PM, Carter Schonwald < carter.schonwald@gmail.com> wrote:
Hey Johan, on https://ghc.haskell.org/trac/ghc/ticket/7883 Ryan helped articulate what he'd want wrt memory ordering semantics.
One point is that It might be totally valid and reasonable to provide *both* variants, though if we only were to do one, the strong ordering guarantees might be a better default, albeit your use case and others does benefit from using the weakest / simplest primops possible,
On Thu, May 15, 2014 at 6:01 AM, Johan Tibell
wrote: I will update the wiki page (and later CmmSink) with the guarantees we expect CallishMachOps to provide. We do need to pick what kind of guarantee we want to provide. Options are here: http://en.cppreference.com/w/cpp/atomic/memory_order
Do we want to have these CallishMachOps to imply a full memory fence CPU instruction or some more relaxed ordering (e.g. only atomicity)?
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

I now have a working implementation: https://phabricator.haskell.org/D12
The code that's generated is pretty good, except for 1-2 extra moves that I
think are due to the thing being a CallishMachOp:
https://gist.github.com/tibbe/02e6dfdb5a63a9793651
The remaining TODO is to get things working in the LLVM codegen. It shouldn't be difficult (we just need to output the corresponding LLVM intrinsics), but I'm not very familiar with that code.
On Tue, May 20, 2014 at 6:12 PM, Ryan Newton wrote:

The code review was moved to https://phabricator.haskell.org/D15
On Mon, Jun 9, 2014 at 1:16 PM, Johan Tibell wrote:

Hello,
IMHO, the desire to include these features is leading to a slightly cavalier attitude towards reordering concerns. Even assuming that the Cmm code generator doesn't reorder reads/writes around a `CallishMachOp` today, I don't see why this behavior should always remain true, leading to possible future pain. Also, it's possible that LLVM may decide to reorder memory accesses, AIUI because the underlying LLVM variables won't be synchronized.
In a nutshell, even though everything may work now, I suspect it'll become an ill-specified mess in a few years' time. I don't have a ton of experience implementing these sorts of features in compilers, though, so probably only worth a half cent.
John L.
John L.
On May 4, 2014 3:10 AM, "Johan Tibell" wrote:

Ryan, any thoughts on the (valid!) concern John Lato raises?
I know you mentioned fighting GHC optimizations when writing concurrent code a while ago, and I believe I asked if you could document the issues some time so we can all think about how to make the sitch easier. Could you share your thoughts?
AFAIK, there isn't a clear "these are the memory ordering rules", even in an informal sense, for GHC Core / STG / Cmm, and maybe we need to pin down some "informal" guarantees we want to hold?
-Carter
On Sun, May 4, 2014 at 6:32 PM, John Lato wrote:

Hi John & Carter,
Deferring for a moment issues with CAS on lazy/pointer values, which pertain to Carter's question, I want to understand the ordering concern WRT only the "IntArray" primops that Johan listed.
First, to my knowledge GHC doesn't do any reordering of effectful memory operations (e.g. readMutVar/writeMutVar). That is, I thought that threaded "State#" param was ironclad from the compiler's point of view. (In fact, if GHC does any optimization of these imperative bits, could someone point me to it?)
For Johan's primops to work, each primop must represent a full memory fence that is respected both by the architecture and by *both* compilers (GHC & LLVM). Since I don't think GHC is a problem, let's talk about LLVM. We need to verify that LLVM understands not to float regular loads and stores past one of its own atomic instructions. If that is the case (even without anything being marked "volatile"), then I think we are in OK shape, right?
In the grand scheme of things, isn't this a place where purity has done us good? All the memory traffic related to pure computation is and should be separate from these primops on mutable vars and arrays, and there shouldn't need to be any special ordering constraints *between* those two classes, should there?
(Part 2) *CAS on lazy/pointer values* -- this bit *was* an ill-specified mess to begin with, for sure ;-). In part, the problem was that it was somewhat unusual and sussed out other cabal/GHC issues (https://github.com/rrnewton/haskell-lockfree/issues/10), including a regression in 7.8 (https://github.com/rrnewton/haskell-lockfree/issues/28) that I think is a GHC bug, but haven't filed yet [where a spurious "!" on a function arg of "Any" type changes behavior]. Even once the commitment was made to use the "ticketed" approach. (Carter, it's described in this trac ticket: https://ghc.haskell.org/trac/ghc/ticket/8157.)
Johan, FYI, I think we should actually backpedal on casMutVar# and casArray# and not expose a type like the one we have in 7.8:
MutableArray# s a -> Int# -> a -> a -> State# s -> (# State# s, Int#, a #)
I can't think of any good reason to use it, since we should never trust that "a" return value -- it's poison, and covering it up successfully requires careful attention to NOINLINEs (https://github.com/rrnewton/haskell-lockfree/issues/5). Rather, for 7.10, better to commit directly IMHO to the "Any" type to beat back the compiler:
casMutVarTicketed# :: MutVar# RealWorld a -> Any a -> Any a -> State# RealWorld -> (# State# RealWorld, Int#, Any a #)
Cheers, -Ryan
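[Editor's sketch: to make the ticketed discipline concrete, here is a toy model of such an interface built on IORef, with an explicit version counter standing in for the pointer-identity `Any` tickets the real primop would use. Every name in it is illustrative, not proposed API; the real thing is lock-free at the primop level, whereas this model just leans on atomicModifyIORef'.]

```haskell
-- Toy model of a "ticketed" CAS interface. A Ticket pairs the value read
-- with a version number; a CAS succeeds only if the version is unchanged.
-- The real primop would use pointer identity on `Any` instead of a counter,
-- precisely so the client never has to trust a returned boxed value.
import Data.IORef

newtype TicketedVar a = TicketedVar (IORef (Int, a))

data Ticket a = Ticket !Int a

newTicketedVar :: a -> IO (TicketedVar a)
newTicketedVar x = TicketedVar <$> newIORef (0, x)

readForCAS' :: TicketedVar a -> IO (Ticket a)
readForCAS' (TicketedVar ref) = do
  (v, x) <- readIORef ref
  return (Ticket v x)

peekTicket' :: Ticket a -> a
peekTicket' (Ticket _ x) = x

-- Compare-and-swap against the ticket; returns success and a fresh ticket
-- (on failure, a ticket for the value actually found).
casTicketed :: TicketedVar a -> Ticket a -> a -> IO (Bool, Ticket a)
casTicketed (TicketedVar ref) (Ticket v _) new =
  atomicModifyIORef' ref $ \cur@(v', old) ->
    if v' == v
      then ((v + 1, new), (True, Ticket (v + 1) new))
      else (cur, (False, Ticket v' old))

-- A typical client: a retry loop built on the ticketed interface.
incr :: TicketedVar Int -> IO ()
incr var = readForCAS' var >>= loop
  where
    loop t = do
      (ok, t') <- casTicketed var t (peekTicket' t + 1)
      if ok then return () else loop t'
```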

For Johan's primops to work, each primop must represent a full memory fence that is respected both by the architecture, and by *both* compilers (GHC & LLVM). Since I don't think GHC is a problem, let's talk about LLVM. We need to verify that LLVM understands not to float regular loads and stores past one of its own atomic instructions. If that is the case (even without anything being marked "volatile"), then I think we are in ok shape, right?
Clarification -- this is assuming we're using the "SequentiallyConsistent" setting in the LLVM backend to get full fences on each op, which correspond to the gcc-compatible __sync_* builtins: http://llvm.org/docs/Atomics.html#sequentiallyconsistent

One last comment -- none of the above is to suggest that I don't think we
should eventually have a memory model (a la Java or C++11). But I (and
Johan) don't think the addition of the primops Johan listed should wait on
it. Further, I don't think these primops make the state of affairs any
worse, given that we've *already* had the combination of IORef operations &
parallel IO Threads for a long time, without a memory model.
I think the informal agreement we've been muddling along with is something like this:
- IORef operations have the same behavior as the analogous C operations -- no implied synchronization
- all IORef ops are "volatile" wrt GHC (GHC won't reorder them)
- atomicModifyIORef does what its name implies
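[Editor's sketch: for instance, atomicModifyIORef' can already express a fetch-and-add today -- correct under the informal agreement above, just heavier than a dedicated primop since it allocates a closure and spins via CAS on a boxed value. `fetchAddIORef` is an illustrative name, not an existing function.]

```haskell
import Data.IORef

-- Emulate fetch-and-add with atomicModifyIORef': atomically replace the
-- contents with (old + n) and return the old value.
fetchAddIORef :: IORef Int -> Int -> IO Int
fetchAddIORef ref n = atomicModifyIORef' ref (\old -> (old + n, old))
```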
Though I confess, I'm personally unclear on what the agreement is in at least two places:
- What Haskell operations constitute grabbing a "lock" to protect IORef reads and writes? (We often use MVar-based strategies for locking, but do they give a *guarantee* that they provide the necessary memory fences for the previous/subsequent IORef operations?)
- Is the de-facto "volatile" status I implied before extended to the backends (C / LLVM)? I don't know, but assume not. Note that even if not, this doesn't cause a problem for the proposed atomic primops, all of which are themselves full fences.
Perhaps I and others get away with this level of murkiness because we depend on IORefs so little, with so much happening in the pure code ;-).
Ah, and last of all -- while we do need to sort out all this stuff -- I want to point out that adding Johan's proposed primops isn't the key decision point. That ship sailed with 7.2 ;-). This is just about fleshing out what's already there (e.g. fetch-and-Xor in addition to fetch-and-Add) and improving the implementations by going to inline primops.
Best,
-Ryan
On Mon, May 5, 2014 at 12:25 AM, Ryan Newton wrote:

what sailed in ghc 7.2?
On Mon, May 5, 2014 at 12:46 AM, Ryan Newton wrote:

Oh, just the first CAS primop -- the initial decision to include these
kinds of ops.
Sent from my phone.
On May 5, 2014 12:59 AM, "Carter Schonwald" wrote:

Hi Ryan,
It's a little late here, but I've tried to put all my thoughts in order on
this topic.
On Sun, May 4, 2014 at 9:18 PM, Ryan Newton
Hi John & Carter,
Deferring for a moment issues with CAS on lazy/pointer values, which pertain to Carter's question, I want to understand the ordering concern WRT only the "IntArray" primops that Johan listed. First, to my knowledge GHC doesn't do any reordering of effectful memory operations (e.g. readMutVar/writeMutVar). That is, I thought that threaded "State#" param was ironclad from the compiler's point of view. (In fact, if GHC does any optimization of these imperative bits, could someone point me to it?)
My understanding is that GHC does do reordering of memory operations, in the CmmSink pass ( http://www.haskell.org/ghc/docs/latest/html/libraries/ghc/src/CmmSink.html, http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitiv...). In particular ghc may currently reorder loads, although it doesn't seem to reorder stores (at this time). And ghc *currently* only performs these reorderings in limited cases.
For Johan's primops to work, each primop must represent a full memory fence that is respected both by the architecture, and by *both* compilers (GHC & LLVM). Since I don't think GHC is a problem, let's talk about LLVM. We need to verify that LLVM understands not to float regular loads and stores past one of its own atomic instructions. If that is the case (even without anything being marked "volatile"), then I think we are in ok shape, right?
Provided the operations are SequentiallyConsistent (could probably use Acquire/Release for reads/writes) I think LLVM will handle this appropriately.
In the grand scheme of things isn't this a place where purity has done us good? All the memory traffic related to pure computation is and should be separate from these primops on mutable vars and arrays, and there shouldn't need to be any special ordering constraints *between* those two classes, should there?
Perhaps. But it's still important to get ordering right WRT multiple mutable ops, so I'm not sure that purity actually helps that much, except that we can get away with underspecified operations more so than in C++. (Also, x86 has a stronger model than other archs; it's possible that we currently get away with stuff that's unsafe on other platforms.)
I have to confess that I don't understand what you mean by suggesting we should implement these primops without having an underlying memory model. What does atomicReadIntArray# mean if there's no memory model in which it's defined? How can you be sure that other GHC hackers won't break some assumptions that the implementations rely upon, if they don't have a memory model to describe how operations should behave? This is strongly related to the TODO: in Johan's proposal: just because that may be true now (I don't know if it is!) doesn't mean it will remain so in the future.
Of course, given the limited number of GHC hackers, everything may work out fine in the end. It just seems to me that it would be more sensible to define the semantics first and build the primops around them, rather than primops first and semantics later.
Best, John
(Part 2) *CAS on lazy/pointer values* -- this bit *was* an ill-specified mess to begin with, for sure ;-). In part, the problem was that it was somewhat unusual and sussed out other cabal/GHC issues (https://github.com/rrnewton/haskell-lockfree/issues/10), including a regression in 7.8 (https://github.com/rrnewton/haskell-lockfree/issues/28) that I think is a GHC bug, but haven't filed yet [where a spurious "!" on a function arg of "Any" type changes behavior] -- even once the commitment was made to use the "ticketed" approach. (Carter, it's described in this trac ticket: https://ghc.haskell.org/trac/ghc/ticket/8157.)
Johan, FYI, I think we should actually backpedal on casMutVar# and casArray# and not expose a type like the one we have in 7.8:
MutableArray# s a -> Int# -> a -> a -> State# s -> (# State# s, Int#, a #)
I can't think of any good reason to use it, since we should never trust that "a" return value -- it's poison, and covering it up successfully requires careful attention to NOINLINEs (https://github.com/rrnewton/haskell-lockfree/issues/5). Rather, for 7.10, better to commit directly IMHO to the "Any" type to beat back the compiler:
casMutVarTicketed# :: MutVar# RealWorld a -> Any a -> Any a -> State# RealWorld -> (# State# RealWorld, Int#, Any a #)
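As a concrete illustration of the problem, here is a hedged sketch (the wrapper name `casIORefSketch` is mine, not from any proposal) of today's casMutVar# from GHC.Exts. The comparison is pointer equality, so the `expected` argument must be a value previously read from the same MutVar#, and the returned witness should only ever be fed back in as the next ticket, never inspected by value -- which is exactly why the "Any"/ticketed typing is attractive.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (casMutVar#, Int#)
import GHC.IORef (IORef(..))
import GHC.STRef (STRef(..))
import GHC.IO (IO(..))
import Data.IORef (newIORef, readIORef)

-- Wrap casMutVar# over an IORef. The primop compares *pointers*, and
-- returns 0# when the swap succeeded. The witness value is only safe
-- to use as the next expected ticket.
casIORefSketch :: IORef a -> a -> a -> IO (Bool, a)
casIORefSketch (IORef (STRef mv)) expected new = IO $ \s ->
  case casMutVar# mv expected new s of
    (# s', flag, witness #) -> (# s', (isZero flag, witness) #)
  where
    isZero :: Int# -> Bool
    isZero 0# = True
    isZero _  = False
```

Under optimization, GHC is free to rebox the `expected` value on the way in, silently turning a should-succeed CAS into a failure; that is the fragility the NOINLINE workarounds and the "Any" type are trying to contain.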
Cheers, -Ryan
IMHO I think the desire to include these features is leading to a slightly cavalier attitude towards reordering concerns. Even assuming that the Cmm code generator doesn't reorder reads/writes around a `CallishMachOp` today, I don't see why this behavior should always remain true, which could lead to pain in the future. It's also possible, AIUI, that LLVM may decide to reorder memory accesses, because the underlying LLVM variables won't be marked for synchronization.
In a nutshell, even though everything may work now I suspect it'll become an ill-specified mess in a few years time. I don't have a ton of experience implementing these sorts of features in compilers though, so probably only worth a half cent.
John L. On May 4, 2014 3:10 AM, "Johan Tibell"
wrote: Hi,
I found myself needing atomic operations on basic, mutable numeric values. I wrote a short design doc that I'd like feedback on:
https://ghc.haskell.org/trac/ghc/wiki/AtomicPrimops
I will try to implement these for 7.10.
-- Johan
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

Hi John,

My understanding is that GHC does do reordering of memory operations, in the CmmSink pass (http://www.haskell.org/ghc/docs/latest/html/libraries/ghc/src/CmmSink.html, http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitiv...). In particular, GHC may currently reorder loads, although it doesn't seem to reorder stores (at this time). And GHC *currently* only performs these reorderings in limited cases.
Thanks! That's a great summary by ezyang and I hadn't yet read it.

> I have to confess that I don't understand what you mean by suggesting we should implement these primops without having an underlying memory model.
Oh, no, I think we should work on getting as close as possible to said memory model. But given how much work it's been for other languages, I hope we can pipeline some of the implementation tasks while that is going on. For example, we've got one GSoC student working on a concurrent data structure, and it would be great if they could have a GHC branch to test inlined primops, even if there is much more work to do to verify the safety of that prototype before it makes it to the master branch. (There's also the social question of getting someone with enough time to do this memory model -- or port it from other languages. Suggestions?)
> What does atomicReadIntArray# mean if there's no memory model in which it's defined? How can you be sure that other ghc hackers won't break some assumptions that the implementations rely upon, if they don't have a memory model to describe how operations should behave? This is strongly related to the TODO in Johan's proposal: just because that may be true now (I don't know if it is!) doesn't mean it will remain so in the future.
I totally agree. I'm just not sure where we should go with it. I think as a practical matter we need some kind of memory model fuzz tester for GHC, since we certainly won't have formal verification throughout the whole compiler. There's been some neat work on this in the memory models community, it seems. It sounds like Jan-Willem also suggests that there should be invasive changes made to the compiler to reify the memory model as a DAG of dependencies on the individual operations. I guess the concrete choice for Johan's proposed changes is to figure out if it's possible to quickly hack the relevant Cmm passes that might reorder (to make them primop aware), rather than make the more general improvement that Jan-Willem suggested in the comments on ezyang's post.

> Of course, given the limited number of GHC hackers, everything may work out fine in the end. It just seems to me that it would be more sensible to define the semantics first and build the primops around them, rather than primops first and semantics later.
I think it's a bit of a chicken-and-egg problem, because we need interesting enough data structures and libraries to make someone care enough to do this work ;-). I think we need a collaboration with someone who does memory model stuff for a living, if we can get them interested or make this into "research" somehow.
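A memory-model fuzz tester of the kind mentioned above could start from classic litmus tests. Below is a minimal sketch (names `storeBuffering` and `litmus` are mine) of the store-buffering test: two threads each store to one location and load from the other, and we collect the set of observed outcomes over many runs. With plain IORefs the (0,0) outcome is permitted; for sequentially consistent atomic primops it must never appear, so a real tester would flag it.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (replicateM)
import Data.IORef
import Data.List (nub)

-- One run of the store-buffering litmus test:
--   thread 1: x := 1; r1 := y        thread 2: y := 1; r2 := x
storeBuffering :: IO (Int, Int)
storeBuffering = do
  x <- newIORef 0
  y <- newIORef 0
  a <- newEmptyMVar
  b <- newEmptyMVar
  _ <- forkIO $ do writeIORef x 1; readIORef y >>= putMVar a
  _ <- forkIO $ do writeIORef y 1; readIORef x >>= putMVar b
  (,) <$> takeMVar a <*> takeMVar b

-- Run the test n times and collect the distinct outcomes observed.
litmus :: Int -> IO [(Int, Int)]
litmus n = nub <$> replicateM n storeBuffering
```

Under sequential consistency the legal outcomes are (0,1), (1,0), and (1,1); a tester for the proposed atomic reads/writes would run this loop and report any (0,0) it sees.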

This would be fantastic, and I'd be happy to work with you to expose the extra ops through the atomic-primops package. This would continue our trend of supporting gcc intrinsic style atomics (full fence, no configurability of the memory model a la LLVM). The word-sized ops are the most important, but ultimately there are the other sizes to support as well -- even 128-bit.
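Assuming a word-sized fetch-and-add primop with the signature from Johan's proposal (it later shipped in GHC.Exts as fetchAddIntArray#), a gcc __sync_fetch_and_add-style counter could be wrapped like this. The `Counter` type and helper names are mine, and the 8-byte allocation assumes a 64-bit word.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO(..))

-- A one-word atomic counter backed by a MutableByteArray#.
data Counter = Counter (MutableByteArray# RealWorld)

newCounter :: IO Counter
newCounter = IO $ \s0 ->
  case newByteArray# 8# s0 of            -- 8 bytes: one Int, assuming 64-bit
    (# s1, mba #) ->
      case writeIntArray# mba 0# 0# s1 of
        s2 -> (# s2, Counter mba #)

-- Atomically add 1 with a full barrier; returns the value *before*
-- the increment, gcc intrinsic style.
incCounter :: Counter -> IO Int
incCounter (Counter mba) = IO $ \s ->
  case fetchAddIntArray# mba 0# 1# s of
    (# s', old #) -> (# s', I# old #)

readCounter :: Counter -> IO Int
readCounter (Counter mba) = IO $ \s ->
  case readIntArray# mba 0# s of
    (# s', n #) -> (# s', I# n #)
```

The same wrapping pattern extends to fetchSubIntArray#, fetchAndIntArray#, and the other proposed word-sized ops; smaller and 128-bit variants would need additional primops.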
On Sunday, May 4, 2014, Johan Tibell

On 04/05/14 11:10, Johan Tibell wrote:
I found myself needing atomic operations on basic, mutable numeric values. I wrote a short design doc that I'd like feedback on:
https://ghc.haskell.org/trac/ghc/wiki/AtomicPrimops
I will try to implement these for 7.10.
Right now, all CmmUnsafeForeignCalls (this includes CallishMachOps) are assumed to read and clobber arbitrary memory, so we won't commute reads or writes past them. We could relax this in the future, so the best thing to do would be to write down what semantics you expect (hah, I realise how difficult this is! An informal description will have to suffice.) Cheers, Simon
participants (7)
- Carter Schonwald
- Edward Kmett
- Edward Z. Yang
- Johan Tibell
- John Lato
- Ryan Newton
- Simon Marlow