On Sun, Jul 21, 2013 at 3:32 AM, Carter Schonwald <carter.schonwald@gmail.com> wrote:
OK, could you add those comments (about additional operations to consider) to the ticket?

Sure.  Just did that.
 
Relatedly: if we want these atomic ops to fall back to their sequential analogues when we're not using the threaded runtime system, does that mean we need a symbol or constant exposed by the RTS we link against, so that the inlined code branches on a link-time constant (something like "isThreadedRTS :: Bool"), or some analogue thereof?
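To make the question concrete, here is a minimal C sketch of the kind of inline code being discussed: a compare-and-swap that branches on a link-time constant. The names `is_threaded_rts` and `cas_word` are hypothetical (standing in for the suggested "isThreadedRTS") and are not part of GHC's RTS. The sketch also assumes, as its own convention, that the CAS returns the previously stored value. It uses C11 <stdatomic.h>. The non-atomic branch is only correct when no other OS thread can touch the word concurrently.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL link-time constant. In the proposal, each RTS flavour
 * would define this symbol (true for the threaded RTS, false otherwise).
 * It is defined here only so that the sketch compiles on its own. */
const bool is_threaded_rts = false;

/* HYPOTHETICAL helper: compare-and-swap on a word. Returns the value
 * that was stored in *p before the call; the swap took place iff that
 * value equals `expected`. */
static inline uintptr_t cas_word(_Atomic uintptr_t *p,
                                 uintptr_t expected, uintptr_t desired)
{
    if (is_threaded_rts) {
        /* Real atomic CAS. On failure, C11 writes the current value of
         * *p into `expected`. On success, `expected` already equals the
         * old value. Either way it now holds the old value. */
        atomic_compare_exchange_strong(p, &expected, desired);
        return expected;
    } else {
        /* Sequential analogue: read, compare, conditionally write.
         * This is only valid when a single OS thread can access *p.
         * Relaxed loads and stores typically compile to plain moves. */
        uintptr_t old = atomic_load_explicit(p, memory_order_relaxed);
        if (old == expected)
            atomic_store_explicit(p, desired, memory_order_relaxed);
        return old;
    }
}
```

Here `is_threaded_rts` is defined in the same translation unit, so an optimizing C compiler can typically fold the branch away. When the definition lives in a separately linked RTS object, the load and branch remain unless link-time optimization is available. That is the trade-off raised in this thread.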

I think it will take some care to mimic the semantics perfectly.  Why not just keep the real atomic ops, even in non-threaded mode, at least at first?  We can optimize later if we find that people are using concurrent data structures heavily in non-threaded mode ;-).
 
One nice thing about doing so is that if link-time optimization is added at some point, the branch would go away! On the other hand, one could argue that a call to the CAS primops in their current form isn't much more expensive than such a branch.

Indeed, I'm much more concerned with performance in the threaded case, and with making sure the ops are correct.