
At Mon, 16 May 2011 22:31:14 +0100, Simon Marlow wrote:
> Good example - so it looks like we don't get full sequential consistency on x86 (actually I'd been thinking only about write ordering and forgetting that reads could be reordered around writes).
>
> But that's bad because it means Haskell has a memory model, and we have to say what it is, or at least say that ordering is undefined.
Right. So I think the memory model is something along the lines of the no-crash property you mentioned--i.e., readIORef will return some value written with writeIORef and not a mish-mash of multiple writes--combined with the model of the underlying hardware. Maybe atomicModifyIORef should serve as a barrier, too.
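For concreteness, here is a minimal store-buffering test of the kind under discussion (a sketch, not the exact code from my previous email):

    import Control.Concurrent
    import Control.Monad (unless)
    import Data.IORef

    main :: IO ()
    main = do
      a <- newIORef False
      b <- newIORef False
      done <- newEmptyMVar
      _ <- forkIO $ do
        writeIORef a True      -- on x86, this store and the
        bv <- readIORef b      -- following load may be reordered
        unless bv $ putStrLn "critical section"
        putMVar done ()
      writeIORef b True
      av <- readIORef a
      unless av $ putStrLn "critical section"
      takeMVar done

Under sequential consistency at least one thread must see the other's write, so "critical section" prints at most once; with x86's store-to-load reordering, both threads can print it.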
> But I've never heard anyone claim that a prerequisite to Haskell being useful as a parallel programming language is a well-defined memory model. I think there are a couple of reasons for that:
>
> - deterministic parallel programming models (e.g. Strategies, monad-par) don't care about memory models. These are the first port of call for parallel programming.
Okay, well, I make this claim as a C/C++ programmer more used to writing low-level/kernel code than functional code. So I'm thinking less of things like deterministic scientific codes and more along the lines of network servers processing lots of messages and other asynchronous events that happen in a non-deterministic order anyway.

I think several programming patterns that require some understanding of the memory model would be useful in Haskell. One that particularly jumps to mind is read-copy-update (RCU), for data that is frequently accessed but seldom updated (configuration state, routing tables, etc.). As you've described them, IORefs are well suited to such a pattern, because reads are very cheap and updates happen through an atomic pointer write. But if the documentation doesn't say that readIORef is supposed to be cheap (or imply as much by mentioning that readIORef exposes the underlying hardware's memory consistency), then there's no way to tell that IORefs are suitable for RCU, and people may think they have to do something uglier with peek and poke.
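For instance, an RCU-style routing table might look like the following sketch (the RoutingTable type and the function names are made up for illustration):

    import Data.IORef
    import qualified Data.Map as M

    type RoutingTable = M.Map String String

    -- Fast path, run on every message: one cheap pointer read,
    -- no lock and no barrier on the read side.
    lookupRoute :: IORef RoutingTable -> String -> IO (Maybe String)
    lookupRoute ref dst = do
      tbl <- readIORef ref
      return (M.lookup dst tbl)

    -- Slow path, run rarely: build a new table and publish it
    -- with a single atomic pointer swap.
    updateRoute :: IORef RoutingTable -> String -> String -> IO ()
    updateRoute ref dst hop =
      atomicModifyIORef ref (\tbl -> (M.insert dst hop tbl, ()))

Readers never block writers and writers never block readers; the only cost on the fast path is following one pointer.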
> - If you have to use concurrency, then none of MVars, atomicModifyIORef or STM care about memory models either.
>
> So the memory model only becomes visible when you use concurrency with shared IORefs (without atomicModifyIORef) or bare peek/poke, which is pretty rare and easily avoided.
Actually: http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Con... There's nothing in the documentation for MVars that says anything about sequential consistency. If you take my example from the previous email, replace writeIORef with (\p v -> modifyMVar_ p $ \_ -> return v), and replace all other occurrences of IORef with MVar, nothing in the docs suggests you won't see the "critical section" message printed twice.

Presumably modifyMVar must take a spinlock. Moreover, to be correct, the spinlock has to act as a barrier: loads and stores inside the critical section must not be reordered before the acquire or after the release. (On x86 a LOCK-prefixed acquire already provides this, and total store ordering lets an ordinary store suffice for the release, so no explicit fences are needed there.) But does readMVar acquire the spinlock, or is it optimized to take advantage of pointer-sized writes being atomic? One can argue that an optimized readMVar is better, because you can always force serialization with modifyMVar. Or one can argue that, for consistency, readMVar should be identical to a takeMVar followed by a putMVar, since people who want the less deterministic behavior can use IORefs. The latter is certainly what the *current* code does, but until the behavior is documented, I'd worry about some later version of ghc optimizing readMVar and changing the consistency.

Systems have memory models for a reason; you can't get away from them entirely for all applications. Haskell's strength, I think, is in making sure that 99+% of code can't possibly depend on the memory model. For functional and ST code, you don't even need to look at the code to know this is true--safety is guaranteed by the types (modulo some unsafe stuff I hope will be even easier to detect with ghc 7.2...). But for that last little bit of tricky code, the best you can do is specify the behavior of the building blocks and maybe provide some useful architecture-independent wrappers (e.g., abstracted memory barriers).
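For concreteness, a minimal sketch of that substitution, applied to the same store-buffering shape as the IORef sketch above:

    import Control.Concurrent
    import Control.Monad (unless)

    main :: IO ()
    main = do
      a <- newMVar False
      b <- newMVar False
      done <- newEmptyMVar
      _ <- forkIO $ do
        modifyMVar_ a (\_ -> return True)   -- was: writeIORef a True
        bv <- readMVar b                    -- was: readIORef b
        unless bv $ putStrLn "critical section"
        putMVar done ()
      modifyMVar_ b (\_ -> return True)
      av <- readMVar a
      unless av $ putStrLn "critical section"
      takeMVar done

With the current implementation, the take/put inside modifyMVar_ and readMVar serializes everything, so "critical section" prints at most once--but nothing in the documentation promises that.

David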