
On 17/05/2011 00:44, dm-list-haskell-cafe@scs.stanford.edu wrote:
>> But I've never heard anyone claim that a prerequisite to Haskell being useful as a parallel programming language is a well-defined memory model. I think there are a couple of reasons for that:
>> - deterministic parallel programming models (e.g. Strategies, monad-par) don't care about memory models. These are the first port of call for parallel programming.
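(As a concrete illustration of that first point, here's a tiny Strategies example; sumSquares is a made-up name, but the point is that the result is the same whatever order the sparks are evaluated in, so no memory-model question can arise:)

    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- Square the elements in parallel; the result is deterministic,
    -- so evaluation order (and hence the memory model) is irrelevant.
    sumSquares :: [Int] -> Int
    sumSquares xs = sum (parMap rdeepseq (\x -> x * x) xs)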
> Okay, well, I make this claim as a C/C++ programmer more used to writing low-level/kernel code than functional code. So I'm thinking less of things like deterministic scientific codes, and more along the lines of network servers processing lots of messages and other asynchronous events that arrive in a non-deterministic order anyway.
>
> I think several programming patterns that require some understanding of the memory model would be useful in Haskell. One that particularly jumps to mind is the read-copy-update (RCU) pattern for frequently accessed but seldom updated data (configuration state, routing tables, etc.).
>
> As you've described them, IORefs are well suited to such a pattern, because reads are very cheap and updates happen through an atomic pointer write. But if the documentation doesn't say that readIORef is supposed to be cheap (or imply so by mentioning that readIORef exposes the underlying hardware's memory consistency), then there's no way to tell that IORefs are suitable for RCU, so people may think they have to do something uglier using peek and poke.
Ok. I'm not sure how feasible RCU is with IORefs, or even whether it's necessary. After all, the usual pattern of having readers use readIORef while writers use atomicModifyIORef gives the RCU cost model (zero overhead for readers, expensive writes) with far less complexity: garbage collection does the job of reclamation automatically. Have I missed something here?

A slight improvement over this scheme is to use a TVar with readTVarIO for the readers (no transaction involved), and transactions for the writers. This greatly increases the scope of what a writer can do, since a writer can update a bunch of state at the same time. The situation is complicated somewhat by generational garbage collection, which can create weird performance artifacts when mutation is involved.
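Concretely, the pattern I have in mind looks something like this (just a sketch; RoutingTable and the function names are placeholders, not a real API):

    import Data.IORef

    type RoutingTable = [(String, String)]  -- placeholder structure

    -- Readers: a single cheap pointer read; no locks, no barriers.
    lookupRoute :: IORef RoutingTable -> String -> IO (Maybe String)
    lookupRoute ref dest = do
      table <- readIORef ref
      return (lookup dest table)

    -- Writers: build a new table and swap it in atomically.  Readers
    -- still holding the old table keep using it, and the GC reclaims
    -- it once the last reader lets go: that's the RCU-ish part.
    addRoute :: IORef RoutingTable -> String -> String -> IO ()
    addRoute ref dest gw =
      atomicModifyIORef ref (\table -> ((dest, gw) : table, ()))

The TVar version has the same shape, with readTVarIO for the read and an atomically block for the write.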
> Actually:
>
> http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Con...
>
> There's nothing in the documentation for MVars that says anything about sequential consistency.
That's true, though I think most readers would assume sequential consistency in the absence of any statements to the contrary (obviously you are a counterexample ;-).
> If you take my example from the previous email, replace writeIORef with (\p v -> modifyMVar_ p $ return v), and replace all other occurrences of IORef with MVar, nothing in the docs suggests you won't see the "critical section" message printed twice.
There is an operational semantics in the Concurrent Haskell paper that does not admit the behaviour you describe, but I'll add something to the docs to that effect.
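For anyone following along, the MVar version of that example looks roughly like this (a reconstruction of the classic store-buffering test, not the exact code from the earlier email):

    import Control.Concurrent
    import Control.Concurrent.MVar
    import Control.Monad (unless)

    -- Each thread raises its own flag, then checks the other's.
    -- Under sequential consistency at most one thread can still see
    -- the other's flag down, so "critical section" prints at most
    -- once; a weaker memory model could let it print twice.
    maybePrint :: MVar Bool -> MVar Bool -> IO ()
    maybePrint myFlag yourFlag = do
      modifyMVar_ myFlag (\_ -> return True)
      other <- readMVar yourFlag
      unless other (putStrLn "critical section")

    main :: IO ()
    main = do
      f1 <- newMVar False
      f2 <- newMVar False
      _ <- forkIO (maybePrint f1 f2)
      maybePrint f2 f1
      threadDelay 100000  -- crude: let the forked thread finish printing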
> Presumably modifyMVar must take a spinlock. Moreover, to be correct on x86, the spinlock must execute an LFENCE after acquiring the lock and an SFENCE prior to releasing the lock. But does readMVar acquire the spinlock, or is it optimized to take advantage of pointer-sized writes being atomic?
That's a good point - readMVar cannot be optimised to avoid the lock. In fact, readMVar is just

    readMVar m = do
      x <- takeMVar m
      putMVar m x
      return x

although there have been suggestions that we should make it atomic. If we were to do so, it would still have to use a barrier to avoid reordering.
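And to answer the spinlock question directly: modifyMVar_ doesn't spin, it just uses the blocking takeMVar/putMVar protocol. Simplified, it is essentially this (modulo details in Control.Concurrent.MVar; renamed here to avoid the clash):

    import Control.Concurrent.MVar (MVar, takeMVar, putMVar)
    import Control.Exception (mask, onException)

    -- Take the value, run the update, put the result back.  mask and
    -- onException make sure an asynchronous exception can't leave the
    -- MVar permanently empty.
    modifyMVar_' :: MVar a -> (a -> IO a) -> IO ()
    modifyMVar_' m io =
      mask $ \restore -> do
        a  <- takeMVar m
        a' <- restore (io a) `onException` putMVar m a
        putMVar m a'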
> Systems have memory models for a reason; you can't get away from them entirely for all applications. Haskell's strength, I think, is in making sure that 99+% of code can't possibly depend on the memory model. For functional and ST code, you don't even need to look at the code to know that this is true: safety is guaranteed by the types (modulo some unsafe stuff I hope will be even easier to detect with ghc 7.2...). But for that last little bit of tricky code, the best you can do is specify the behavior of the building blocks and maybe provide some useful architecture-independent wrappers for things (e.g., abstracted memory barriers).
Agree 100%.

Cheers,
Simon