Re: [Haskell-cafe] ANNOUNCE: iterIO-0.1 - iteratee-based IO with pipe operators

On 12/05/2011 16:04, David Mazieres expires 2011-08-10 PDT wrote:
At Thu, 12 May 2011 09:57:13 +0100, Simon Marlow wrote:
So to answer my own question from earlier, I did a bit of benchmarking, and it seems that on my machine (a 2.4 GHz Intel Xeon 3060, running linux 2.6.38), I get the following costs:
   9 ns - return () :: IO ()        -- baseline (meaningless in itself)
  13 ns - unsafeUnmask $ return ()  -- with interrupts enabled
  18 ns - unsafeUnmask $ return ()  -- inside a mask_
  13 ns - ffi               -- a null FFI call (getpid cached by libc)
  18 ns - unsafeUnmask ffi  -- with interrupts enabled
  22 ns - unsafeUnmask ffi  -- inside a mask_
Those are lower than I was expecting, but look plausible. There's room for improvement too (by inlining some or all of unsafeUnmask#).
Do you mean inline unsafeUnmask, or unmaskAsyncExceptions#? I tried inlining unsafeUnmask by writing my own version and giving it the INLINE pragma, and it didn't affect performance at all.
Right, I meant inlining unmaskAsyncExceptions#, which would require compiler support.
However, the general case of unsafeUnmask E, where E is something more complex than return (), will be more expensive because a new closure for E has to be created. e.g. try "return x" instead of "return ()", and try to make sure that the closure has to be created once per unsafeUnmask, not lifted out and shared.
Okay. I'm surprised the getpid example wasn't already stressing this, but, indeed, I see a tiny difference with the following code:
  ffi >>= return . (1 +)   -- where ffi calls getpid
  13 ns - no unmasking
  20 ns - unsafeUnmask when not inside mask_
  25 ns - unsafeUnmask when the benchmark loop is inside one big mask_
So now we're talking about 28 cycles or something instead of 22. Still not a huge deal.
Ok, sounds reasonable.
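For reference, a minimal sketch of the kind of timing loop that could produce numbers like these (an assumption, not the actual harness used above; the iteration count and the use of getCPUTime are illustrative):

  import Control.Monad (replicateM_)
  import GHC.IO (unsafeUnmask)
  import System.CPUTime (getCPUTime)

  main :: IO ()
  main = do
    let iters = 10000000 :: Int
    start <- getCPUTime
    replicateM_ iters (unsafeUnmask (return ()))
    end <- getCPUTime
    -- getCPUTime is in picoseconds, so dividing by iters * 1000 gives
    -- rough nanoseconds per call; loop overhead is included, which is
    -- why a bare "return ()" baseline is measured and subtracted above.
    putStrLn (show ((end - start) `div` (fromIntegral iters * 1000))
              ++ " ns per call")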
There are no locks here, thanks to the message-passing implementation we use for throwTo between processors.
Okay, that sounds good. So then there is no guarantee about ordering of throwTo exceptions? That seems like a good thing since there are other mechanisms for synchronization.
What kind of ordering guarantee did you have in mind? We do guarantee that in

  throwTo t e1
  throwTo t e2

thread t will receive e1 before e2 (obviously, because throwTo is synchronous and only returns when the exception has been raised). Pending exceptions are processed in LIFO order (for no good reason other than performance), so there's no kind of fairness guarantee of the kind you get with MVars. One thread doing throwTo can be starved by others. I don't think that's a serious problem.

Cheers, Simon

At Thu, 12 May 2011 16:45:02 +0100, Simon Marlow wrote:
There are no locks here, thanks to the message-passing implementation we use for throwTo between processors.
Okay, that sounds good. So then there is no guarantee about ordering of throwTo exceptions? That seems like a good thing since there are other mechanisms for synchronization.
What kind of ordering guarantee did you have in mind? We do guarantee that in
  throwTo t e1
  throwTo t e2
Thread t will receive e1 before e2 (obviously, because throwTo is synchronous and only returns when the exception has been raised). ... Pending exceptions are processed in LIFO order (for no good reason other than performance)...
I mean, suppose you have three CPUs, A, B, and C running threads ta, tb, and tc. Then should the following order of events be permitted?

  A                B                  C
  throwTo tc e1
  throwTo tb e2
                   catch e2
  poke p x
                   peek p (sees x)
                                      catch e1

I would argue that this is just fine, that one should rely on MVars if one cares about ordering. But I'm not sure what "Pending exceptions are processed in LIFO order" means in the presence of relaxed memory consistency.

The reason I'm asking is that I want to make sure I never end up having to pay the overhead of an MFENCE instruction or equivalent every time I use unmaskAsyncExceptions#...

David

On 12/05/2011 18:24, dm-list-haskell-cafe@scs.stanford.edu wrote:
At Thu, 12 May 2011 16:45:02 +0100, Simon Marlow wrote:
There are no locks here, thanks to the message-passing implementation we use for throwTo between processors.
Okay, that sounds good. So then there is no guarantee about ordering of throwTo exceptions? That seems like a good thing since there are other mechanisms for synchronization.
What kind of ordering guarantee did you have in mind? We do guarantee that in
  throwTo t e1
  throwTo t e2
Thread t will receive e1 before e2 (obviously, because throwTo is synchronous and only returns when the exception has been raised). ... Pending exceptions are processed in LIFO order (for no good reason other than performance)...
I mean, suppose you have three CPUs, A, B, and C running threads ta, tb, and tc. Then should the following order of events be permitted?
  A                B                  C
  throwTo tc e1
  throwTo tb e2
                   catch e2
  poke p x
                   peek p (sees x)
                                      catch e1
I would argue that this is just fine, that one should rely on MVars if one cares about ordering. But I'm not sure what "Pending exceptions are processed in LIFO order" means in the presence of relaxed memory consistency.
Oh, that can't happen. A's first throwTo only returns when the exception has been raised in C - throwTo is like a synchronous communication in this sense.

We went to-and-fro on this aspect of the throwTo design a few times. The synchronous semantics for throwTo tends to be more useful for the programmer, but is harder to implement. If you want asynchronous throwTo, you can always get it with forkIO . throwTo.

As far as memory consistency goes, we claim to provide sequential consistency for IORef and IOArray operations, but not for peeks and pokes.
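The forkIO trick is a one-liner; a minimal sketch (asyncThrowTo is an assumed name, not a library function):

  import Control.Concurrent (ThreadId, forkIO, throwTo)
  import Control.Exception (Exception)
  import Control.Monad (void)

  -- Fire-and-forget delivery: returns as soon as the helper thread is
  -- forked, without waiting for the exception to be raised in the target.
  asyncThrowTo :: Exception e => ThreadId -> e -> IO ()
  asyncThrowTo t e = void (forkIO (throwTo t e))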
The reason I'm asking is that I want to make sure I never end up having to pay the overhead of an MFENCE instruction or equivalent every time I use unmaskAsyncExceptions#...
Right, I don't think that should be necessary. Cheers, Simon

On 13 May 2011 19:06, Simon Marlow wrote:
As far as memory consistency goes, we claim to provide sequential consistency for IORef and IOArray operations, but not for peeks and pokes.
Hi Simon,

Could you please point me to more information about the sequential consistency of IORefs? I was looking for something about this recently but couldn't find it. I don't see anything in the Haddock for Data.IORef.

Cheers, Bernie.

On 13/05/2011 21:12, Bernie Pope wrote:
On 13 May 2011 19:06, Simon Marlow wrote:
As far as memory consistency goes, we claim to provide sequential consistency for IORef and IOArray operations, but not for peeks and pokes.
Hi Simon,
Could you please point me to more information about the sequential consistency of IORefs? I was looking for something about this recently but couldn't find it. I don't see anything in the Haddock for Data.IORef.
Yes, it's not actually documented as far as I know, and we should fix that. But if you think about it, sequential consistency is really the only sensible policy: suppose one processor creates a heap object and writes a reference to it in the IORef, then another processor reads the IORef. The writes that created the heap object must be visible to the second processor, otherwise it will encounter uninitialised memory and crash. So sequential consistency is necessary to ensure concurrent programs can't crash.

Now perhaps it's possible to have a relaxed memory model that provides the no-crashes guarantee but still allows IORef writes to be reordered (e.g. some kind of causal consistency). That might be important if there is some processor architecture that provides that memory model, but as far as I know there isn't.

For some background there was a discussion about this on the haskell-prime mailing list a few years ago, I think.

Cheers, Simon
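To make the publication argument concrete, a small illustrative program (assumed, not from the thread) in the shape described: a writer allocates a heap object and publishes a reference through an IORef, and a reader dereferences whatever it sees. The no-crashes guarantee is the promise that the reader never observes the reference before the object itself:

  import Control.Concurrent (forkIO, yield)
  import Data.IORef

  main :: IO ()
  main = do
    ref <- newIORef Nothing
    _ <- forkIO $ do
      -- build a fresh heap object, then publish a reference to it
      let obj = (product [1 .. 20 :: Integer], "payload")
      obj `seq` writeIORef ref (Just obj)
    let loop = do
          m <- readIORef ref
          case m of
            -- safe only if the writes that built obj are visible
            -- before the reference to it
            Just (n, s) -> print (n, s)
            Nothing     -> yield >> loop
    loop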

On 16 May 2011 19:56, Simon Marlow wrote:
On 13/05/2011 21:12, Bernie Pope wrote:
Could you please point me to more information about the sequential consistency of IORefs? I was looking for something about this recently but couldn't find it. I don't see anything in the Haddock for Data.IORef.
Yes, it's not actually documented as far as I know, and we should fix that.
Thanks Simon. I was thinking about this in the context of a blog post by Lennart Augustsson:

http://augustss.blogspot.com/2011/04/ugly-memoization-heres-problem-that-i.html

He says that "There's no guarantee about readIORef and writeIORef when doing multi-threading." But I was wondering if that was true, and if it were, what the consequences would be. If you read his reply to my question on the blog, then I believe that he was saying that sequential consistency was not guaranteed.

If you have time to read his blog article, I wonder if you could comment on the need (or lack of need) for MVars or atomicModifyIORef? If I understand correctly, it would be okay to use readIORef/writeIORef, assuming that it is okay for some computations to be repeated.
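For reference, a sketch of the memoization shape in question (assumed, not copied from the blog post): with plain readIORef/writeIORef, two threads can race on the cache, and the benign failure mode is recomputing f x or losing a cached entry, never a crash:

  import Data.IORef
  import qualified Data.Map as Map

  memoIO :: Ord a => (a -> b) -> IO (a -> IO b)
  memoIO f = do
    cacheRef <- newIORef Map.empty
    return $ \x -> do
      cache <- readIORef cacheRef
      case Map.lookup x cache of
        Just y  -> return y
        Nothing -> do
          let y = f x
          -- plain writeIORef: a concurrent update may be overwritten,
          -- which only means some results get recomputed later
          writeIORef cacheRef (Map.insert x y cache)
          return y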
But if you think about it, sequential consistency is really the only sensible policy: suppose one processor creates a heap object and writes a reference to it in the IORef, then another processor reads the IORef. The writes that created the heap object must be visible to the second processor, otherwise it will encounter uninitialised memory and crash. So sequential consistency is necessary to ensure concurrent programs can't crash.
Yes, I agree, and that was what I was thinking. Otherwise well-typed programs could go horribly wrong.

For some background there was a discussion about this on the haskell-prime mailing list a few years ago, I think.

Thanks, I'll try to dig it up. Cheers, Bernie.

At Tue, 17 May 2011 02:18:55 +1000, Bernie Pope wrote:
http://augustss.blogspot.com/2011/04/ugly-memoization-heres-problem-that-i.html
He says that "There's no guarantee about readIORef and writeIORef when doing multi-threading.". But I was wondering if that was true, and if it were, what the consequences would be. If you read his reply to my question on the blog, then I believe that he was saying that sequential consistency was not guaranteed.
While I don't know how IORefs work and I'd love to understand this better, I can't imagine any IORef implementation in which memoIO (in the blog post) would give the wrong answer on x86. It might, of course, cause f x to be evaluated multiple times on the same x.

However, on other CPUs (e.g., the DEC alpha), there could maybe, maybe be issues. Though I'm not sure, since to avoid crashes, the alpha implementation of IORef would need to include a memory barrier. The question is whether there is an architecture in which IORef avoids crashes AND memoIO can give you the wrong answer.

Also, if Simon's original post means that IORef operations all contain barrier instructions, it could be that memoIO is simply correct and the blog post is simply wrong about needing MVars.

David

At Mon, 16 May 2011 10:56:02 +0100, Simon Marlow wrote:
Yes, it's not actually documented as far as I know, and we should fix that. But if you think about it, sequential consistency is really the only sensible policy: suppose one processor creates a heap object and writes a reference to it in the IORef, then another processor reads the IORef. The writes that created the heap object must be visible to the second processor, otherwise it will encounter uninitialised memory and crash. So sequential consistency is necessary to ensure concurrent programs can't crash.
Now perhaps it's possible to have a relaxed memory model that provides the no-crashes guarantee but still allows IORef writes to be reordered (e.g. some kind of causal consistency). That might be important if there is some processor architecture that provides that memory model, but as far as I know there isn't.
Actually, in your heap object example, it sounds like you only really care about preserving program order, rather than write atomicity. Thus, you can get away with less-than-sequential consistency and not crash.

The x86 is an example of a relaxed memory model that provides the no-crashes guarantee you are talking about. Specifically, the x86 deviates from sequential consistency in two ways:

1. A load can finish before an earlier store to a different memory location. [intel, Sec. 8.2.3.4]

2. A thread can read its own writes early. [intel, 8.2.3.5]

[Section references are to the intel architecture manual, vol 3a: http://www.intel.com/Assets/PDF/manual/253668.pdf]

One could imagine an implementation of IORefs that relies on the fact that pointer writes are atomic and that program order is preserved to avoid mutex overhead for most calls. E.g.:

  struct IORef {
    spinlock_t lock;    /* Only ever used by atomicModifyIORef */
    HaskellValue *val;  /* Updated atomically because pointer-sized
                           writes are atomic */
  };

  HaskellValue *
  readIORef (struct IORef *ref)
  {
    return ref->val;
  }

  void
  writeIORef (struct IORef *ref, HaskellValue *val)
  {
    /* Note that if *val was initialized in the same thread, then by
     * the time another CPU sees ref->val, it will also see the
     * correct contents of *ref->val, because stores are seen in a
     * consistent order by other processors [intel, Sec. 8.2.3.7].
     *
     * If *val was initialized in a different thread, then since this
     * thread has seen it, other threads will too, because x86
     * guarantees stores are transitively visible [intel, Sec. 8.2.3.6].
     */
    ref->val = val;
  }

  /* modifyIORef is built out of readIORef and writeIORef */

  HaskellValue *
  atomicModifyIORef (struct IORef *ref, HaskellFunction *f)
  {
    HaskellValue *result;
    spinlock_acquire (&ref->lock);
    result = modifyIORef (ref, f);
    spinlock_release (&ref->lock);
    return result;
  }

This is actually how I assumed IORefs worked. But then consider the following program:

  maybePrint :: IORef Bool -> IORef Bool -> IO ()
  maybePrint myRef yourRef = do
    writeIORef myRef True
    yourVal <- readIORef yourRef
    unless yourVal $ putStrLn "critical section"

  main :: IO ()
  main = do
    r1 <- newIORef False
    r2 <- newIORef False
    forkOS $ maybePrint r1 r2
    forkOS $ maybePrint r2 r1
    threadDelay 1000000

Under sequential consistency, the string "critical section" should be output at most once. However, with the above IORef implementation on x86, since a read can finish before a write to a different location, both threads might see False for the value of yourVal.

To prevent this deviation from sequential consistency, you would need to do something like stick an MFENCE instruction at the end of writeIORef, and that would slow down the common case where you don't care about sequential consistency. In fact, I would argue that if you care about S.C., you should either be using atomicModifyIORef or MVars.

Can you explain what actually happens inside the real IORef implementation?

As an aside, these days one sees a lot of hand-wringing over the fact that CPU clock rates have been flat for a while and the only way to get more performance is through parallelism. "How are we going to teach programmers to write concurrent code when it's so hard to write and debug?" I've heard numerous people ask.

Haskell could be a major step in the right direction, since in the absence of variables, it's impossible to have data races. (You can still have deadlock and other kinds of race condition, such as the one in maybePrint above, if you had my definition of IORef, but data races are by far the most pernicious concurrency problems.) Of course, the key to making Haskell useful in a parallel setting is that things like the memory model have to be fully specified...

Thanks, David

On 16/05/11 20:31, dm-list-haskell-cafe@scs.stanford.edu wrote:
At Mon, 16 May 2011 10:56:02 +0100, Simon Marlow wrote:
Yes, it's not actually documented as far as I know, and we should fix that. But if you think about it, sequential consistency is really the only sensible policy: suppose one processor creates a heap object and writes a reference to it in the IORef, then another processor reads the IORef. The writes that created the heap object must be visible to the second processor, otherwise it will encounter uninitialised memory and crash. So sequential consistency is necessary to ensure concurrent programs can't crash.
Now perhaps it's possible to have a relaxed memory model that provides the no-crashes guarantee but still allows IORef writes to be reordered (e.g. some kind of causal consistency). That might be important if there is some processor architecture that provides that memory model, but as far as I know there isn't.
Actually, in your heap object example, it sounds like you only really care about preserving program order, rather than write atomicity. Thus, you can get away with less-than-sequential consistency and not crash.
The x86 is an example of a relaxed memory model that provides the no-crashes guarantee you are talking about. Specifically, the x86 deviates from sequential consistency in two ways
1. A load can finish before an earlier store to a different memory location. [intel, Sec. 8.2.3.4]
2. A thread can read its own writes early. [intel, 8.2.3.5]
[Section references are to the intel architecture manual, vol 3a: http://www.intel.com/Assets/PDF/manual/253668.pdf]
One could imagine an implementation of IORefs that relies on the fact that pointer writes are atomic and that program order is preserved to avoid mutex overhead for most calls. E.g.:
  struct IORef {
    spinlock_t lock;    /* Only ever used by atomicModifyIORef */
    HaskellValue *val;  /* Updated atomically because pointer-sized
                           writes are atomic */
  };

  HaskellValue *
  readIORef (struct IORef *ref)
  {
    return ref->val;
  }

  void
  writeIORef (struct IORef *ref, HaskellValue *val)
  {
    /* Note that if *val was initialized in the same thread, then by
     * the time another CPU sees ref->val, it will also see the
     * correct contents of *ref->val, because stores are seen in a
     * consistent order by other processors [intel, Sec. 8.2.3.7].
     *
     * If *val was initialized in a different thread, then since this
     * thread has seen it, other threads will too, because x86
     * guarantees stores are transitively visible [intel, Sec. 8.2.3.6].
     */
    ref->val = val;
  }

  /* modifyIORef is built out of readIORef and writeIORef */

  HaskellValue *
  atomicModifyIORef (struct IORef *ref, HaskellFunction *f)
  {
    HaskellValue *result;
    spinlock_acquire (&ref->lock);
    result = modifyIORef (ref, f);
    spinlock_release (&ref->lock);
    return result;
  }
This is actually how I assumed IORefs worked.
Right, that is how IORefs work. (well, atomicModifyIORef is a bit different, but the differences aren't important here)
But then consider the following program:
  maybePrint :: IORef Bool -> IORef Bool -> IO ()
  maybePrint myRef yourRef = do
    writeIORef myRef True
    yourVal <- readIORef yourRef
    unless yourVal $ putStrLn "critical section"

  main :: IO ()
  main = do
    r1 <- newIORef False
    r2 <- newIORef False
    forkOS $ maybePrint r1 r2
    forkOS $ maybePrint r2 r1
    threadDelay 1000000
Under sequential consistency, the string "critical section" should be output at most once. However, with the above IORef implementation on x86, since a read can finish before a write to a different location, both threads might see False for the value of yourVal.
To prevent this deviation from sequential consistency, you would need to do something like stick an MFENCE instruction at the end of writeIORef, and that would slow down the common case where you don't care about sequential consistency. In fact, I would argue that if you care about S.C., you should either be using atomicModifyIORef or MVars.
Good example - so it looks like we don't get full sequential consistency on x86 (actually I'd been thinking only about write ordering and forgetting that reads could be reordered around writes). But that's bad because it means Haskell has a memory model, and we have to say what it is, or at least say that ordering is undefined.

In practice I don't think anyone actually does use IORef in this way. Typically you need at least one atomicModifyIORef somewhere, and that acts as a barrier.
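A sketch of that fix (maybePrint' is an assumed name; it relies on the claim above that atomicModifyIORef acts as a barrier):

  import Control.Monad (unless)
  import Data.IORef

  maybePrint' :: IORef Bool -> IORef Bool -> IO ()
  maybePrint' myRef yourRef = do
    -- an atomic write; the readIORef below cannot move before it
    atomicModifyIORef myRef (\_ -> (True, ()))
    yourVal <- readIORef yourRef
    unless yourVal $ putStrLn "critical section"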
As an aside, these days one sees a lot of hand-wringing over the fact that CPU clock rates have been flat for a while and the only way to get more performance is through parallelism. "How are we going to teach programmers to write concurrent code when it's so hard to write and debug?" I've heard numerous people ask.
Haskell could be a major step in the right direction, since in the absence of variables, it's impossible to have data races. (You can still have deadlock and other kinds of race condition, such as the one in maybePrint above, if you had my definition of IORef, but data races are by far the most pernicious concurrency problems.) Of course, the key to making Haskell useful in a parallel setting is that things like the memory model have to be fully specified...
Hmm, as it happens I have rather a lot to say on this particular matter! But I've never heard anyone claim that a prerequisite to Haskell being useful as a parallel programming language is a well-defined memory model. I think there's a couple of reasons for that:

- deterministic parallel programming models (e.g. Strategies, monad-par) don't care about memory models. These are the first port of call for parallel programming.

- If you have to use concurrency, then none of MVars, atomicModifyIORef or STM care about memory models either.

So the memory model only becomes visible when you use concurrency with shared IORefs (without atomicModifyIORef) or bare peek/poke, which is pretty rare and easily avoided.

Cheers, Simon

At Mon, 16 May 2011 22:31:14 +0100, Simon Marlow wrote:
Good example - so it looks like we don't get full sequential consistency on x86 (actually I'd been thinking only about write ordering and forgetting that reads could be reordered around writes).
But that's bad because it means Haskell has a memory model, and we have to say what it is, or at least say that ordering is undefined.
Right. So I think the memory model is something along the lines of the no-crash property you mentioned--i.e., readIORef will return some value written with writeIORef and not a mish-mash of multiple writes--combined with the model of the underlying hardware. Maybe atomicModifyIORef should serve as a barrier, too.
But I've never heard anyone claim that a prerequisite to Haskell being useful as a parallel programming language is a well-defined memory model. I think there's a couple of reasons for that:
- deterministic parallel programming models (e.g. Strategies, monad-par) don't care about memory models. These are the first port of call for parallel programming.
Okay, well, I make this claim as a C/C++ programmer more used to writing low-level/kernel code than functional code. So I'm thinking less of things like deterministic scientific codes and more along the lines of network servers processing lots of messages and other asynchronous events happening in a non-deterministic order anyway.

I think several programming patterns would be useful in Haskell that require some understanding of the memory model. One that particularly jumps to mind is the read-copy-update (RCU) pattern for frequently accessed but seldom updated data (configuration state, routing tables, etc.)

As you've described them, IORefs are well suited to such a pattern because reads are very cheap and updates happen through an atomic pointer write. But if the documentation doesn't say that readIORef is supposed to be cheap (or imply so by mentioning that readIORef exposes the underlying hardware's memory consistency), then there's no way to tell that IORefs are suitable for RCU, so people may think they have to do something uglier using peek and poke.
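A sketch of the RCU-style usage described here (the names and the table type are illustrative assumptions): readers pay only a readIORef, writers build a new version and swing the pointer with atomicModifyIORef, and the garbage collector reclaims old versions:

  import Data.IORef

  type RouteTable = [(String, Int)]   -- stand-in for real routing state

  lookupRoute :: IORef RouteTable -> String -> IO (Maybe Int)
  lookupRoute ref dest = do
    table <- readIORef ref            -- cheap read: no lock, no fence
    return (lookup dest table)

  updateRoute :: IORef RouteTable -> String -> Int -> IO ()
  updateRoute ref dest hop =
    -- writers publish a whole new version atomically
    atomicModifyIORef ref
      (\t -> ((dest, hop) : filter ((/= dest) . fst) t, ()))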
- If you have to use concurrency, then none of MVars, atomicModifyIORef or STM care about memory models either.
So the memory model only becomes visible when you use concurrency with shared IORefs (without atomicModifyIORef) or bare peek/poke, which is pretty rare and easily avoided.
Actually:

http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Con...

There's nothing in the documentation for MVars that says anything about sequential consistency. If you take my example from the previous email, replace writeIORef with (\p v -> modifyMVar_ p $ return v), replace all other occurrences of IORef with MVar, nothing in the docs suggests you won't see the "critical section" message printed twice.

Presumably modifyMVar must take a spinlock. Moreover, to be correct on x86, the spinlock must execute an LFENCE after acquiring the lock and an SFENCE prior to releasing the lock. But does readMVar acquire the spinlock, or is it optimized to take advantage of pointer-sized writes being atomic? One can argue that an optimized readMVar is better, because you can always force serialization with modifyMVar. Or one can argue that, for consistency, readMVar should be identical to a takeMVar followed by a putMVar, since people can use IORefs for less deterministic behavior. The latter is certainly what the *current* code does, but until the behavior is documented, I'd worry about some later version of ghc optimizing readMVar and changing the consistency.

Systems have memory models for a reason; you can't get away from them entirely for all applications. Haskell's strength, I think, is in making sure that 99+% of code can't possibly depend on the memory model. For functional and ST code, you don't even need to look at the code to know that this is true--safety is guaranteed by the types (modulo some unsafe stuff I hope will be even easier to detect with ghc 7.2...). But for that last little bit of tricky code, the best you can do is specify the behavior of the building blocks and maybe provide some useful architecture-independent wrappers for things (e.g., abstracted memory barriers).

David
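Spelling out the MVar substitution described above (a sketch; maybePrintMV is an assumed name):

  import Control.Concurrent (forkOS, threadDelay)
  import Control.Concurrent.MVar
  import Control.Monad (unless)

  maybePrintMV :: MVar Bool -> MVar Bool -> IO ()
  maybePrintMV myRef yourRef = do
    modifyMVar_ myRef (\_ -> return True)   -- the writeIORef replacement
    yourVal <- readMVar yourRef
    unless yourVal $ putStrLn "critical section"

  main :: IO ()
  main = do
    r1 <- newMVar False
    r2 <- newMVar False
    _ <- forkOS $ maybePrintMV r1 r2
    _ <- forkOS $ maybePrintMV r2 r1
    threadDelay 1000000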

On 17/05/2011 00:44, dm-list-haskell-cafe@scs.stanford.edu wrote:
But I've never heard anyone claim that a prerequisite to Haskell being useful as a parallel programming language is a well-defined memory model. I think there's a couple of reasons for that:
- deterministic parallel programming models (e.g. Strategies, monad-par) don't care about memory models. These are the first port of call for parallel programming.
Okay, well, I make this claim as a C/C++ programmer more used to writing low-level/kernel code than functional code. So I'm thinking less of things like deterministic scientific codes and more along the lines of network servers processing lots of messages and other asynchronous events happening in a non-deterministic order anyway.
I think several programming patterns would be useful in Haskell that require some understanding of the memory model. One that particularly jumps to mind is the read-copy-update (RCU) pattern for frequently accessed but seldom updated data (configuration state, routing tables, etc.)
As you've described them, IORefs are well suited to such a pattern because reads are very cheap and updates happen through an atomic pointer write. But if the documentation doesn't say that readIORef is supposed to be cheap (or imply so by mentioning that readIORef exposes the underlying hardware's memory consistency), then there's no way to tell that IORefs are suitable for RCU, so people may think they have to do something uglier using peek and poke.
Ok. I'm not sure how feasible RCU is with IORefs, or even whether it's necessary. After all, the usual pattern of having readers use readIORef while writers use atomicModifyIORef gives the RCU cost model (zero overhead for readers, expensive writes) with far less complexity. Garbage collection does the job of reclamation automatically. Have I missed something here?

A slight improvement over this scheme is to use TVar with readTVarIO for the readers (no transaction involved), and transactions for the writers. This greatly increases the scope of what a writer can do, since they can perform an update on a bunch of state at the same time.

The situation is complicated somewhat by generational garbage collection, which can create weird performance artifacts when mutation is involved.
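A sketch of that TVar scheme (the Config type and function names are assumptions): readers use readTVarIO, which involves no transaction, while writers run a transaction and can update several pieces of state together:

  import Control.Concurrent.STM

  data Config = Config { routes :: [(String, Int)], logLevel :: Int }

  readConfig :: TVar Config -> IO Config
  readConfig = readTVarIO              -- transaction-free read

  updateConfig :: TVar Config -> (Config -> Config) -> IO ()
  updateConfig tv f = atomically $ do
    c <- readTVar tv
    writeTVar tv (f c)                 -- could touch other TVars here too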
Actually:
http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Con...
There's nothing in the documentation for MVars that says anything about sequential consistency.
That's true, though I think most readers would assume sequential consistency in the absence of any statements to the contrary (obviously you are a counter example ;-).
If you take my example from the previous email, replace writeIORef with (\p v -> modifyMVar_ p $ return v), replace all other occurrences of IORef with MVar, nothing in the docs suggests you won't see the "critical section" message printed twice.
There is an operational semantics in the Concurrent Haskell paper that does not admit the behaviour you describe, but I'll add something to the docs to that effect.
Presumably modifyMVar must take a spinlock. Moreover, to be correct on x86, the spinlock must execute an LFENCE after acquiring the lock and an SFENCE prior to releasing the lock. But does readMVar acquire the spinlock, or is it optimized to take advantage of pointer-sized writes being atomic?
That's a good point - readMVar cannot be optimised to avoid the lock. In fact, readMVar is just

  readMVar m = do x <- takeMVar m; putMVar m x; return x

although there have been suggestions that we should make it atomic. If we were to do so, it would still have to use a barrier to avoid reordering.
Systems have memory models for a reason; you can't get away from them entirely for all applications. Haskell's strength, I think, is in making sure that 99+% of code can't possibly depend on the memory model. For functional and ST code, you don't even need to look at the code to know that this is true--safety is guaranteed by the types (modulo some unsafe stuff I hope will be even easier to detect with ghc 7.2...). But for that last little bit of tricky code, the best you can do is specify the behavior of the building blocks and maybe provide some useful architecture-independent wrappers for things (e.g., abstracted memory barriers).
Agree 100%. Cheers, Simon

At Wed, 18 May 2011 09:56:22 +0100, Simon Marlow wrote:
Ok. I'm not sure how feasible RCU is with IORefs, or even whether it's necessary. After all, the usual pattern of having readers use readIORef while writers use atomicModifyIORef gives the RCU cost model (zero overhead for readers, expensive writes) with far less complexity. Garbage collection does the job of reclamation automatically. Have I missed something here?
Right, that's what I was calling RCU. Usually the hard part in RCU is the garbage collection. Obviously if you needed to do something else like close a file handle, then IORefs are not sufficient. But for a lot of applications of RCU, IORefs plus garbage collection should be sufficient.
A slight improvement over this scheme is to use TVar with readTVarIO for the readers (no transaction involved), and transactions for the writers. This greatly increases the scope of what a writer can do, since they can perform an update on a bunch of state at the same time.
Good point.
There is an operational semantics in the Concurrent Haskell paper that does not admit the behaviour you describe, but I'll add something to the docs to that effect.
Ah, you got me. I probably should have looked at that paper, which is linked to from Control.Concurrent. Still, in some cases (not necessarily here), papers are static and code continues to evolve, so it's nice to have stuff documented in haddock as well.
That's a good point - readMVar cannot be optimised to avoid the lock. In fact, readMVar is just
readMVar m = do x <- takeMVar m; putMVar m x; return x
although there have been suggestions that we should make it atomic. If we were to do so, it would still have to use a barrier to avoid reordering.
What would be even cooler would be if swapMVar could be made atomic. Or better yet, if there could be a compareAndSwapMVar, since on some architectures (though not x86) that could be a single instruction and allow for truly wait-free data types. (That might not be possible without sacrificing referential transparency, since the obvious implementation would involve comparing pointers rather than values.)

David
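For comparison, compare-and-swap on an IORef can at least be approximated with atomicModifyIORef and value equality (a sketch with an assumed name; a real compareAndSwapMVar primitive would compare pointers, which is exactly the referential-transparency concern raised above):

  import Data.IORef

  casIORef :: Eq a => IORef a -> a -> a -> IO Bool
  casIORef ref expected new =
    atomicModifyIORef ref $ \cur ->
      if cur == expected then (new, True)    -- swapped
                         else (cur, False)   -- left unchanged

Unlike a hardware compare-and-swap this is not a single instruction, so it does not give the wait-free property being asked for; it only shows the interface.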

On 16 May 2011 21:31, dm-list-haskell-cafe@scs.stanford.edu wrote:
At Mon, 16 May 2011 10:56:02 +0100, Simon Marlow wrote:
Yes, it's not actually documented as far as I know, and we should fix that. But if you think about it, sequential consistency is really the only sensible policy: suppose one processor creates a heap object and writes a reference to it in the IORef, then another processor reads the IORef. The writes that created the heap object must be visible to the second processor, otherwise it will encounter uninitialised memory and crash. So sequential consistency is necessary to ensure concurrent programs can't crash.
Now perhaps it's possible to have a relaxed memory model that provides the no-crashes guarantee but still allows IORef writes to be reordered (e.g. some kind of causal consistency). That might be important if there is some processor architecture that provides that memory model, but as far as I know there isn't.
Actually, in your heap object example, it sounds like you only really care about preserving program order, rather than write atomicity. Thus, you can get away with less-than-sequential consistency and not crash.
The x86 is an example of a relaxed memory model that provides the no-crashes guarantee you are talking about. Specifically, the x86 deviates from sequential consistency in two ways
1. A load can finish before an earlier store to a different memory location. [intel, Sec. 8.2.3.4]
2. A thread can read its own writes early. [intel, 8.2.3.5]
[Section references are to the intel architecture manual, vol 3a: http://www.intel.com/Assets/PDF/manual/253668.pdf]
One could imagine an implementation of IORefs that relies on the fact that pointer writes are atomic and that program order is preserved to avoid mutex overhead for most calls. E.g.:
  struct IORef {
    spinlock_t lock;    /* Only ever used by atomicModifyIORef */
    HaskellValue *val;  /* Updated atomically because pointer-sized
                           writes are atomic */
  };

  HaskellValue *
  readIORef (struct IORef *ref)
  {
    return ref->val;
  }

  void
  writeIORef (struct IORef *ref, HaskellValue *val)
  {
    /* Note that if *val was initialized in the same thread, then by
     * the time another CPU sees ref->val, it will also see the
     * correct contents of *ref->val, because stores are seen in a
     * consistent order by other processors [intel, Sec. 8.2.3.7].
     *
     * If *val was initialized in a different thread, then since this
     * thread has seen it, other threads will too, because x86
     * guarantees stores are transitively visible [intel, Sec. 8.2.3.6].
     */
    ref->val = val;
  }

  /* modifyIORef is built out of readIORef and writeIORef */

  HaskellValue *
  atomicModifyIORef (struct IORef *ref, HaskellFunction *f)
  {
    HaskellValue *result;
    spinlock_acquire (&ref->lock);
    result = modifyIORef (ref, f);
    spinlock_release (&ref->lock);
    return result;
  }
This is actually how I assumed IORefs worked. But then consider the following program:
  maybePrint :: IORef Bool -> IORef Bool -> IO ()
  maybePrint myRef yourRef = do
    writeIORef myRef True
    yourVal <- readIORef yourRef
    unless yourVal $ putStrLn "critical section"

  main :: IO ()
  main = do
    r1 <- newIORef False
    r2 <- newIORef False
    forkOS $ maybePrint r1 r2
    forkOS $ maybePrint r2 r1
    threadDelay 1000000
Under sequential consistency, the string "critical section" should be output at most once. However, with the above IORef implementation on x86, since a read can finish before a write to a different location, both threads might see False for the value of yourVal.
To prevent this deviation from sequential consistency, you would need to do something like stick an MFENCE instruction at the end of writeIORef, and that would slow down the common case where you don't care about sequential consistency. In fact, I would argue that if you care about S.C., you should either be using atomicModifyIORef or MVars.
MFENCE is apparently slower than LOCK ADD; see http://blogs.oracle.com/dave/entry/instruction_selection_for_volatile_fences. So using MFENCE would make writeIORef slower than atomicModifyIORef, and with weaker guarantees - not a good combination.

Alexander
Can you explain what actually happens inside the real IORef implementation?
As an aside, these days one sees a lot of hand-wringing over the fact that CPU clock rates have been flat for a while and the only way to get more performance is through parallelism. "How are we going to teach programmers to write concurrent code when it's so hard to write and debug?" I've heard numerous people ask.
Haskell could be a major step in the right direction, since in the absence of variables, it's impossible to have data races. (You can still have deadlock and other kinds of race condition, such as the one in maybePrint above, if you had my definition of IORef, but data races are by far the most pernicious concurrency problems.) Of course, the key to making Haskell useful in a parallel setting is that things like the memory model have to be fully specified...
Thanks, David
participants (4)

- Alexander Kjeldaas
- Bernie Pope
- dm-list-haskell-cafe@scs.stanford.edu
- Simon Marlow