
I forgot to CC ghc-devs the first time, so here's another copy.

I was working on #11760 this weekend, which has to do with concurrency breaking lazy ST. I came up with what I thought was a pretty decent solution ( https://phabricator.haskell.org/D3038 ). Simon Peyton Jones, however, is quite unhappy about the idea of sticking this weird unsafePerformIO-like code (noDup, which I originally implemented as (unsafePerformIO . evaluate), but which he finds ugly regardless of the details) into fmap and (>>=). He's also concerned that the noDuplicate# applications will kill performance in the multi-threaded case, and suggests he would rather leave lazy ST broken, or even remove it altogether, than use a fix that will make it slow sometimes, particularly since there haven't been a lot of reports of problems in the wild.

My view is that leaving it broken, even if it only causes trouble occasionally, is simply not an option. If users can't rely on it to always give correct answers, then it's effectively useless. And for the sake of backwards compatibility, I think it's a lot better to keep it around, even if it runs slowly multithreaded, than to remove it altogether.

Note to Simon PJ: Yes, it's ugly to stick that noDup in there. But lazy ST has always been a bit of deep magic. You can't *really* carry a moment of time around in your pocket and make its history happen only if necessary. We can make it work in GHC because its execution model is entirely based around graph reduction, so evaluation is capable of driving execution. Whereas lazy IO is extremely tricky because it causes effects observable in the real world, lazy ST is only *moderately* tricky, causing effects that we have to make sure don't lead to weird interactions between threads. I don't think it's terribly surprising that it needs to do a few more weird things to work properly.

David
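(For readers following along: the noDup combinator mentioned above, in the (unsafePerformIO . evaluate) formulation David describes, is just the following. This is a sketch of that formulation, not the final D3038 patch.)

import Control.Exception (evaluate)
import System.IO.Unsafe (unsafePerformIO)

-- Force the argument inside unsafePerformIO; unsafePerformIO's internal
-- noDuplicate barrier then keeps two threads from both evaluating the
-- wrapped thunk at the same time.
noDup :: a -> a
noDup = unsafePerformIO . evaluate

The patch then applies something of this shape at the points in lazy ST's fmap and (>>=) where the state-carrying thunks are built.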

I wrote a lazy ST microbenchmark (http://lpaste.net/351799) that uses
nothing but lazy ST monad operations in the inner loop. With various
caveats, it took around 3 times as long to run under +RTS -N2 after
applying https://phabricator.haskell.org/D3038. The biggest caveat is
that the cost of the `threadPaused` in `noDuplicate#` seems to be
potentially proportional to the thread's stack depth, and I'm not sure
how representative my microbenchmark is in that regard.
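(The lpaste link above may no longer resolve; a benchmark of the same general shape, nothing but lazy ST binds in the inner loop, might look like the sketch below. This is an illustration, not the original benchmark.)

import Control.Monad.ST.Lazy (ST, runST)

-- A loop that does nothing but lazy ST (>>) and return, so any per-bind
-- overhead (such as an added noDuplicate#) dominates the running time.
count :: Int -> ST s Int
count n = go 0
  where
    go i
      | i >= n    = return i
      | otherwise = return () >> go (i + 1)

main :: IO ()
main = print (runST (count 1000000))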
I'm actually surprised the `noDuplicate#` version isn't an order of
magnitude or so slower than that. Still, a 3x factor is a large price
to pay. I don't yet understand what's going on here clearly enough to
be sure that the `noDuplicate#` is necessary, or that we can't
implement `noDuplicate#` more cheaply in the common case of no
contention. My feeling is that if it turns out that we can implement
the correct behavior cheaply, then it will be better to have left it
broken for a little while than to ship a correct but slow
implementation first and replace it with a correct and fast one
later. The latter is disruptive to two groups of people: those who
are affected by the bug, and those who cannot afford to have their
lazy ST code run 3 times slower. The former group is affected
already, and we can advertise the existence of the bug until we
have a workable solution. So I'm reluctant to go down this
`noDuplicate#` path until we have exhausted our other options.
In an ideal world with no users, it would be better to start with
correct but slow, of course.
Regards,
Reid Barton

On Monday, January 30, 2017 1:50:29 PM EST Reid Barton wrote:
I wrote a lazy ST microbenchmark (http://lpaste.net/351799) that uses nothing but lazy ST monad operations in the inner loop.
This benchmark doesn't really look like code I'd expect people to use in practice. Normally, people use lazy ST because they actually need STRefs or STArrays in the loop! I suspect your test case shows close to the worst possible slowdown for the fix. And a 3x slowdown in lazy ST doesn't necessarily translate to a 3x slowdown in an application that uses it.

David
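(To illustrate the kind of code this refers to, here is a hypothetical example, not taken from any particular application, in which the loop does real STRef work per element, so a fixed per-bind cost is a smaller fraction of the total.)

import Control.Monad.ST.Lazy (runST)
import Data.STRef.Lazy (newSTRef, readSTRef, writeSTRef)

-- Lazily produce running sums; each step reads and writes an STRef,
-- so the loop is not just bare lazy ST binds.
runningSums :: [Int] -> [Int]
runningSums xs = runST (do
    ref <- newSTRef 0
    let step x = do
          acc <- readSTRef ref
          let acc' = acc + x
          writeSTRef ref acc'
          return acc'
    mapM step xs)

-- Because the lazy ST bind is lazy, results are produced on demand;
-- e.g. take 5 (runningSums [1..]) performs only the first five steps.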

On 30 January 2017 at 16:18, David Feuer wrote:
I forgot to CC ghc-devs the first time, so here's another copy.
I was working on #11760 this weekend, which has to do with concurrency breaking lazy ST. I came up with what I thought was a pretty decent solution ( https://phabricator.haskell.org/D3038 ). Simon Peyton Jones, however, is quite unhappy about the idea of sticking this weird unsafePerformIO-like code (noDup, which I originally implemented as (unsafePerformIO . evaluate), but which he finds ugly regardless of the details) into fmap and (>>=). He's also concerned that the noDuplicate# applications will kill performance in the multi-threaded case, and suggests he would rather leave lazy ST broken, or even remove it altogether, than use a fix that will make it slow sometimes, particularly since there haven't been a lot of reports of problems in the wild.
In a nutshell, I think we have to fix this despite the cost - the implementation is incorrect and unsafe.

Unfortunately the mechanisms we have right now to fix it aren't ideal - noDuplicate# is a bigger hammer than we need. All we really need is some way to make a thunk atomic; it would require some special entry code to the thunk which did atomic eager-blackholing. Hmm, now that I think about it, perhaps we could just have a flag, -fatomic-eager-blackholing. We already do this for CAFs, incidentally. The idea is to compare-and-swap the blackhole info pointer into the thunk, and if we didn't win the race, just re-enter the thunk (which is now a blackhole). We already have the cmpxchg MachOp, so it shouldn't be more than a few lines in the code generator to implement it. It would be too expensive to do by default, but doing it just for Control.Monad.ST.Lazy should be OK and would fix the unsafety.

(I haven't really thought this through, just an idea off the top of my head, so there could well be something I'm overlooking here...)

Cheers
Simon

On Monday, January 30, 2017 9:50:56 PM EST Simon Marlow wrote:
Unfortunately the mechanisms we have right now to fix it aren't ideal - noDuplicate# is a bigger hammer than we need.
Do you think you could explain this a bit more? What aspect of noDuplicate# is overkill? What does it guard against that can't happen here?
All we really need is some way to make a thunk atomic, it would require some special entry code to the thunk which did atomic eager-blackholing. Hmm, now that I think about it, perhaps we could just have a flag, -fatomic-eager-blackholing.
If it's possible to use a primop to do this "locally", I think it would be very nice to get that as well as a global flag. If it affects code generation in an inherently global fashion, then of course we'll just have to live with that, and lots of NOINLINE. David

David Feuer wrote:
On Monday, January 30, 2017 9:50:56 PM EST Simon Marlow wrote:
Unfortunately the mechanisms we have right now to fix it aren't ideal - noDuplicate# is a bigger hammer than we need.
Do you think you could explain this a bit more? What aspect of noDuplicate# is overkill? What does it guard against that can't happen here?
I suspect Simon is referring to the fact that noDuplicate# actually needs to call back into the RTS to claim ownership over all thunks on the stack before it can proceed.
All we really need is some way to make a thunk atomic, it would require some special entry code to the thunk which did atomic eager-blackholing. Hmm, now that I think about it, perhaps we could just have a flag, -fatomic-eager-blackholing.
Indeed this sounds quite reasonable.
If it's possible to use a primop to do this "locally", I think it would be very nice to get that as well as a global flag. If it affects code generation in an inherently global fashion, then of course we'll just have to live with that, and lots of NOINLINE.
I guess something like,

eagerlyBlackhole :: a -> a

which would be lowered to a bit of code that eagerly blackholes the thunk before entering it?

Cheers,
- Ben

We don’t want to do this on a per-module basis do we, as -fatomic-eager-blackholing would suggest. Rather, on a per-thunk basis, no? Which thunks, precisely? I think perhaps precisely thunks one of whose free variables has type (State# s) for some s. These are thunks that consume a state token, and must do so no more than once.
If entering such thunks was atomic, could we kill off noDuplicate#?
I still don’t understand exactly what noDuplicate# does, what problem it solves, and how the problem it solves relates to this LazyST problem.
We need some kind of fix for 8.2. Simon, what do you suggest?
Simon
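(To make the criterion above concrete, here is a hypothetical fragment, not taken from the thread, in which a let-bound thunk has a free State#-typed variable and so would be selected by this rule. The names example and f are made up for illustration.)

{-# LANGUAGE MagicHash #-}
import GHC.Exts (State#)

-- The thunk bound to r captures s :: State# st as a free variable, so
-- under the proposal it is one of the thunks that would get the atomic
-- eager-blackholing entry code.
example :: (State# st -> Int) -> State# st -> (Int, Int)
example f s =
  let r = f s
  in  (r, r + 1)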

On 30 January 2017 at 22:56, Simon Peyton Jones wrote:
We don’t want to do this on a per-module basis do we, as -fatomic-eager-blackholing would suggest. Rather, on a per-thunk basis, no? Which thunks, precisely? I think perhaps *precisely thunks one of whose free variables has type (State# s) for some s.* These are thunks that consume a state token, and must do so no more than once.
If we could identify exactly the thunks we wanted to be atomic, then yes, that would be better than a whole-module solution. However I'm not sure how to do that - doing it on the basis of a free variable with State# type doesn't work if the State# is buried in a data structure or a function closure, for instance.
If entering such thunks was atomic, could we kill off noDuplicate#?
I still don’t understand exactly what noDuplicate# does, what problem it solves, and how the problem it solves relates to this LazyST problem.
Back in our "Haskell on a Shared Memory Multiprocessor" paper (http://simonmar.github.io/bib/papers/multiproc.pdf) we described a scheme to try to avoid duplication of work when multiple cores evaluate the same thunk. This is normally applied lazily, because it involves walking the stack and atomically black-holing thunks pointed to by update frames. The noDuplicate# primop just invokes the stack walk immediately; the idea is to try to prevent multiple threads from evaluating a thunk containing unsafePerformIO. It's expensive. It's also not foolproof, because if you already happened to create two copies of the unsafePerformIO thunk then noDuplicate# can't help. I've never really liked it for these reasons, but I don't know a better way. We have unsafeDupablePerformIO, which doesn't call noDuplicate#, and the programmer can use it when the unsafePerformIO can safely be executed multiple times.
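(For reference, the relationship between the two is roughly the following, paraphrasing base's GHC.IO.Unsafe from around that era; a sketch, not an exact copy.)

-- unsafePerformIO is unsafeDupablePerformIO plus the noDuplicate barrier,
-- so unsafeDupablePerformIO is precisely the variant that opts out of the
-- stack-walking protection described above.
unsafePerformIO :: IO a -> a
unsafePerformIO m = unsafeDupablePerformIO (noDuplicate >> m)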
We need some kind of fix for 8.2. Simon what do you suggest?
David's current fix would be OK (along with a clear notice in the release notes etc. to note that the implementation got slower). I think -fatomic-eager-blackholing might "fix" it with less overhead, though. Ben's suggestion:
eagerlyBlackhole :: a -> a
is likely to be unreliable, I think. We lack the control in the source language to tie it to a particular thunk.

Cheers
Simon

If we could identify exactly the thunks we wanted to be atomic, then yes, that would be better than a whole-module solution. However I'm not sure how to do that - doing it on the basis of a free variable with State# type doesn't work if the State# is buried in a data structure or a function closure, for instance.
I disagree. Having a free State# variable is precisely necessary and sufficient, I claim. Can you provide a counter-example?
Informal proof:
- The model is that a value of type (State# t) is a linear value that we mutate in-place. We must not consume it twice.
- Evaluating a thunk that has a free (State# t) variable is precisely “consuming” it. So we should only do that once.
I think -fatomic-eager-blackholing might "fix" it with less overhead, though
But precisely where would you have to use that flag? Inlining could mean that the code appears anywhere! Once we have the ability to atomically blackhole a thunk, we can just use my criterion above, I claim.
Stopgap story for 8.2. I am far from convinced that putting unsafePerformIO in the impl of (>>=) for the ST monad will be correct; but if you tell me it is, and if it is surrounded with huge banners saying that this is the wrong solution, and pointing to a new ticket to fix it, then OK.
Simon

On 31 January 2017 at 09:11, Simon Peyton Jones wrote:
If we could identify exactly the thunks we wanted to be atomic, then yes, that would be better than a whole-module solution. However I'm not sure how to do that - doing it on the basis of a free variable with State# type doesn't work if the State# is buried in a data structure or a function closure, for instance.
I disagree. Having a free State# variable is precisely necessary and sufficient, I claim. Can you provide a counter-example?
Sure, what I had in mind is something like this, defining a local unsafePerformIO:

\(s :: State# s) ->
  let unsafePerformIO = \g -> g s
      thunk = unsafePerformIO (\s -> ... )
  in
  ...

and "thunk" doesn't have a free variable of type State#.

Cheers
Simon
Stopgap story for 8.2. I am far from convinced that putting unsafePerformIO in the impl of (>>=) for the ST monad will be correct; but if you tell me it is, and if it is surrounded with huge banners saying that this is the wrong solution, and pointing to a new ticket to fix it, then OK.
Arguably this isn't all that urgent, given that it's been broken for 8 years or so.
Simon

Huh. You are right. That’s horrible.
OK, here’s another idea. Provide,
applyOnce# :: (a->b) -> a -> b
which behaves like
applyOnce f x = f x
but guarantees that any thunk (applyOnce# f x) will be evaluated with atomic eager black-holing.
\(s :: State# s) ->
let unsafePerformIO = \g -> g s
thunk = applyOnce# unsafePerformIO (\s -> ... )
in
...
Of course this does not guarantee safety. But I think it’d give a per-thunk way to specify it.
Simon

On 31 January 2017 at 10:02, Simon Peyton Jones wrote:
Huh. You are right. That’s horrible.
Horrible indeed!
OK, here’s another idea. Provide,
applyOnce# :: (a->b) -> a -> b
which behaves like
applyOnce f x = f x
but guarantees that any thunk (applyOnce# f x) will be evaluated with atomic eager black-holing.
\(s :: State# s) ->
let unsafePerformIO = \g -> g s
thunk = applyOnce# unsafePerformIO (\s -> ... )
in
...
But what if GHC decided to add another thunk, e.g.

let thunk = let x = applyOnce# unsafePerformIO (\s -> ...) in x

Now we need atomicity on both thunks, but there's no way to tell. (Of course GHC probably wouldn't do this particular transformation, but there is a whole range of other things that it might correctly do that would subvert the programmer's intention to make a single atomic thunk.)

noDuplicate# avoids this problem because it walks the whole stack, claiming the whole chain of thunks currently under evaluation. But this is a messy solution; I don't like it either.

Cheers
Simon

But what if GHC decided to add another thunk, e.g.
Ugh. That is even more horrible. Clearly I’m not very good at thinking about this stuff. It smells wrong; we should find a simple, composable primitive that does the right thing.
I’d like to leave some breadcrumbs of this thread, at the very least pointing to it so we don’t repeat the thinking later.
Meanwhile, for the present, what are we to do about LazyST?
Simon
participants (5)
- Ben Gamari
- David Feuer
- Reid Barton
- Simon Marlow
- Simon Peyton Jones