Re: Proposal: Use uninterruptibleMask for cleanup actions in Control.Exception

Hey Simon, thanks for the reply!
On Fri, Sep 5, 2014 at 6:39 PM, Simon Marlow wrote:
Eyal, thanks for bringing up this issue. It's been at the back of my mind for a while, but I've never really thought through the issues and consequences of changes. So this is a good opportunity to do that. You point out (in another email in the thread) that:
A) Cases that were not interruptible will remain the same.
B) Cases that were interruptible were bugs and will be fixed.
However,
C) Some bugs will turn into deadlocks (unkillable threads)
Being able to recover from bugs is an important property in large long-running systems. So this is a serious problem. Hence why I always treat uninterruptibleMask with the deepest suspicion.
Recovering from various kinds of failures makes a lot of sense. But how can you recover from arbitrary invariants of the program being broken?
For example, suppose you use a bracket on some semaphore monitoring a global resource. How do you recover from a bug that leaks semaphore tokens?
Recovering from crashes of whole processes, whose internal state can be reset to a fresh, usable state, is a great feature. Recovering from crashes of threads that share arbitrary mutable state with other threads is not practical, I believe.
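The semaphore scenario can be sketched as follows. This is an illustrative reconstruction, not code from the thread: the `Sem` type and the names `acquire`, `release` and `leakDemo` are hypothetical, and the semaphore is modelled as a plain `MVar Int` so that its release path is an ordinary interruptible operation. A second asynchronous exception arrives while the `bracket_` cleanup is blocked, and the token is never returned.

```haskell
import Control.Concurrent
import Control.Exception (bracket_)
import Control.Monad (forever)

-- Hypothetical semaphore: an MVar holding the number of free tokens.
type Sem = MVar Int

acquire :: Sem -> IO ()
acquire sem = modifyMVar_ sem (pure . subtract 1)

-- modifyMVar_ starts with takeMVar, which is interruptible while it blocks.
release :: Sem -> IO ()
release sem = modifyMVar_ sem (pure . (+ 1))

-- Returns the number of free tokens once the worker has terminated.
-- The semaphore starts with one token, so a leak-free run would return 1.
leakDemo :: IO Int
leakDemo = do
  sem     <- newMVar 1
  started <- newEmptyMVar
  done    <- newEmptyMVar
  worker  <- forkFinally
    (bracket_ (acquire sem) (release sem)
       (putMVar started () >> forever (threadDelay 1000000)))
    (\_ -> putMVar done ())
  takeMVar started          -- the worker now holds the token
  held <- takeMVar sem      -- simulate contention on the semaphore's MVar
  killThread worker         -- interrupts the body; cleanup starts and blocks
  killThread worker         -- delivered inside the blocked release
  putMVar sem held          -- contention ends, but release was already aborted
  takeMVar done
  readMVar sem

main :: IO ()
main = leakDemo >>= \free -> putStrLn ("free tokens after cleanup: " ++ show free)
```

Under these assumptions the count stays at 0 instead of returning to 1, and nothing at the point of failure indicates why, which is the kind of broken invariant described above.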
Let's consider the case where we have an interruptible operation in the handler, and divide it into two (er three):
1. It blocks for a short, bounded amount of time.
2. It blocks for a long time.
3. It blocks indefinitely.
These are all buggy, but in different ways. Only (1) is fixed by adding uninterruptibleMask. (2) is "fixed", but in exchange for an unresponsive thread - also undesirable. (3) was a bug in the application code, and turns into a deadlock with uninterruptibleMask, which is undesirable.
I think that (1) is by far the most common case and is very prevalent. I think the most common cleanup handlers are interruptible operations that take effectively zero time (they can block but almost never do).
For (2) and (3), we need to choose the lesser evil:
A) Deadlocks and/or unresponsiveness
B) Arbitrary invariants being broken and leaks
In my experience, A tends to manifest exactly where the bug is, and is therefore easy to debug and mostly a "performance bug". B tends to manifest as difficult-to-explain behavior somewhere other than where the bug actually is, and is usually a "correctness bug", which is almost always worse.
Therefore, I think A is a far, far lesser evil than B when (2) and (3) are involved.
I'd like to reemphasize that this change will almost always fix the problem completely, since the most common case is (1), and in the rare remaining cases it will convert B to A, which is also, IMO, very desirable.
This is as far as I've got thinking through the issues so far. I wonder to what extent the programmer can and should mitigate these cases, and how much we can help them. I don't want unkillable threads, even when caused by buggy code.
Cheers, Simon
On 04/09/2014 16:46, Roman Cheplyaka wrote:
I find your arguments quite convincing. Count that as +1 from me.
Roman
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
-- Eyal

Ok, sorry for the delay, we still need a resolution on this one.
So thanks to your persuasive comments I think I'm convinced. What finally tipped me over the edge was this:
https://phabricator.haskell.org/diffusion/GHC/browse/master/libraries/base/C...
It turns out I've been a victim of this "bug" myself :-) So let's fix it.
But what is the cost? Adding an uninterruptibleMask won't be free.
In the case of `catch`, since the mask is already built in to the primitive, we can just change it to be an uninterruptibleMask, and that applies to `handle` and `onException` too. For `finally` we can replace the mask with an uninterruptibleMask, but for `bracket` we have to add a new layer of uninterruptibleMask.
Lots of documentation probably needs to be updated. Any chance you could make a patch and upload it to Phabricator?
Cheers, Simon
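As a rough sketch of what "a new layer of uninterruptibleMask" for `bracket` could look like: the function below follows the usual shape of `bracket` (acquire under `mask`, run the body with exceptions restored, clean up on both paths) and wraps only the cleanup in `uninterruptibleMask_`. The names `bracketUninterruptible` and `cleanupMaskingState` are made up for illustration, and this is not the patch discussed in the thread.

```haskell
import Control.Exception
  ( MaskingState (..), getMaskingState, mask, onException, uninterruptibleMask_ )
import Data.IORef (newIORef, readIORef, writeIORef)

-- Hypothetical variant of bracket whose cleanup cannot be interrupted.
bracketUninterruptible :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
bracketUninterruptible before after thing =
  mask $ \restore -> do
    a <- before
    -- Cleanup on the exceptional path.
    r <- restore (thing a) `onException` uninterruptibleMask_ (after a)
    -- Cleanup on the normal path.
    _ <- uninterruptibleMask_ (after a)
    return r

-- Records the masking state that the cleanup action observes.
cleanupMaskingState :: IO MaskingState
cleanupMaskingState = do
  ref <- newIORef Unmasked
  bracketUninterruptible
    (pure ())
    (\_ -> getMaskingState >>= writeIORef ref)
    (\_ -> pure ())
  readIORef ref

main :: IO ()
main = cleanupMaskingState >>= print
```

With this variant the cleanup observes `MaskedUninterruptible`, where the cleanup of the existing `bracket` would observe `MaskedInterruptible`. The cost side of the trade-off is unchanged: a cleanup that blocks forever now makes the thread unkillable.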

Hi Simon,
Well, another issue that was raised (I forget by whom!) was the fact that stack overflows are currently thrown externally from the executing thread and that this should really be changed into being thrown to the thread by itself, thereby avoiding the mask.
Cheers,
Merijn

On 24/09/2014 17:58, Merijn Verstraaten wrote:
Well, another issue that was raised (I forget by whom!) was the fact that stack overflows are currently thrown externally from the executing thread and that this should really be changed into being thrown to the thread by itself, thereby avoiding the mask.
Yeah, a stack overflow is an asynchronous exception. It is not thrown by any particular thread, but it is treated as an async exception from the point of view of the receiving thread. And this is the right thing to do, since a stack overflow can occur absolutely anywhere, so it has exactly the characteristics of an async exception.
So it is already the case that a stack overflow inside mask does not cause an exception. Instead the exception is deferred until the thread exits the mask. This proposal wouldn't change anything in that respect.
Cheers, Simon
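The deferral described here applies to any asynchronous exception, so it can be illustrated with `killThread` standing in for a stack overflow. In the sketch below (all names are hypothetical), the worker stays inside `mask_` without performing any blocking operation until it sees, via `threadStatus` from `GHC.Conc`, that the main thread is blocked in `throwTo`. The poll is bounded so that the sketch cannot hang if that status is never observed.

```haskell
import Control.Concurrent
import Control.Exception (SomeException, mask_)
import Data.IORef (newIORef, readIORef, writeIORef)
import GHC.Conc (BlockReason (BlockedOnException), ThreadStatus (ThreadBlocked), threadStatus)

-- Returns whether the masked section ran to completion, and how the worker ended.
deferredDemo :: IO (Bool, Either SomeException ())
deferredDemo = do
  mainTid  <- myThreadId
  finished <- newIORef False
  entered  <- newEmptyMVar
  result   <- newEmptyMVar
  let -- Poll (without blocking) until the main thread is stuck in throwTo.
      waitForPendingThrow :: Int -> IO ()
      waitForPendingThrow 0 = pure ()
      waitForPendingThrow n = do
        st <- threadStatus mainTid
        if st == ThreadBlocked BlockedOnException
          then pure ()
          else yield >> waitForPendingThrow (n - 1)
  worker <- forkFinally
    (mask_ $ do
       putMVar entered ()             -- the MVar is empty, so this does not block
       waitForPendingThrow 10000000   -- an exception is pending by the time this returns
       writeIORef finished True)      -- still runs: delivery is deferred
    (putMVar result)
  takeMVar entered
  killThread worker                   -- cannot be delivered until the mask is exited
  outcome <- takeMVar result
  done    <- readIORef finished
  pure (done, outcome)

main :: IO ()
main = do
  (done, outcome) <- deferredDemo
  putStrLn ("masked section completed: " ++ show done)
  putStrLn ("worker outcome: " ++ show outcome)
```

The masked section always completes. The pending exception is then normally raised as the worker leaves `mask_`, so the outcome is typically a `Left` carrying `ThreadKilled`.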
participants (3)
- Eyal Lotem
- Merijn Verstraaten
- Simon Marlow