Way to expose BLACKHOLES through an API?

Hi GHC users,

When implementing certain concurrent systems-level software in Haskell it is good to be aware of all potentially blocking operations. Presently, blocking on an MVar is explicit (it only happens when you do a takeMVar), but blocking on a BLACKHOLE is implicit and can potentially happen anywhere.

If there are known thunks where we, the programmers, know that contention might occur, would it be possible to create a variant of Control.Exception.evaluate that allows us to construct non-blocking software?

    evaluate            :: a -> IO a
    evaluateNonblocking :: a -> IO (Maybe a)

It would simply return Nothing if the value is BLACKHOLE'd. Of course it may be helpful to also distinguish the evaluated and unevaluated states. Further, the simple version above allows data races (the value may become BLACKHOLE'd right after we evaluate it). An extreme version would actively BLACKHOLE the thunk to "lock" it... but maybe that's overkill and there are other good ideas out there.

A mechanism like the one proposed should, for example, allow us to consume just as much of a lazy ByteString as has already been computed by a producer, WITHOUT blocking and waiting on that producer thread, or migrating the producer computation over to our own thread (blowing its cache).

Thanks,
-Ryan
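[Editor's sketch] The proposed primitive does not exist in GHC, so the following is only a usage sketch: evaluateNonblocking is stood in for by a placeholder that always succeeds (built on Control.Exception's evaluate), and a hypothetical consumeAvailable shows how a consumer could walk a lazy structure one constructor at a time, stopping wherever the real primitive would report a BLACKHOLE:

```haskell
import Control.Exception (evaluate)

-- Stand-in for the proposed primitive. A real implementation would
-- return Nothing when the thunk is BLACKHOLE'd by another thread;
-- this placeholder always succeeds.
evaluateNonblocking :: a -> IO (Maybe a)
evaluateNonblocking x = Just <$> evaluate x

-- Consume only as much of a lazy list as can be obtained without
-- (in the real version) blocking on a producer thread.
consumeAvailable :: [a] -> IO [a]
consumeAvailable xs = do
  m <- evaluateNonblocking xs       -- forces only the outermost constructor
  case m of
    Nothing     -> return []        -- producer holds the thunk: stop here
    Just []     -> return []
    Just (y:ys) -> (y :) <$> consumeAvailable ys

main :: IO ()
main = consumeAvailable (map (* 2) [1 .. 5 :: Int]) >>= print
```

With the placeholder the whole list is always available, so this prints [2,4,6,8,10]; with the real primitive the result would be a prefix.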

On 07/11/2011 14:50, Ryan Newton wrote:
> Hi GHC users,
>
> When implementing certain concurrent systems-level software in Haskell it is good to be aware of all potentially blocking operations. Presently, blocking on an MVar is explicit (it only happens when you do a takeMVar), but blocking on a BLACKHOLE is implicit and can potentially happen anywhere.
>
> If there are known thunks where we, the programmers, know that contention might occur, would it be possible to create a variant of Control.Exception.evaluate that allows us to construct non-blocking software:
>
>     evaluate            :: a -> IO a
>     evaluateNonblocking :: a -> IO (Maybe a)
>
> It would simply return Nothing if the value is BLACKHOLE'd. Of course it may be helpful to also distinguish the evaluated and unevaluated states. Further, the above simple version allows data-races (it may become blackhole'd right after we evaluate). An extreme version would actively blackhole it to "lock" the thunk... but maybe that's overkill and there are some other good ideas out there.
>
> A mechanism like the proposed should, for example, allow us to consume just as much of a lazy Bytestring as has already been computed by a producer, WITHOUT blocking and waiting on that producer thread, or migrating the producer computation over to our own thread (blowing its cache).
The problem is that a thunk may depend on other thunks, which may or may not themselves be BLACKHOLEs. So you might be a long way into evaluating the argument and have accumulated a deep stack before you encounter the BLACKHOLE.

Hmm, but there is something you could do. Suppose a thread could be in a mode in which, instead of blocking on a BLACKHOLE, it would just throw an asynchronous exception WouldBlock. Any computation in progress would be safely abandoned via the usual asynchronous-exception mechanism, and you could catch the exception to implement your evaluateNonBlocking operation.

I'm not sure this would actually be useful in practice, but it's certainly doable.

Cheers,
Simon
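[Editor's sketch] Simon's WouldBlock idea can at least be given a concrete shape. GHC's RTS has no such exception, so this sketch defines the type itself purely to show what the catching wrapper would look like; in this self-contained form the exception is never actually thrown, so the call always succeeds:

```haskell
import Control.Exception

-- Hypothetical exception the RTS would throw instead of blocking on a
-- BLACKHOLE. This is NOT a real GHC exception; it is defined here only
-- so the wrapper below compiles.
data WouldBlock = WouldBlock deriving Show
instance Exception WouldBlock

-- Wrap evaluation and report Nothing if it was abandoned. The
-- interrupted work would be frozen by the usual asynchronous-exception
-- machinery and could be retried by evaluating the same thunk again.
evaluateNonBlocking :: a -> IO (Maybe a)
evaluateNonBlocking x =
  (Just <$> evaluate x) `catch` \WouldBlock -> return Nothing

main :: IO ()
main = evaluateNonBlocking (1 + 2 :: Int) >>= print
```

Since nothing throws WouldBlock here, this prints Just 3; under the proposed RTS mode a contended thunk would yield Nothing instead.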

On Tue, 2011-11-08 at 15:43 +0000, Simon Marlow wrote:
> Hmm, but there is something you could do. Suppose a thread could be in a mode in which instead of blocking on a BLACKHOLE it would just throw an asynchronous exception WouldBlock. Any computation in progress would be safely abandoned via the usual asynchronous exception mechanism, and you could catch the exception to implement your evaluateNonBlocking operation.
>
> I'm not sure this would actually be useful in practice, but it's certainly doable.
The Linux kernel folks have been discussing a similar idea on and off for the last few years. The idea is to "return" in another thread if the initial system call blocks.

Perhaps there's an equivalent here. We have an evaluateThingy function, and when the scheduler notices that the thread is going to block for some reason (either any reason or some specific reason) we return from evaluateThingy with some info about the blocked thread.

The thing that the kernel folks could never decide on was to do with thread identity: whether the original thread blocks and we return in a new thread, or the original thread returns and a clone is the one that blocks.

Or perhaps it's a crazy idea and it would never work at all :-)

Duncan

Hi,

On 16.11.2011, at 10:43, Duncan Coutts wrote:
> On Tue, 2011-11-08 at 15:43 +0000, Simon Marlow wrote:
>> Hmm, but there is something you could do. Suppose a thread could be in a mode in which instead of blocking on a BLACKHOLE it would just throw an asynchronous exception WouldBlock. Any computation in progress would be safely abandoned via the usual asynchronous exception mechanism, and you could catch the exception to implement your evaluateNonBlocking operation.
>> I'm not sure this would actually be useful in practice, but it's certainly doable.
> The linux kernel folks have been discussing a similar idea on and off for the last few years. The idea is to "return" in another thread if the initial system call blocks.
> Perhaps there's an equivalent here. We have an evaluateThingy function and when the scheduler notices that thread is going to block for some reason (either any reason or some specific reason) we return from evaluateThingy with some info about the blocked thread.
> The thing that the kernel folks could never decide on was to do with thread identity: if it was the original thread that blocked and we return in a new thread, or if the original thread returns and a clone is the one that blocks.
The difference between the requirements of the Linux kernel folks and the OP is that in the Linux kernel the evaluation has to continue, while in the Haskell case we know that the BLACKHOLE is already being evaluated by someone else. I am obviously no expert on GHC internals, but that is what I understood from reading the papers about the runtime system. So, in GHC I'd say it would make sense to stay in the original thread and throw the exception, as Simon Marlow said.

Just my 2 eurocents.

Cheers,
Jean

On 16/11/2011 10:20, Jean-Marie Gaillourdet wrote:
> Hi,
>
> On 16.11.2011, at 10:43, Duncan Coutts wrote:
>
>> On Tue, 2011-11-08 at 15:43 +0000, Simon Marlow wrote:
>>> Hmm, but there is something you could do. Suppose a thread could be in a mode in which instead of blocking on a BLACKHOLE it would just throw an asynchronous exception WouldBlock. Any computation in progress would be safely abandoned via the usual asynchronous exception mechanism, and you could catch the exception to implement your evaluateNonBlocking operation.
>>> I'm not sure this would actually be useful in practice, but it's certainly doable.
>> The linux kernel folks have been discussing a similar idea on and off for the last few years. The idea is to "return" in another thread if the initial system call blocks.
>> Perhaps there's an equivalent here. We have an evaluateThingy function and when the scheduler notices that thread is going to block for some reason (either any reason or some specific reason) we return from evaluateThingy with some info about the blocked thread.
>> The thing that the kernel folks could never decide on was to do with thread identity: if it was the original thread that blocked and we return in a new thread, or if the original thread returns and a clone is the one that blocks.
> The difference between the requirements of the Linux kernel folks and the OP is that in the Linux kernel the evaluation has to continue, while in the Haskell case we know that the BLACKHOLE is already evaluated by someone else. I am obviously no expert on the GHC internals, but that is what I understood by reading the papers about the runtime system. So, in GHC I'd say it would make sense to stay in the original thread and throw the exception as Simon Marlow said.
We have a great solution to the thread-identity problem already: just freeze the computation using an asynchronous exception, and return in the original thread. The freezing process stores the state of the computation (i.e. the stack) on the heap, where it can be resumed by just evaluating the same value again.

So I'm still not sure why we would want to do this, and we need a concrete application to be sure that the design is useful. Ryan had some application in mind using lazy ByteStrings, but I don't think I really understand how this scheme would help yet.

Cheers,
Simon
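[Editor's sketch] The freeze-and-resume behaviour Simon describes can be observed today with ordinary primitives: System.Timeout's timeout delivers the abandoning asynchronous exception, and re-evaluating the same thunk picks the work up again in the same thread (the expensive computation and its expected total are purely illustrative):

```haskell
import Control.Exception (evaluate)
import System.Timeout (timeout)

main :: IO ()
main = do
  let x = sum [1 .. 50000000 :: Integer]  -- one shared, expensive thunk
  -- Abandon evaluation after ~1ms via an asynchronous exception; the
  -- partially evaluated state is frozen onto the heap, not lost.
  _ <- timeout 1000 (evaluate x)
  -- Evaluating the same thunk again resumes (or completes) the work in
  -- this thread, so no thread-identity question arises.
  r <- evaluate x
  print (r == 1250000025000000)
```

Whether or not the timeout fires first, the second evaluate always yields the full sum, so this prints True.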
participants (4)
- Duncan Coutts
- Jean-Marie Gaillourdet
- Ryan Newton
- Simon Marlow