Question about indirectees of BLACKHOLE closures

Hi,

I've been looking at BLACKHOLE closures and how the indirectee field is used, and I have a few questions.

Looking at evacuate for BLACKHOLE closures:

    case BLACKHOLE:
    {
        StgClosure *r;
        const StgInfoTable *i;
        r = ((StgInd*)q)->indirectee;
        if (GET_CLOSURE_TAG(r) == 0) {
            i = r->header.info;
            if (IS_FORWARDING_PTR(i)) {
                r = (StgClosure *)UN_FORWARDING_PTR(i);
                i = r->header.info;
            }
            if (i == &stg_TSO_info
                || i == &stg_WHITEHOLE_info
                || i == &stg_BLOCKING_QUEUE_CLEAN_info
                || i == &stg_BLOCKING_QUEUE_DIRTY_info) {
                copy(p,info,q,sizeofW(StgInd),gen_no);
                return;
            }
            ASSERT(i != &stg_IND_info);
        }
        q = r;
        *p = r;
        goto loop;
    }

It seems like indirectee can be a TSO, WHITEHOLE, BLOCKING_QUEUE_CLEAN, or BLOCKING_QUEUE_DIRTY, and it can't be an IND. I'm wondering what it means for a BLACKHOLE to point to a

- TSO
- WHITEHOLE
- BLOCKING_QUEUE_CLEAN
- BLOCKING_QUEUE_DIRTY

Is this documented somewhere, or otherwise could someone give a few pointers on where to look in the code?

Secondly, I also looked at the BLACKHOLE entry code, and it seems like it has a different assumption about what the indirectee field can point to:

    INFO_TABLE(stg_BLACKHOLE,1,0,BLACKHOLE,"BLACKHOLE","BLACKHOLE")
        (P_ node)
    {
        W_ r, info, owner, bd;
        P_ p, bq, msg;

        TICK_ENT_DYN_IND(); /* tick */

    retry:
        p = StgInd_indirectee(node);
        if (GETTAG(p) != 0) {
            return (p);
        }

        info = StgHeader_info(p);
        if (info == stg_IND_info) {
            // This could happen, if e.g. we got a BLOCKING_QUEUE that has
            // just been replaced with an IND by another thread in
            // wakeBlockingQueue().
            goto retry;
        }

        if (info == stg_TSO_info ||
            info == stg_BLOCKING_QUEUE_CLEAN_info ||
            info == stg_BLOCKING_QUEUE_DIRTY_info)
        {
            ("ptr" msg) = ccall allocate(MyCapability() "ptr",
                                         BYTES_TO_WDS(SIZEOF_MessageBlackHole));

            SET_HDR(msg, stg_MSG_BLACKHOLE_info, CCS_SYSTEM);
            MessageBlackHole_tso(msg) = CurrentTSO;
            MessageBlackHole_bh(msg) = node;

            (r) = ccall messageBlackHole(MyCapability() "ptr", msg "ptr");

            if (r == 0) {
                goto retry;
            } else {
                StgTSO_why_blocked(CurrentTSO) = BlockedOnBlackHole::I16;
                StgTSO_block_info(CurrentTSO) = msg;
                jump stg_block_blackhole(node);
            }
        }
        else
        {
            ENTER(p);
        }
    }

The difference is that, when the tag of indirectee is 0, evacuate assumes that indirectee can't point to an IND, but the BLACKHOLE entry code thinks it's possible, and there's even a comment about why. (I don't understand the comment yet.) I'm wondering if this code is correct, and why. Again, any pointers would be appreciated.

Thanks,
Ömer

Hi Omer,

On 20 March 2018 at 13:05, Ömer Sinan Ağacan wrote:

> It seems like indirectee can be a TSO, WHITEHOLE, BLOCKING_QUEUE_CLEAN,
> BLOCKING_QUEUE_DIRTY, and it can't be IND. I'm wondering what does it
> mean for a BLACKHOLE to point to a
>
> - TSO
> - WHITEHOLE
> - BLOCKING_QUEUE_CLEAN
> - BLOCKING_QUEUE_DIRTY

That sounds right to me.

> Is this documented somewhere or otherwise could someone give a few
> pointers on where to look in the code?

Unfortunately I don't think we have good documentation for this, but you should look at the comments around messageBlackHole in Messages.c.

> The difference is, when the tag of indirectee is 0, evacuate assumes
> that indirectee can't point to an IND, but BLACKHOLE entry code thinks
> it's possible and there's even a comment about why. (I don't understand
> the comment yet) I'm wondering if this code is correct, and why. Again
> any pointers would be appreciated.

Taking a quick look at the code, my guess is that:

- a BLOCKING_QUEUE gets overwritten by an IND in wakeBlockingQueue()
- but when this happens, the indirectee of the BLACKHOLE will also be overwritten to point to the value

At runtime a thread might see an intermediate state because these mutations are happening in another thread, so we might follow the indirectee and see the IND. But this state can't be observed by the GC, because all mutator threads have stopped at a safe point.

Cheers
Simon

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
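The race described in this reply can be pictured with a small toy model in C. This is only a sketch under stated assumptions: the `Closure` struct, the `Kind` enum, and the functions `classify_indirectee`, `finish_update`, and `gc_copies_blackhole` are hypothetical names invented for illustration and are not the RTS definitions. It assumes, as the reply guesses, that the updating thread rewrites both the BLACKHOLE's indirectee and the BLOCKING_QUEUE, and that the GC only runs once those writes are complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy closure kinds; hypothetical stand-ins, not the RTS definitions. */
typedef enum {
    K_BLACKHOLE, K_TSO, K_WHITEHOLE, K_BLOCKING_QUEUE, K_IND, K_VALUE
} Kind;

typedef struct Closure {
    Kind kind;
    struct Closure *indirectee;   /* meaningful for K_BLACKHOLE and K_IND */
} Closure;

typedef enum { ACT_RETURN, ACT_RETRY, ACT_BLOCK, ACT_ENTER } EntryAction;

/* Mutator side: decide what to do with an indirectee pointer `p` that was
 * loaded earlier. Another thread may have rewritten *p since the load. */
EntryAction classify_indirectee(const Closure *p) {
    switch (p->kind) {
    case K_VALUE:          return ACT_RETURN;  /* models a tagged pointer */
    case K_IND:            return ACT_RETRY;   /* stale: reload the field */
    case K_TSO:
    case K_BLOCKING_QUEUE: return ACT_BLOCK;
    default:               return ACT_ENTER;
    }
}

/* Updating thread: publish the value and retire the blocking queue. */
void finish_update(Closure *bh, Closure *bq, Closure *value) {
    bh->indirectee = value;
    bq->kind = K_IND;
    bq->indirectee = value;
}

/* GC side: assumed to run only while mutators are stopped, so it never
 * sees a half-finished update. Returns true when the BLACKHOLE itself
 * must be copied, false when it can be short-circuited to the value. */
bool gc_copies_blackhole(const Closure *bh) {
    Kind k = bh->indirectee->kind;
    assert(k != K_IND);
    return k == K_TSO || k == K_WHITEHOLE || k == K_BLOCKING_QUEUE;
}
```

In this model, a mutator that loaded the indirectee before `finish_update` ran still holds a pointer to the old queue and sees `K_IND`, so it retries; the GC only ever follows the current indirectee and therefore never meets the IND.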

Added comments: https://phabricator.haskell.org/D4517

Thanks Simon, that's really helpful.
A few more questions:
As far as I understand the difference between
- BLACKHOLE pointing to a TSO
- BLACKHOLE pointing to a BLOCKING_QUEUE
is that in the former we don't yet have any threads blocked by the BLACKHOLE
whereas in the latter we have and the blocking queue holds all those blocked
threads. Did I get this right?
Secondly, can a BLACKHOLE point to a THUNK? I'd expect no, because we BLACKHOLE
a closure when we're done evaluating it (assuming no eager blackholing), and
evaluation usually happens up to WHNF.
Thanks,
Ömer
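The reading in the question above can be sketched as a toy model in C. Everything here is hypothetical (`Thread`, `BlockingQueue`, `BlackHole`, `block_on`, `MAX_WAITERS`); it only illustrates the idea that the indirectee starts out as the owning thread and is swapped for a queue when the first waiter arrives. It is not the logic of messageBlackHole itself.

```c
#include <stddef.h>

#define MAX_WAITERS 8   /* arbitrary bound for the toy model */

typedef struct { int id; } Thread;   /* hypothetical stand-in for a TSO */

typedef struct {
    Thread *owner;                   /* thread evaluating the thunk */
    Thread *waiters[MAX_WAITERS];    /* threads blocked on the result */
    size_t n_waiters;
} BlockingQueue;

typedef enum { PTR_TSO, PTR_BLOCKING_QUEUE } IndirecteeKind;

typedef struct {
    IndirecteeKind kind;
    union { Thread *owner; BlockingQueue *bq; } u;
} BlackHole;

/* Block thread `t` on `bh`. The first waiter upgrades the indirectee from
 * the owning thread to a queue (backed by caller-supplied `storage`).
 * Returns 0 on success, -1 if the toy queue is full. */
int block_on(BlackHole *bh, BlockingQueue *storage, Thread *t) {
    if (bh->kind == PTR_TSO) {
        storage->owner = bh->u.owner;
        storage->n_waiters = 0;
        bh->kind = PTR_BLOCKING_QUEUE;
        bh->u.bq = storage;
    }
    BlockingQueue *bq = bh->u.bq;
    if (bq->n_waiters == MAX_WAITERS) {
        return -1;
    }
    bq->waiters[bq->n_waiters++] = t;
    return 0;
}
```

With no waiters the black hole only records its owner; after the first call to `block_on` it points at a queue that remembers both the owner and every blocked thread.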

Hi Omer,
As per my understanding, a BLACKHOLE can point to a THUNK when an exception
is thrown. An exception walks up the stack and overwrites the blackholes
pointed to by the update frames as it walks with an stg_raise closure. That
way, if any concurrent thread happens to evaluate a thunk that was walked,
it'll evaluate the thunk which will blow up as well thereby throwing the
exception on the other thread(s) too.
Definition of stg_raise:
https://github.com/ghc/ghc/blob/ba5797937e575ce6119de6c07703e90dda2557e8/rts/Exception.cmm#L424-L427
raiseExceptionHelper dealing with update frames:
https://github.com/ghc/ghc/blob/d9d463289fe20316cff12a8f0dbf414db678fa72/rts/Schedule.c#L2864-L2875
In general, yes, you can think that a BLACKHOLE will point to a non-THUNK
object assuming that everything went right.
Hope that helps,
Rahul
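The stack walk described above can be sketched as a toy model in C. The types and the function `unwind_to_handler` are hypothetical and far simpler than raiseExceptionHelper; the sketch assumes only what the message says, namely that each update frame's updatee is overwritten so that anyone who later demands it re-raises the same exception.

```c
#include <stddef.h>

typedef enum { EVALUATING, RAISES } ThunkState;

/* Toy updatee: either still under evaluation, or rewritten to re-raise. */
typedef struct { ThunkState state; int exception; } Thunk;

typedef enum { FRAME_UPDATE, FRAME_CATCH, FRAME_OTHER } FrameKind;

typedef struct { FrameKind kind; Thunk *updatee; } Frame;

/* Walk from the top of the stack (index 0) towards the first catch frame.
 * Every update frame passed on the way has its updatee rewritten to raise
 * `exception`. Returns the index of the catch frame, or n if none exists. */
size_t unwind_to_handler(Frame *stack, size_t n, int exception) {
    size_t i;
    for (i = 0; i < n; i++) {
        if (stack[i].kind == FRAME_CATCH) {
            break;
        }
        if (stack[i].kind == FRAME_UPDATE) {
            stack[i].updatee->state = RAISES;
            stack[i].updatee->exception = exception;
        }
    }
    return i;
}
```

Thunks whose update frames sit below the handler are left untouched, since their evaluation can still complete normally.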

Hi Rahul,
I'm still confused. The code that walks the stack and updates UPDATE_FRAMEs
only makes indirections point to the "raise" closure, not to thunks or anything
else (I also don't understand why this is needed but I guess that's another
topic). I still don't see how a BLACKHOLE can point to a THUNK.
Ömer

I think I can at least answer the why: we're talking about threads
referring to suspended computations within a thread whose stack is being
"unwound". Those computations won't be resumable after the unwind (which
makes their context go away). So they have to be overwritten with something
to cause the referencing threads to abort if they need the
no-longer-computable results of those suspended computations.
-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

The raise closure is declared to be a THUNK:
https://phabricator.haskell.org/diffusion/GHC/browse/master/rts/Exception.cm...
Another example of this is when an asynchronous exception is thrown, and we
update all the thunks/BLACKHOLEs pointed to by the update frames to point
to new thunks (actually AP_STACK closures) representing the frozen state of
evaluation of those thunks. For this, see rts/RaiseAsync.c.
Cheers
Simon
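As a rough picture of "a closure representing the frozen state of evaluation", the sketch below copies a chunk of stack words into a heap object. The `ApStack` struct and `freeze_stack` function are hypothetical names for illustration; the real AP_STACK layout and the code in rts/RaiseAsync.c are more involved, and nothing here should be read as their actual implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Toy "frozen stack" object: a word count followed by the copied words. */
typedef struct {
    size_t size;
    long payload[];   /* C99 flexible array member */
} ApStack;

/* Copy `words` stack words starting at `sp` into a fresh heap object.
 * Returns NULL if allocation fails; the caller frees the result. */
ApStack *freeze_stack(const long *sp, size_t words) {
    ApStack *ap = malloc(sizeof *ap + words * sizeof(long));
    if (ap == NULL) {
        return NULL;
    }
    ap->size = words;
    memcpy(ap->payload, sp, words * sizeof(long));
    return ap;
}
```

Because the words are copied, the frozen object stays valid after the original stack is unwound, which is what would let another thread resume the computation later.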
On 24 March 2018 at 19:27, Ömer Sinan Ağacan
Hi Rahul,
I'm still confused. The code that walks the stack and updates UPDATE_FRAMEs only makes indirections point to the "raise" closure, not to thunks or anything else (I also don't understand why this is needed but I guess that's another topic). I still don't see how can a BLACKHOLE point to a THUNK.
Ömer
Hi Omer,
As per my understanding, a BLACKHOLE can point to a THUNK when an exception is thrown. An exception walks up the stack and overwrites the blackholes pointed to by the update frames as it walks with an stg_raise closure. That way, if any concurrent thread happens to evaluate a thunk that was walked, it'll evaluate the thunk which will blow up as well thereby throwing the exception on the other thread(s) too.
Definition of stg_raise: https://github.com/ghc/ghc/blob/ba5797937e575ce6119de6c07703e9 0dda2557e8/rts/Exception.cmm#L424-L427
raiseExceptionHelper dealing with update frames: https://github.com/ghc/ghc/blob/d9d463289fe20316cff12a8f0dbf41 4db678fa72/rts/Schedule.c#L2864-L2875
In general, yes, you can think that a BLACKHOLE will point to a non-THUNK object assuming that everything went right.
Hope that helps, Rahul
On Fri, Mar 23, 2018 at 5:48 PM, Ömer Sinan Ağacan
wrote:
Thanks Simon, that's really helpful.
A few more questions:
As far as I understand the difference between
- BLACKHOLE pointing to a TSO - BLACKHOLE pointing to a BLOCKING_QUEUE
is that in the former we don't yet have any threads blocked by the BLACKHOLE whereas in the latter we have and the blocking queue holds all those blocked threads. Did I get this right?
Secondly, can a BLACKHOLE point to a THUNK? I'd expect no, because we BLACKHOLE a closure when we're done evaluating it (assuming no eager blackholing), and evaluation usually happens up to WHNF.
Thanks,
Ömer
2018-03-20 18:27 GMT+03:00 Simon Marlow:
Added comments: https://phabricator.haskell.org/D4517
On 20 March 2018 at 14:58, Simon Marlow wrote:
Hi Omer,
On 20 March 2018 at 13:05, Ömer Sinan Ağacan wrote:
Hi,
I've been looking at BLACKHOLE closures and how the indirectee field is used and I have a few questions:
Looking at evacuate for BLACKHOLE closures:
    case BLACKHOLE:
    {
        StgClosure *r;
        const StgInfoTable *i;
        r = ((StgInd*)q)->indirectee;
        if (GET_CLOSURE_TAG(r) == 0) {
            i = r->header.info;
            if (IS_FORWARDING_PTR(i)) {
                r = (StgClosure *)UN_FORWARDING_PTR(i);
                i = r->header.info;
            }
            if (i == &stg_TSO_info
                || i == &stg_WHITEHOLE_info
                || i == &stg_BLOCKING_QUEUE_CLEAN_info
                || i == &stg_BLOCKING_QUEUE_DIRTY_info) {
                copy(p,info,q,sizeofW(StgInd),gen_no);
                return;
            }
            ASSERT(i != &stg_IND_info);
        }
        q = r;
        *p = r;
        goto loop;
    }
It seems like indirectee can be a TSO, WHITEHOLE, BLOCKING_QUEUE_CLEAN, BLOCKING_QUEUE_DIRTY, and it can't be IND. I'm wondering what it means for a BLACKHOLE to point to a
- TSO
- WHITEHOLE
- BLOCKING_QUEUE_CLEAN
- BLOCKING_QUEUE_DIRTY
That sounds right to me.
Is this documented somewhere or otherwise could someone give a few pointers on where to look in the code?
Unfortunately I don't think we have good documentation for this, but you should look at the comments around messageBlackHole in Messages.c.
Secondly, I also looked at the BLACKHOLE entry code, and it seems like it has a different assumption about what the indirectee field can point to:
    INFO_TABLE(stg_BLACKHOLE,1,0,BLACKHOLE,"BLACKHOLE","BLACKHOLE")
        (P_ node)
    {
        W_ r, info, owner, bd;
        P_ p, bq, msg;

        TICK_ENT_DYN_IND(); /* tick */

    retry:
        p = StgInd_indirectee(node);
        if (GETTAG(p) != 0) {
            return (p);
        }

        info = StgHeader_info(p);
        if (info == stg_IND_info) {
            // This could happen, if e.g. we got a BLOCKING_QUEUE that has
            // just been replaced with an IND by another thread in
            // wakeBlockingQueue().
            goto retry;
        }

        if (info == stg_TSO_info ||
            info == stg_BLOCKING_QUEUE_CLEAN_info ||
            info == stg_BLOCKING_QUEUE_DIRTY_info)
        {
            ("ptr" msg) = ccall allocate(MyCapability() "ptr",
                                         BYTES_TO_WDS(SIZEOF_MessageBlackHole));

            SET_HDR(msg, stg_MSG_BLACKHOLE_info, CCS_SYSTEM);
            MessageBlackHole_tso(msg) = CurrentTSO;
            MessageBlackHole_bh(msg) = node;

            (r) = ccall messageBlackHole(MyCapability() "ptr", msg "ptr");

            if (r == 0) {
                goto retry;
            } else {
                StgTSO_why_blocked(CurrentTSO) = BlockedOnBlackHole::I16;
                StgTSO_block_info(CurrentTSO) = msg;
                jump stg_block_blackhole(node);
            }
        }
        else
        {
            ENTER(p);
        }
    }
The difference is, when the tag of indirectee is 0, evacuate assumes that indirectee can't point to an IND, but BLACKHOLE entry code thinks it's possible and there's even a comment about why. (I don't understand the comment yet) I'm wondering if this code is correct, and why. Again any pointers would be appreciated.
Taking a quick look at the code, my guess is that:
- a BLOCKING_QUEUE gets overwritten by an IND in wakeBlockingQueue()
- but when this happens, the indirectee of the BLACKHOLE will also be overwritten to point to the value
At runtime a thread might see an intermediate state because these mutations are happening in another thread, so we might follow the indirectee and see the IND. But this state can't be observed by the GC, because all mutator threads have stopped at a safe point.
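The window described above can be illustrated with a small single-threaded toy model in C. It is a sketch under assumptions: `Closure`, `BlackHole`, `load_indirectee`, `classify`, `finish_evaluation` and `gc_assumption_holds` are hypothetical names, the real entry code also handles tagged pointers and enters other closures rather than returning them, and the order of the two writes in `finish_evaluation` is chosen arbitrarily here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical closure kinds, not the RTS definitions. */
typedef enum { K_VALUE, K_IND, K_TSO, K_BLOCKING_QUEUE } Kind;
typedef struct { Kind kind; } Closure;
typedef struct { Closure *indirectee; } BlackHole;
typedef enum { ACT_RETURN, ACT_RETRY, ACT_BLOCK } Action;

/* First read done by the entering thread: fetch the indirectee. */
Closure *load_indirectee(const BlackHole *bh)
{
    return bh->indirectee;
}

/* Second read: inspect the header of whatever was fetched earlier. */
Action classify(const Closure *p)
{
    switch (p->kind) {
    case K_IND:
        return ACT_RETRY;   /* stale pointer: re-read the BLACKHOLE */
    case K_TSO:
    case K_BLOCKING_QUEUE:
        return ACT_BLOCK;   /* still under evaluation: wait */
    default:
        return ACT_RETURN;  /* a value: hand it back */
    }
}

/* The evaluating thread finishes: the BLACKHOLE is pointed at the value
 * and the old queue is overwritten with an IND. */
void finish_evaluation(BlackHole *bh, Closure *bq, Closure *value)
{
    assert(bq->kind == K_BLOCKING_QUEUE);
    bh->indirectee = value;
    bq->kind = K_IND;
}

/* What the GC relies on once every mutator has stopped at a safe point. */
bool gc_assumption_holds(const BlackHole *bh)
{
    return bh->indirectee->kind != K_IND;
}
```

A reader that called `load_indirectee` before `finish_evaluation` ran still holds a pointer to the old queue, so `classify` reports `ACT_RETRY`; after re-reading it sees the value. Once both writes are complete, as they are when the GC runs, the BLACKHOLE itself never points at the IND.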
Cheers
Simon
Thanks,
Ömer
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-- Rahul Muttineni

Simon Marlow writes:
The raise closure is declared to be a THUNK:
https://phabricator.haskell.org/diffusion/GHC/browse/master/rts/Exception.cm...
Another example of this is when an asynchronous exception is thrown, and we update all the thunks/BLACKHOLEs pointed to by the update frames to point to new thunks (actually AP_STACK closures) representing the frozen state of evaluation of those thunks. For this, see rts/RaiseAsync.c.
This thread has answered a number of interesting questions. It would be a shame if these answers vanished into the abyss of the ghc-devs archives.
Omer, do you think you could make sure that the discussion here is summarized in a Note (or ensure that the relevant notes reference one another, if they already exist)?
Cheers,
- Ben

I still don't understand the whole story with blackholes but I'll
update the comments around the BLACKHOLE stack frame and/or wiki pages
once I get a better understanding.
Ömer
2018-03-26 21:47 GMT+03:00 Ben Gamari:
Simon Marlow writes:
The raise closure is declared to be a THUNK:
https://phabricator.haskell.org/diffusion/GHC/browse/master/rts/Exception.cm...
Another example of this is when an asynchronous exception is thrown, and we update all the thunks/BLACKHOLEs pointed to by the update frames to point to new thunks (actually AP_STACK closures) representing the frozen state of evaluation of those thunks. For this, see rts/RaiseAsync.c.
This thread has answered a number of interesting questions. It would be a shame if these answers vanished into the abyss of the ghc-devs archives.
Omer, do you think you could make sure that the discussion here is summarized in a Note (or ensure that the relevant notes reference one another, if they already exist)?
Cheers,
- Ben
participants (5)
- Ben Gamari
- Brandon Allbery
- Rahul Muttineni
- Simon Marlow
- Ömer Sinan Ağacan