
#15544: Non-deterministic segmentation fault in cryptohash-sha256 testsuite
-------------------------------------+-------------------------------------
        Reporter:  bgamari           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by osa1):

 > @osa1 what makes you suspect the STM fix?

I'm debugging the assertion failure in comment:12, which looked serious enough to me (a TSO list is getting corrupted). I realized that the list being corrupted is a run queue, and that it gets corrupted because in `stmCommitTransaction` we unpark a thread that is already in a run queue. So at some point the thread is in two lists (both a run queue and a TRec's wait queue).

This is the point where we corrupt the list:

{{{
We're unpark_tso()'ing a thread that is already in a run queue.

352     if (tso->block_info.closure != &stg_STM_AWOKEN_closure) {
353         // safe to do a non-atomic test-and-set here, because it's
354         // fine if we do multiple tryWakeupThread()s.
355         tso->block_info.closure = &stg_STM_AWOKEN_closure;
356         tryWakeupThread(cap,tso);
357     }

Old value = (StgTSO *) 0x104df58
New value = (StgTSO *) 0x42001d9000
0x0000000000dcb2b3 in unpark_tso (cap=0x104f6c0 <MainCapability>, tso=0x42001d9078) at rts/STM.c:355
355         tso->block_info.closure = &stg_STM_AWOKEN_closure;

bt
#0  0x0000000000dcb2b3 in unpark_tso (cap=0x104f6c0 <MainCapability>, tso=0x42001d9078) at rts/STM.c:355
#1  0x0000000000dcb35c in unpark_waiters_on (cap=0x104f6c0 <MainCapability>, s=0x42001c2070) at rts/STM.c:374
#2  0x0000000000dcd2d2 in stmCommitTransaction (cap=0x104f6c0 <MainCapability>, trec=0x4200037c50) at rts/STM.c:1092
#3  0x0000000000dee080 in stg_atomically_frame_info ()
#4  0x0000000000000000 in ?? ()
}}}

(Note that this is reverse execution, so "Old value" is actually the new value.)

The thread is already in a run queue:

{{{
print tso
$23 = (StgTSO *) 0x42001d9078

print MainCapability->run_queue_hd->_link->_link
$25 = (struct StgTSO_ *) 0x42001d9078
}}}

At this point the TSO link is fine:

{{{
print MainCapability->run_queue_hd->_link->_link->block_info.prev == MainCapability->run_queue_hd->_link
$29 = 1
}}}
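
To make the failure mode concrete, here is a toy C model of a doubly-linked run queue. It is only a sketch and not RTS code: `ToyTSO`, `ToyRunQueue`, `toy_append`, `toy_queue_ok` and `toy_demo` are made-up names, with `link` and `prev` standing in for `_link` and `block_info.prev`. It assumes a plain unchecked tail append, which may not match the path the real RTS takes here. The point is just that appending a node that is already linked in breaks the `next->prev == this` invariant checked above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for StgTSO; only the two link fields matter here.
 * All names in this sketch are hypothetical, not RTS identifiers. */
typedef struct ToyTSO {
    struct ToyTSO *link; /* next thread, plays the role of _link */
    struct ToyTSO *prev; /* previous thread, plays the role of block_info.prev */
} ToyTSO;

typedef struct {
    ToyTSO *hd;
    ToyTSO *tl;
} ToyRunQueue;

/* Append to the tail without checking whether tso is already queued. */
void toy_append(ToyRunQueue *q, ToyTSO *tso)
{
    tso->link = NULL;
    tso->prev = q->tl;
    if (q->hd == NULL) {
        q->hd = tso;
    } else {
        q->tl->link = tso;
    }
    q->tl = tso;
}

/* Walk at most max_len nodes and check that every node's prev pointer
 * names its predecessor, and that the walk ends at the recorded tail. */
bool toy_queue_ok(const ToyRunQueue *q, size_t max_len)
{
    const ToyTSO *prev = NULL;
    size_t n = 0;
    for (const ToyTSO *t = q->hd; t != NULL; t = t->link) {
        if (t->prev != prev || ++n > max_len) {
            return false;
        }
        prev = t;
    }
    return prev == q->tl;
}

/* Queue a, b, c, then append b a second time. */
void toy_demo(void)
{
    ToyTSO a = {NULL, NULL}, b = {NULL, NULL}, c = {NULL, NULL};
    ToyRunQueue q = {NULL, NULL};

    toy_append(&q, &a);
    toy_append(&q, &b);
    toy_append(&q, &c);
    assert(toy_queue_ok(&q, 3));

    toy_append(&q, &b);                     /* b is now linked in twice */
    assert(!toy_queue_ok(&q, 3));           /* b.prev is c, but a precedes b */
    assert(a.link == &b && b.link == NULL); /* c is unreachable from hd */
}
```

In this model the second append of `b` both invalidates `b.prev` and silently drops `c` from the forward walk, which is the kind of damage a list sanity check would trip over later, away from the place where the double insertion happened.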

Because the STM fix changed `unpark_tso()`, I thought it may be related. I don't yet know how this thread ends up in two lists; I'll investigate further.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15544#comment:17
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler