
#15136: High CPU when asynchronous exception and unblocking retry on TVar raced
-------------------------------------+-------------------------------------
        Reporter:  nshimaza          |                Owner:  osa1
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.6.1
       Component:  Runtime System    |              Version:  8.4.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  Runtime crash     |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by osa1):

Just to repeat comment:1 and comment:2 in my own words:

Thread 1 kills Thread 2, which is blocked on a TVar operation. To do this it calls raiseAsync(), which has to lock Thread 2 (lockTSO). Then, to abort the transaction, it needs to lock the TVar (lock_tvar).

At the same time Thread 3 successfully modifies the TVar. To unblock any threads blocked on this TVar it needs to lock the TVar (lock_tvar), and then to actually unblock the thread it needs to lock the TSO (lockTSO).

When the locking happens in this order:

- Thread 1 locks the TSO (lockTSO)
- Thread 3 locks the TVar (lock_tvar)

we get a deadlock, because Thread 1 now wants to lock the TVar and Thread 3 wants to lock the TSO, both of which are already locked.
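As a rough illustration, the interleaving described above can be replayed in a single-threaded toy model. Everything in this sketch is hypothetical: `ToyLock`, `toy_try_lock`, `toy_unlock`, `replay_interleaving` and `replay_serialized` are made-up names, not RTS code, and the real lockTSO and lock_tvar behave differently (they wait rather than fail). The sketch only shows why the two opposite acquisition orders leave both threads stuck.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an RTS lock: 0 means free, otherwise the id of the
 * owning thread. This is NOT the real lockTSO()/lock_tvar() machinery. */
typedef struct { int owner; } ToyLock;

/* Try to take the lock for `thread`; fail instead of waiting. */
bool toy_try_lock(ToyLock *l, int thread) {
    if (l->owner != 0) return false;
    l->owner = thread;
    return true;
}

/* Release a lock that `thread` currently owns. */
void toy_unlock(ToyLock *l, int thread) {
    assert(l->owner == thread);
    l->owner = 0;
}

/* Replays the problematic order from the comment. Returns true when both
 * threads hold one lock and each fails to get the lock the other holds. */
bool replay_interleaving(void) {
    ToyLock tso = {0}, tvar = {0};
    bool t1_has_tso   = toy_try_lock(&tso, 1);   /* Thread 1: raiseAsync path */
    bool t3_has_tvar  = toy_try_lock(&tvar, 3);  /* Thread 3: TVar update path */
    bool t1_gets_tvar = toy_try_lock(&tvar, 1);  /* Thread 1 now wants the TVar */
    bool t3_gets_tso  = toy_try_lock(&tso, 3);   /* Thread 3 now wants the TSO */
    return t1_has_tso && t3_has_tvar && !t1_gets_tvar && !t3_gets_tso;
}

/* Same acquisitions, but Thread 1 finishes before Thread 3 starts.
 * Returns true when both threads obtain both locks. */
bool replay_serialized(void) {
    ToyLock tso = {0}, tvar = {0};
    bool t1_ok = toy_try_lock(&tso, 1) && toy_try_lock(&tvar, 1);
    toy_unlock(&tvar, 1);
    toy_unlock(&tso, 1);
    bool t3_ok = toy_try_lock(&tvar, 3) && toy_try_lock(&tso, 3);
    return t1_ok && t3_ok;
}
```

In this model `replay_interleaving()` reports that both threads are stuck, while `replay_serialized()` reports that both make progress, so the problem lies in the overlap of the two opposite lock orders rather than in either path on its own.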

> Perhaps we should switch to using an owner semantics for BlockedOnSTM too - that is, if we see BlockedOnSTM in raiseAsync, we attempt to lock the TVar pointed to by tso->block_info.

I only get `END_TSO_QUEUE` in `tso->block_info`. I think the TVar is only reachable from the array list `tso->trec->current_chunk`. I guess we could do this:

- Lock the TSO.
- If BlockedOnSTM, check the tso->trec entries. Expect to see only one TVar there (can we have more than one TVar here?). Lock the TVar and release the TSO.
- Continue with raiseAsync().

I don't know if we can see more than one TVar in the tso->trec entries. Also, we need to modify stmAbortTransaction, because we'll already have the TVar locked, but it still needs to lock the TVar when it's called from other call sites (e.g. from `raise#`).

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15136#comment:9
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler