
#15136: High CPU when asynchronous exception and unblocking retry on TVar raced -------------------------------------+------------------------------------- Reporter: nshimaza | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Runtime System | Version: 8.4.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Runtime crash | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by carlostome): After some investigation I found that indeed there is a deadlock between the mutexes of a TSO and a TVar. An excerpt of the debugging log is the follwing: {{{ Task 2: cap 1: running thread 7012 (ThreadRunGHC) Task 2: STM: 0xad6d08 : stmStartTransaction with 316 tokens Task 2: STM: 0xad6d08 : stmStartTransaction()=0x4200224000 Task 2: STM: 0x4200224000 : stmWriteTVar(0x4200225c68, 0xad5e72) Task 2: STM: 0x4200224000 : get_entry_for TVar 0x4200225c68 Task 2: STM: 0x4200224000 : FOR_EACH_ENTRY, current_chunk=0x420027e4d0 limit=0 Task 2: STM: 0x4200224000 : read_current_value(0x4200225c68)=0xad5e69 Task 2: STM: 0x4200224000 : stmWriteTVar done Task 2: STM: 0x4200224000 : stmCommitTransaction() Task 2: STM: 0x4200224000 : lock_stm() Task 2: STM: 0x4200224000 : FOR_EACH_ENTRY, current_chunk=0x420027e4d0 limit=1 Task 2: STM: 0x4200224000 : trying to acquire 0x4200225c68 Task 2: STM: 0x4200224000 : cond_lock_tvar(0x4200225c68, 0xad5e69) -- (1) Task 2: STM: 0x4200224000 : success Task 2: STM: 0x4200224000 : doing read check Task 2: STM: 0x4200224000 : FOR_EACH_ENTRY, current_chunk=0x420027e4d0 limit=1 Task 2: STM: 0x4200224000 : read-check succeeded Task 2: STM: 0x4200224000 : FOR_EACH_ENTRY, current_chunk=0x420027e4d0 limit=1 Task 2: STM: 0x4200224000 : writing 0xad5e72 to 0x4200225c68, waking waiters -- (2) Task 2: STM: unpark_waiters_on tvar=0x4200225c68 Task 2: STM: unpark_waiters_on tvar=0x4200225c68, status=0x4200223b80 Task 2: STM: cap 1: unpark_waiters_on tvar=0x4200225c68, status2=0x4200223b80 -- (3) Task 2: STM: cap 1: unpark_tso id 6866 -- (4) Task 1: BlockedOnSTM: lockTSO id 6866 Task 1: cap 0 BlockedOnSTM: raiseAsync id 6866 Task 1: cap 0: raising exception in thread 6866. Task 1: cap 0: raiseAsync: freezing atomically frame -- (5) Task 1: STM: 0x4200225f58 : stmAbortTransaction -- (6) Task 1: STM: 0x4200225f58 : lock_stm() Task 1: STM: 0x4200225f58 : aborting top-level transaction cap 0 Task 1: STM: 0x4200225f58 : aborting top-level transaction Task 1: STM: 0x4200225f58 : stmAbortTransaction aborting waiting transaction Task 1: STM: 0x4200225f58 : remove_watch_queue_entries_for_trec() -- (7) Task 1: STM: 0x4200225f58 : FOR_EACH_ENTRY, current_chunk=0x4200223cc8 limit=1 Task 1: STM: 0x4200225f58 : lock_tvar (0x4200225c68) -- (8) }}} What is happening here has to do with how async exceptions are implemented. The thread 6866 is waiting on the TVar 0x4200225c68. In the meantime the thread 7012 is able to change the value of such TVar to True. To do so, first it has to acquire the lock for the TVar (1), then actually change its value (2), and finally wake any thread that was waiting on this TVar (3). To wake up a concrete waiter it needs to hold its mutex through lock_tso() (4). The problem is that when an async exception is thrown to a thread whose status is BlockedOnSTM, the first thing the thread throwing the exception does is to grab the mutex of the target thread with lock_tso() (https://github.com/ghc/ghc/blob/93220d46fceabf3afeae36f1fda94e1698c3639a/rts...), then because the target thread was in the middle of an STM transaction (5) that dependes on the same TVar, it has to abort it (6) for what it has to be removed from the TVar watch queue (7) needing the mutex on the TVar to do so (8). Basically the thread completing the STM grabs first the TVar lock and then the TSO lock while for the async exception the locks are taken in reverse order. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15136#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler