[GHC] #8209: Race condition in setNumCapabilities

#8209: Race condition in setNumCapabilities
----------------------------------+-------------------------------------
Reporter: akio | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Runtime System | Version: 7.7
Keywords: | Operating System: Unknown/Multiple
Architecture: x86_64 (amd64) | Type of failure: Runtime crash
Difficulty: Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: |
----------------------------------+-------------------------------------
In HEAD, the following program sometimes deadlocks (about 1/10 of the time).

{{{
import Control.Concurrent
import Control.Monad
import GHC.Conc

main = do
    mainTid <- myThreadId
    labelThread mainTid "main"
    forM_ [0..0] $ \i -> forkIO $ do
        subTid <- myThreadId
        labelThread subTid $ "sub " ++ show i
        forM_ [0..100000000] $ \j -> putStrLn $ "sub " ++ show i ++ ": " ++ show j
    yield
    -- Grow to two capabilities while the worker thread is still printing.
    setNumCapabilities 2
}}}

The problem seems to be a race condition between setNumCapabilities and a Task returning from a foreign call. Specifically, a sequence of events like the following can happen:

 1. Task 0 makes a foreign call.
 2. Task 1 calls setNumCapabilities().
 3. The call on task 0 returns; it reads task->cap from memory.
 4. Task 1 moves the Capabilities, invalidating all pointers to them.
 5. Task 0 takes over the invalidated Capability.

The attached patch adds an ASSERT to the RTS to demonstrate the problem (it does not fix the problem).

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8209
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
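To make steps 3 to 5 concrete, here is a minimal single-threaded C sketch that replays them with toy data structures. It is not RTS code: `ToyCap`, `ToyTask`, `toy_move_caps`, `toy_task_is_stale` and `toy_replay_race` are hypothetical names invented for this illustration, and the real `Capability` and `Task` structures are far larger. The sketch only shows that a pointer saved before the array is moved no longer refers to any live element afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for the RTS structures; names and fields are hypothetical. */
typedef struct { int no; } ToyCap;
typedef struct { ToyCap *cap; } ToyTask;

/* Step 4: "move" the capabilities by allocating a larger block and copying.
   The old block is handed back through old_out so the caller can free it. */
static ToyCap *toy_move_caps(ToyCap *caps, size_t from, size_t to, ToyCap **old_out)
{
    assert(from <= to && to > 0);
    ToyCap *fresh = malloc(to * sizeof *fresh);
    if (fresh == NULL) return NULL;
    memcpy(fresh, caps, from * sizeof *fresh);
    for (size_t i = from; i < to; i++) fresh[i].no = (int)i;
    *old_out = caps;
    return fresh;
}

/* Returns 1 if the pointer saved in the task is not an element of the live array. */
static int toy_task_is_stale(const ToyTask *task, const ToyCap *live, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (task->cap == &live[i]) return 0;
    return 1;
}

/* Replays steps 1-5 in one thread. Returns 1 if task 0 ends up with a stale
   pointer, 0 if not, and -1 on allocation failure. */
int toy_replay_race(void)
{
    ToyCap *caps = malloc(sizeof *caps);
    if (caps == NULL) return -1;
    caps[0].no = 0;

    ToyTask task0 = { &caps[0] };                    /* steps 1 and 3: task 0 reads its cap */
    ToyCap *old = NULL;
    ToyCap *live = toy_move_caps(caps, 1, 2, &old);  /* steps 2 and 4: task 1 moves the caps */
    if (live == NULL) { free(caps); return -1; }

    int stale = toy_task_is_stale(&task0, live, 2);  /* step 5 would use task0.cap */
    free(old);                                       /* the saved pointer dangles from here on */
    free(live);
    return stale;
}
```

The staleness check is done before the old block is freed, so the sketch itself never touches freed memory; in the scenario described in the report, task 0 would go on to use the saved pointer.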

Comment (by parcs):

Perhaps related, I am getting a rare deadlock with this test case:

=== Test.hs

{{{
import GHC.Conc

main :: IO ()
main = do
    setNumCapabilities 2
    setNumCapabilities 3
}}}

=== Command Line

{{{
$ ghc-stage2 -threaded Test.hs
[1 of 1] Compiling Main ( Test.hs, Test.o )
Linking Test ...
$ while true; do ./Test; echo -n .; done
.............................................<eventually stops>
}}}

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8209#comment:1

Comment (by ezyang):

I haven't run the program, but are you sure (5) happens? It is fine for task 0 to hit your ASSERT even while setNumCapabilities is running, because it still needs to check whether or not it's acceptable to run, and that won't happen until setNumCapabilities has finished.

What seems to me to be happening is that waitForReturnCapability is improperly pre-committing to the capability it wants to return to, whereas it should always retrieve a fresh candidate capability each time around the wait loop.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8209#comment:2
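The "fresh candidate each time around the wait loop" idea can be sketched in C as follows. This is a single-threaded simulation under stated assumptions, not the actual waitForReturnCapability: the types `SimCap`, `SimRts`, `SimTask` and every function name here are hypothetical, and blocking on a condition variable is replaced by a callback that may swap the capability array, standing in for a concurrent setNumCapabilities.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int no; int running_task; } SimCap;  /* running_task == 0 means free */
typedef struct { SimCap *caps; int n_caps; } SimRts;  /* caps may be replaced while waiting */
typedef struct { int id; int cap_no; } SimTask;       /* remembers an index, not a pointer */

/* Stands in for blocking; the callback may change rts->caps. */
typedef void (*sim_wait_fn)(SimRts *rts);

/* Look the capability up again on every iteration instead of committing
   to a pointer obtained before the wait. Returns NULL after max_spins. */
SimCap *sim_wait_for_return_cap(SimRts *rts, SimTask *task, sim_wait_fn wait, int max_spins)
{
    for (int spin = 0; spin < max_spins; spin++) {
        assert(task->cap_no >= 0 && task->cap_no < rts->n_caps);
        SimCap *cap = &rts->caps[task->cap_no];   /* fresh candidate */
        if (cap->running_task == 0) {
            cap->running_task = task->id;         /* take it over */
            return cap;
        }
        wait(rts);
    }
    return NULL;
}

static SimCap sim_old[1];
static SimCap sim_new[2];

/* Simulates setNumCapabilities replacing the array while the task waits. */
static void sim_resize_during_wait(SimRts *rts)
{
    rts->caps = sim_new;
    rts->n_caps = 2;
}

/* Returns 1 if the task ends up on the NEW copy of capability 0 and the
   old copy is left untouched. */
int sim_demo(void)
{
    sim_old[0] = (SimCap){ 0, 99 };               /* busy before the resize */
    sim_new[0] = (SimCap){ 0, 0 };
    sim_new[1] = (SimCap){ 1, 0 };

    SimRts rts = { sim_old, 1 };
    SimTask task = { 7, 0 };
    SimCap *got = sim_wait_for_return_cap(&rts, &task, sim_resize_during_wait, 4);
    return got == &sim_new[0] && sim_old[0].running_task == 99;
}
```

A loop that computed `cap` once, before the first wait, would keep testing `sim_old[0]` and never notice the new array. The real code would also need appropriate locking around the lookup, which this sketch leaves out.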

Comment (by akio):

Replying to [comment:2 ezyang]:
> I haven't run the program, but are you sure (5) happens?

On second thought I'm not so sure. I'll look at this again later today.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8209#comment:3

Changes (by simonmar):

 * owner: => simonmar
 * priority: normal => highest

Comment:

Thanks for the report. Treating as a blocker for 7.8.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8209#comment:4

Changes (by simonmar):

 * milestone: => 7.8.1

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8209#comment:5

Comment (by akio):

I found another error (reproducible with the same code).

{{{moreCapabilities}}} calls {{{memcpy}}} to copy {{{Capability}}}s, which means the {{{lock}}} field is copied as well. If the {{{lock}}} field of the old {{{Capability}}} is held by another thread at that moment, the new copy of {{{lock}}} is also in the held state. It will never be released, however, because the lock that is actually held is the old {{{lock}}}, not the new one. The task running {{{setNumCapabilities()}}} then deadlocks trying to acquire this lock.

(By the way, copying a mutex seems to be undefined behavior according to the pthread spec.)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8209#comment:6
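The effect of copying a held lock can be illustrated with a small C sketch. It deliberately uses a toy lock (a plain flag) rather than a real pthread mutex, precisely because copying a real mutex is not something a program should do. `ToyLock`, `LockedCap` and all function names are hypothetical and are not taken from the RTS.

```c
#include <assert.h>
#include <string.h>

typedef struct { int held; } ToyLock;             /* stand-in for a mutex, NOT a real one */
typedef struct { int no; ToyLock lock; } LockedCap;

static int toy_try_lock(ToyLock *l)               /* 1 on success, 0 if already held */
{
    if (l->held) return 0;
    l->held = 1;
    return 1;
}

static void toy_unlock(ToyLock *l)
{
    assert(l->held);
    l->held = 0;
}

/* Byte-wise copy: the lock state travels with the data. */
static void copy_caps_bytewise(LockedCap *dst, const LockedCap *src, int n)
{
    memcpy(dst, src, (size_t)n * sizeof *dst);
}

/* Field-wise copy: data is copied, each new lock starts out unlocked. */
static void copy_caps_reinit_lock(LockedCap *dst, const LockedCap *src, int n)
{
    for (int i = 0; i < n; i++) {
        dst[i].no = src[i].no;
        dst[i].lock.held = 0;
    }
}

/* Another thread holds the old lock while the copy is made and releases the
   OLD lock afterwards. Returns whether the NEW lock can then be acquired. */
int demo_copy_then_lock(int bytewise)
{
    LockedCap old_caps[1] = { { 0, { 0 } } };
    LockedCap new_caps[1];

    toy_try_lock(&old_caps[0].lock);              /* holder takes the old lock */
    if (bytewise) copy_caps_bytewise(new_caps, old_caps, 1);
    else          copy_caps_reinit_lock(new_caps, old_caps, 1);
    toy_unlock(&old_caps[0].lock);                /* holder releases the old lock */

    return toy_try_lock(&new_caps[0].lock);
}
```

With the byte-wise copy the new lock stays held forever, because nobody will ever unlock that copy. Re-initialising the lock avoids that particular hang in this toy model, but it says nothing about a thread that still believes it holds the capability's lock, so it should not be read as a complete fix.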

Comment (by akio):

Replying to [comment:2 ezyang]:
> I haven't run the program, but are you sure (5) happens? It is fine for task 0 to hit your ASSERT even while setNumCapabilities is running, because it still needs to check whether or not it's acceptable to run, and that won't happen until setNumCapabilities has finished.
>
> What seems to me to be happening is that waitForReturnCapability is improperly pre-committing to the capability it wants to return to, whereas it should always retrieve a fresh candidate capability each time around the wait loop.

Yes, it seems that that is what's happening in practice. However, isn't there still a possibility that e.g. task 0 tries to acquire {{{cap->lock}}} in {{{waitForReturnCapability}}} after {{{cap}}} is {{{stgFree}}}d in {{{setNumCapabilities}}}?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8209#comment:7

Comment (by ezyang):

Replying to [comment:7 akio]:
> However, isn't there still a possibility that e.g. task 0 tries to acquire {{{cap->lock}}} in {{{waitForReturnCapability}}} after {{{cap}}} is {{{stgFree}}}d in {{{setNumCapabilities}}}?

Certainly that would be wrong, and a fix has to deal with it. (Note that while I would expect the memcpy problem to cause a deadlock, I would not expect this one to.)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8209#comment:8
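One general way to keep saved pointers valid across a resize is to allocate each object separately and grow only a table of pointers to them. The C sketch below illustrates that technique under stated assumptions; it is not presented as the change that was made to the RTS, and `StableCap`, `CapTable` and the functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int no; } StableCap;
typedef struct { StableCap **slots; int n; } CapTable;

/* Grow the table to new_n entries. Existing StableCap objects are neither
   moved nor freed; only the table of pointers is reallocated.
   Returns 0 on success, -1 on allocation failure (table left unchanged). */
int cap_table_grow(CapTable *t, int new_n)
{
    assert(new_n > 0 && new_n >= t->n);
    StableCap **slots = malloc((size_t)new_n * sizeof *slots);
    if (slots == NULL) return -1;

    for (int i = 0; i < t->n; i++) slots[i] = t->slots[i];
    for (int i = t->n; i < new_n; i++) {
        slots[i] = malloc(sizeof *slots[i]);
        if (slots[i] == NULL) {
            while (--i >= t->n) free(slots[i]);   /* undo the new objects only */
            free(slots);
            return -1;
        }
        slots[i]->no = i;
    }
    free(t->slots);                               /* free(NULL) is fine for an empty table */
    t->slots = slots;
    t->n = new_n;
    return 0;
}

void cap_table_free(CapTable *t)
{
    for (int i = 0; i < t->n; i++) free(t->slots[i]);
    free(t->slots);
    t->slots = NULL;
    t->n = 0;
}

/* Returns 1 if a pointer saved before the grow still refers to the same
   live object afterwards, -1 on allocation failure. */
int demo_pointer_survives_grow(void)
{
    CapTable t = { NULL, 0 };
    if (cap_table_grow(&t, 1) != 0) return -1;
    StableCap *saved = t.slots[0];                /* plays the role of task->cap */
    if (cap_table_grow(&t, 3) != 0) { cap_table_free(&t); return -1; }

    int ok = (saved == t.slots[0]) && saved->no == 0 && t.slots[2]->no == 2;
    cap_table_free(&t);
    return ok;
}
```

In this layout a saved object pointer survives the grow, and the object's lock is never copied. A thread that reads the `slots` array itself while it is being replaced would still need synchronisation, which the sketch does not attempt.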

Comment (by Simon Marlow

Comment (by Simon Marlow

Changes (by AndreasVoellmy):

 * cc: andreas.voellmy@… (added)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8209#comment:11

Comment (by AndreasVoellmy):

@Simon: did your patch resolve this issue? If so, should it be marked fixed?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8209#comment:12

Changes (by simonmar):

 * status: new => closed
 * resolution: => fixed

Comment:

Oops, thanks for the reminder.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8209#comment:13