
#15427: Calling hs_try_putmvar from an unsafe foreign call can cause the RTS to hang
----------------------------------------+-------------------------------------
           Reporter:  syntheorem        |             Owner:  (none)
               Type:  bug               |            Status:  new
           Priority:  normal            |         Milestone:  8.6.1
          Component:  Runtime System    |           Version:  8.4.3
           Keywords:                    |  Operating System:  Unknown/Multiple
       Architecture:  Unknown/Multiple  |   Type of failure:  Runtime crash
          Test Case:                    |        Blocked By:
           Blocking:                    |   Related Tickets:
Differential Rev(s):                    |         Wiki Page:
----------------------------------------+-------------------------------------

 An unsafe foreign call which calls `hs_try_putmvar` can cause the RTS to
 hang, preventing any Haskell threads from making progress. However,
 compiling with `-debug` causes it instead to fail an assertion in the
 scheduler:

 {{{
 internal error: ASSERTION FAILED: file rts/Schedule.c, line 510

     (GHC version 8.4.3 for x86_64_apple_darwin)
 }}}

 Here is a minimal test case which reproduces the assertion. It needs to be
 built with `-debug -threaded` and run with `+RTS -N2` or higher.

 {{{#!hs
 import Control.Concurrent (forkIO, threadDelay)
 import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar)
 import Control.Monad (forever)
 import Foreign.C.Types (CInt(..))
 import Foreign.StablePtr (StablePtr)
 import GHC.Conc (PrimMVar, newStablePtrPrimMVar)

 foreign import ccall unsafe hs_try_putmvar :: CInt -> StablePtr PrimMVar -> IO ()

 main = do
   mvar <- newEmptyMVar
   forkIO $ forever $ do
     takeMVar mvar
   forkIO $ forever $ do
     sp <- newStablePtrPrimMVar mvar
     hs_try_putmvar (-1) sp
     threadDelay 1
   -- Let it spin a few times to trigger the bug
   threadDelay 500
 }}}

 I actually checked out GHC and added this as a test case and did some
 debugging. The specific assertion that fails is `ASSERT(task->cap == cap)`.
 This seems to happen because of this code in `hs_try_putmvar`:

 {{{#!c
 Task *task = getTask();
 // ...
 ACQUIRE_LOCK(&cap->lock);
 // If the capability is free, we can perform the tryPutMVar immediately
 if (cap->running_task == NULL) {
     cap->running_task = task;
     task->cap = cap;
     RELEASE_LOCK(&cap->lock);
     // ...
     releaseCapability(cap);
 } else {
     // ...
 }
 }}}

 Basically it assumes that the current thread's task isn't currently running
 a capability, so it takes a new one and then releases it without restoring
 the previous value of `task->cap`. Modifying the code to restore the value
 of `task->cap` after releasing the capability fixes the assertion.

 But I don't know enough about the RTS to be sure I'm not missing something
 here. In particular, is there a problem with the task basically holding two
 capabilities for a short time? My other thought is that maybe it should
 check if its task is currently running a capability, and in that case do
 something else. But I'm not sure what.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15427
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
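
The workaround described in the ticket (restoring `task->cap` after releasing the capability) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: the `Task` and `Capability` structs below are hypothetical, heavily simplified stand-ins for the real RTS types, locking is omitted, and the function names `try_put_original`, `try_put_restoring` and `scheduler_invariant_holds` are invented for illustration. It is not the actual RTS code or the patch that was eventually applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the RTS types. The real
   Task and Capability structs carry many more fields and a lock. */
typedef struct Capability_ Capability;
typedef struct Task_ Task;

struct Capability_ {
    Task *running_task;   /* NULL when the capability is free */
};

struct Task_ {
    Capability *cap;      /* capability the task believes it owns */
};

/* Models the scheduler check ASSERT(task->cap == cap). */
bool scheduler_invariant_holds(const Task *task, const Capability *cap)
{
    return task->cap == cap;
}

/* Models the fast path quoted in the ticket: take a free capability,
   do the work, release it, and leave task->cap pointing at it.
   Returns false when the target capability is busy. */
bool try_put_original(Task *task, Capability *target)
{
    if (target->running_task != NULL) {
        return false;
    }
    target->running_task = task;
    task->cap = target;
    /* ... the tryPutMVar itself would happen here ... */
    target->running_task = NULL;   /* stands in for releaseCapability */
    return true;
}

/* Same path, but the previous value of task->cap is saved and put
   back after the release, as the ticket suggests. */
bool try_put_restoring(Task *task, Capability *target)
{
    if (target->running_task != NULL) {
        return false;
    }
    Capability *saved_cap = task->cap;
    target->running_task = task;
    task->cap = target;
    /* ... the tryPutMVar itself would happen here ... */
    target->running_task = NULL;   /* stands in for releaseCapability */
    task->cap = saved_cap;
    return true;
}
```

In this model, a task that owns one capability and then goes through `try_put_original` on a second, free capability returns with `task->cap` pointing at the second one, so the modelled scheduler check fails for the capability it actually owns. With `try_put_restoring` the check still holds afterwards. Whether briefly holding two capabilities is safe in the real RTS is the open question raised in the ticket, and this model does not answer it.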