[GHC] #10629: threadWaitRead throws BlockedIndefinitelyOnMVar

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Status: new Type: bug | Milestone: Priority: normal | Version: 7.10.1 Component: Runtime | Operating System: Linux System | Type of failure: Incorrect result Keywords: | at runtime concurrency sockets | Blocked By: Architecture: | Related Tickets: Unknown/Multiple | Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- In a project using the network-transport-tcp package, I'm observing {{{threadWaitRead}}} throw the exception {{{BlockedIndefinitelyOnMVar}}}. The call stack is roughly: {{{ ... n-t-tcp:Network.Transport.TCP.handleIncomingMessages n-t-tcp:Network.Transport.TCP.Internal.recvInt32 n-t-tcp:Network.Transport.TCP.Internal.recvExact network:Network.Socket.ByteString:recv network:Network.Socket.ByteString:recvInner network:Network.Socket.Internal:throwSocketErrorWaitRead base:Control.Concurrent:threadWaitRead }}} IIUC this would be an RTS bug. The socket file descriptor is healthy and works fine if the exception is caught and {{{threadWaitRead}}} is retried. Unfortunately, I can only reproduce this in a particular machine and with a rather complex test case. I'd appreciate any advice on inspecting the RTS code to scan for the cause of {{{BlockedIndefinitelyOnMVar}}} being thrown. Of course, if someone can help explaining this behavior I'll be most thankful. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Status: new Type: bug | Milestone: Priority: normal | Version: 7.10.1 Component: Runtime System | Keywords: Resolution: | concurrency sockets Operating System: Linux | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Description changed by facundo.dominguez: Old description:
In a project using the network-transport-tcp package, I'm observing {{{threadWaitRead}}} throw the exception {{{BlockedIndefinitelyOnMVar}}}.
The call stack is roughly:
{{{ ... n-t-tcp:Network.Transport.TCP.handleIncomingMessages n-t-tcp:Network.Transport.TCP.Internal.recvInt32 n-t-tcp:Network.Transport.TCP.Internal.recvExact network:Network.Socket.ByteString:recv network:Network.Socket.ByteString:recvInner network:Network.Socket.Internal:throwSocketErrorWaitRead base:Control.Concurrent:threadWaitRead }}}
IIUC this would be an RTS bug. The socket file descriptor is healthy and works fine if the exception is caught and {{{threadWaitRead}}} is retried.
Unfortunately, I can only reproduce this in a particular machine and with a rather complex test case.
I'd appreciate any advice on inspecting the RTS code to scan for the cause of {{{BlockedIndefinitelyOnMVar}}} being thrown.
Of course, if someone can help explaining this behavior I'll be most thankful.
New description: In a project using the network-transport-tcp package, I'm observing {{{threadWaitRead}}} throw the exception {{{BlockedIndefinitelyOnMVar}}}. The call stack is roughly: {{{ ... n-t-tcp:Network.Transport.TCP.handleIncomingMessages n-t-tcp:Network.Transport.TCP.Internal.recvInt32 n-t-tcp:Network.Transport.TCP.Internal.recvExact network:Network.Socket.ByteString:recv network:Network.Socket.ByteString:recvInner network:Network.Socket.Internal:throwSocketErrorWaitRead base:Control.Concurrent:threadWaitRead }}} IIUC this would be an RTS bug. The socket file descriptor is healthy and works fine if the exception is caught and {{{threadWaitRead}}} is retried. Unfortunately, I can only reproduce this in a particular cluster and with a rather complex test case while using the threaded runtime. I'd appreciate any advice on inspecting the RTS code to scan for the cause of {{{BlockedIndefinitelyOnMVar}}} being thrown. Of course, if someone can help explaining this behavior I'll be most thankful. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Status: new Type: bug | Milestone: Priority: normal | Version: 7.10.1 Component: Runtime System | Keywords: Resolution: | concurrency sockets Operating System: Linux | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Changes (by slyfox): * cc: slyfox (added) Comment: It's unclear which runtime you are using: threaded or non-threaded? Can you grab an '+RTS -Ds' dump for a process that crashed like that? Trac #10590 might or might not be relevant. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Status: new Type: bug | Milestone: Priority: normal | Version: 7.10.1 Component: Runtime System | Keywords: Resolution: | concurrency sockets Operating System: Linux | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by slyfox): The RTS throws that exception in a single place in resurrectThreads: https://github.com/ghc/ghc/blob/9b1ebba2af060fef90dcd722313d3f8041ec5a97/rts... -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Status: new Type: bug | Milestone: Priority: normal | Version: 7.10.1 Component: Runtime System | Keywords: Resolution: | concurrency sockets Operating System: Linux | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by facundo.dominguez): I'm attaching the eventlog produced with {{{-Ds}}}. It can be uncompressed with: {{{ cat test.eventlog.xz.00 test.eventlog.xz.01 | unxz > test.eventlog }}} Had to split it to comply with the upload limits of trac. It contains an entry produced with {{{ traceEventIO "threadWaitRead: failed with BlockedIndefinitelyOnMVar" }}} At the point where {{{threadedWaitRead}}} fails. I couldn't reproduce the bug with GHC HEAD. I'm using the threaded runtime. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Status: new Type: bug | Milestone: Priority: normal | Version: 7.10.1 Component: Runtime System | Keywords: Resolution: | concurrency sockets Operating System: Linux | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by slyfox): '''-Ds''' should output scheduling log to stderr (might require '''-debug''' link option as well). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Status: new Type: bug | Milestone: Priority: normal | Version: 7.10.1 Component: Runtime System | Keywords: Resolution: | concurrency sockets Operating System: Linux | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by facundo.dominguez): Indeed. But I redirected it to the eventlog with {{{-l}}} as it is suggested to reduce the overhead of tracing. If the stderr output is needed instead, let me know and I'll try it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Status: new Type: bug | Milestone: Priority: normal | Version: 7.10.1 Component: Runtime System | Keywords: Resolution: | concurrency sockets Operating System: Linux | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by slyfox): stderr is preferred as it shows MVar addresses and thread states on each GC cycle (I wonder if it's a logging bug or feature): {{{ all threads: threads on capability 0: other threads: thread 4 @ 0x7f526d10b388 is blocked on an MVar @ 0x7f526d10aa50 (TSO_DIRTY) thread 3 @ 0x7f526d105d20 ["TimerManager"] is blocked on an external call (TSO_DIRTY) thread 2 @ 0x7f526d1058f0 ["IOManager on cap 0"] is blocked on an external call (TSO_DIRTY) }}} I hope to see file descriptor ID/MVar history for thread killed by BlockedIndefinitelyOnMVar. Human readable thread names are set with '''labelThread'''. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Status: new Type: bug | Milestone: Priority: normal | Version: 7.10.1 Component: Runtime System | Keywords: Resolution: | concurrency sockets Operating System: Linux | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by facundo.dominguez): I failed to reproduce the bug without redirecting with {{{-l}}}. I updated the attached eventlog and also the stderr output I get. Hopefully, they both sum up all the interesting information. Let me know if the split eventlog/stderr is still unhelpful. I'm using {{{labelThread tid "handleIncomingMessages"}}} to mark the dying thread. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Status: new Type: bug | Milestone: Priority: normal | Version: 7.10.1 Component: Runtime System | Keywords: Resolution: | concurrency sockets Operating System: Linux | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonmar): `BlockedIndefinitelyOnMVar` is thrown when the thread is blocked on an `MVar` that isn't reachable from anywhere else, which means that it will never be woken up. There are two possibilities: - the exception is being thrown erroneously (unlikely, I'd guess) - the IO manager has somehow lost the `MVar` that the thread is waiting on. You could proceed by instrumenting code below `GHC.Event` to find out what happens to the `MVar` the thread is waiting on. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 7.10.1 Resolution: | Keywords: | concurrency sockets Operating System: Linux | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by facundo.dominguez): After staring a while to the code in {{{GHC.Event}}}, I got this program which seems to reproduce the problem. I'm using {{{network-2.6.2.1}}}, though I don't think the particular version makes a difference. {{{ -- t.hs -- -- The output should look like -- -- > $ ghc --make -threaded t.hs -- > $ ./t -- > threadWaitWrite: terminated -- > threadWaitRead: thread blocked indefinitely in an MVar operation -- > threadWaitRead: terminated -- > main thread terminated -- main thread terminated import Control.Concurrent import Control.Exception import Control.Monad import System.IO import System.Mem import Network.Socket main :: IO () main = do (s0, s1) <- socketPair AF_UNIX Stream defaultProtocol let fd = fdSocket s0 forkIO $ do catch (threadWaitRead (fromIntegral fd)) $ \e -> putStrLn $ "threadWaitRead: " ++ show (e :: SomeException) putStrLn "threadWaitRead: terminated" forkIO $ do threadDelay 500000 catch (threadWaitWrite (fromIntegral fd)) $ \e -> putStrLn $ "threadWaitWrite: " ++ show (e :: SomeException) putStrLn "threadWaitWrite: terminated" threadDelay 1000000 send s1 "hello" putStrLn "main thread terminated" }}} {{{GHC.Event.Manager}}} has a suspicious [https://github.com/ghc/ghc/blob/ghc-7.10.1-release/libraries/base/GHC/Event/... line]. It removes all callbacks from the table, but I can't see where the unmatched callbacks are added back. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 7.10.1 Resolution: | Keywords: | concurrency sockets Operating System: Linux | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by facundo.dominguez): ... and it seems to have been fixed in #10317. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10629: threadWaitRead throws BlockedIndefinitelyOnMVar -------------------------------------+------------------------------------- Reporter: | Owner: simonmar facundo.dominguez | Type: bug | Status: closed Priority: normal | Milestone: Component: Runtime System | Version: 7.10.1 Resolution: worksforme | Keywords: | concurrency sockets Operating System: Linux | Architecture: Type of failure: Incorrect result | Unknown/Multiple at runtime | Test Case: Blocked By: | Blocking: Related Tickets: #10317 | Differential Revisions: -------------------------------------+------------------------------------- Changes (by thomie): * status: new => closed * resolution: => worksforme * related: => #10317 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10629#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC