[GHC] #14707: setNumCapabilities can cause threads to get stuck in gcWorkerThread

#14707: setNumCapabilities can cause threads to get stuck in gcWorkerThread
-------------------------------------+-------------------------------------
           Reporter:  duog           |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Runtime System |           Version:  8.5
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |         Test Case:
         Blocked By:                 |          Blocking:
    Related Tickets:                 |  Differential Rev(s):
          Wiki Page:                 |
-------------------------------------+-------------------------------------
 I have a patch with some instrumentation that proves that threads sometimes
 do not leave `gcWorkerThread` until the following GC.

 I suspect it's caused by `idle_caps` being mutated in `scheduleDoGC` after
 the call to `requestSync`. A thread enters `yieldCapability`, sees that it
 is not idle, and so enters `gcWorkerThread`; but then `idle_caps` is mutated
 so that that thread *is* idle, and its spin locks are not touched by the
 garbage collector.

 Potential fixes:
 * Don't look at `idle_caps` in the garbage collector when we're touching
   the spin locks; just do it for all capabilities. I don't *think* this
   does any harm.
 * Don't mutate `idle_caps` after the call to `requestSync`; move that logic
   to before the call.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14707
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
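The suspected interleaving and the two proposed fixes can be sketched as a small single-threaded C model. This is only an illustration of the ordering described in the ticket, not the RTS's code: `N_CAPS`, `in_gc_worker`, `worker_yield`, `release_workers`, `count_stuck`, and `run_scenario` are hypothetical names, and the `idle_caps` array here is a stand-in for the RTS variable of the same name. Capability 0 plays the thread that runs the collection, and capability 2 is the one whose idle flag changes.

```c
#include <assert.h>
#include <stdbool.h>

#define N_CAPS 4  /* hypothetical number of capabilities */

/* Hypothetical stand-ins; these are NOT the RTS's real data structures. */
bool idle_caps[N_CAPS];     /* collector's view: true = treat capability as idle */
bool in_gc_worker[N_CAPS];  /* true = thread is parked waiting to be released */

/* A worker yields: it joins the GC only if it is not currently marked idle. */
void worker_yield(int cap) {
    assert(cap >= 0 && cap < N_CAPS);
    if (!idle_caps[cap]) {
        in_gc_worker[cap] = true;
    }
}

/* Release step. If respect_idle is true, capabilities marked idle are
   skipped (the behaviour the ticket suspects). If false, every capability
   is released regardless of idle_caps (first proposed fix). */
void release_workers(bool respect_idle) {
    for (int i = 0; i < N_CAPS; i++) {
        if (respect_idle && idle_caps[i]) {
            continue;
        }
        in_gc_worker[i] = false;
    }
}

/* Number of workers still parked after the release step. */
int count_stuck(void) {
    int stuck = 0;
    for (int i = 0; i < N_CAPS; i++) {
        if (in_gc_worker[i]) {
            stuck++;
        }
    }
    return stuck;
}

/* Simulate one collection. mutate_after_sync = true marks capability 2 idle
   only after the workers have looked at idle_caps (the suspected ordering);
   false marks it idle beforehand (second proposed fix). */
int run_scenario(bool mutate_after_sync, bool respect_idle) {
    for (int i = 0; i < N_CAPS; i++) {
        idle_caps[i] = false;
        in_gc_worker[i] = false;
    }
    if (!mutate_after_sync) {
        idle_caps[2] = true;   /* decided before the sync point */
    }
    /* Sync point: workers (capabilities 1..N_CAPS-1) observe idle_caps. */
    for (int cap = 1; cap < N_CAPS; cap++) {
        worker_yield(cap);
    }
    if (mutate_after_sync) {
        idle_caps[2] = true;   /* late mutation, after workers have parked */
    }
    release_workers(respect_idle);
    return count_stuck();
}
```

Under these assumptions, `run_scenario(true, true)` returns 1 (the suspected stuck worker), while `run_scenario(true, false)` and `run_scenario(false, true)` both return 0. The model shows only that the ordering would matter if the RTS behaved this way; it says nothing about whether it does.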

#14707: setNumCapabilities can cause threads to get stuck in gcWorkerThread

Description changed by duog:

New description:

 I have a patch with some instrumentation that proves that threads sometimes
 do not leave `gcWorkerThread` until the following GC.

 I suspect it's caused by `idle_caps` being mutated in `scheduleDoGC` after
 the call to `requestSync`. A thread enters `yieldCapability`, sees that it
 is not idle, and so enters `gcWorkerThread`; but then `idle_caps` is mutated
 so that that thread ''is'' idle, and its spin locks are not touched by the
 garbage collector.

 Potential fixes:
 * Don't look at `idle_caps` in the garbage collector when we're touching
   the spin locks; just do it for all capabilities. I don't ''think'' this
   does any harm.
 * Don't mutate `idle_caps` after the call to `requestSync`; move that logic
   to before the call.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14707#comment:1

#14707: setNumCapabilities can cause threads to get stuck in gcWorkerThread

Description changed by duog:

New description:

 I have a patch with some instrumentation (Phab:D4339) that proves that
 threads sometimes do not leave `gcWorkerThread` until the following GC.

 I suspect it's caused by `idle_caps` being mutated in `scheduleDoGC` after
 the call to `requestSync`. A thread enters `yieldCapability`, sees that it
 is not idle, and so enters `gcWorkerThread`; but then `idle_caps` is mutated
 so that that thread ''is'' idle, and its spin locks are not touched by the
 garbage collector.

 Potential fixes:
 * Don't look at `idle_caps` in the garbage collector when we're touching
   the spin locks; just do it for all capabilities. I don't ''think'' this
   does any harm.
 * Don't mutate `idle_caps` after the call to `requestSync`; move that logic
   to before the call.

 Of course, maybe I'm misunderstanding and this isn't a bug?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14707#comment:2

#14707: setNumCapabilities can cause threads to get stuck in gcWorkerThread

Changes (by duog):

 * status:  new => closed
 * resolution:  => invalid

Comment:

 Turns out my instrumentation was wrong, but at least I learned something.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14707#comment:3