[GHC] #13164: idle time full GCs (idle cpu usage)

#13164: idle time full GCs (idle cpu usage) -------------------------------------+------------------------------------- Reporter: lspitzner | Owner: Type: feature | Status: new request | Priority: normal | Milestone: Component: Runtime | Version: 8.0.1 System | Keywords: idle GC | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- In an interactive program I noticed idle CPU usage of ~12% due to the full GCs done by the RTS on default settings. For this specific usecase this is a lot of wasted CPU cycles for no gain. Passing {{{-I0}}} to the RTS "fixes" this; nonetheless I think that the default behaviour could/should be improved, even when considering other usecases where full GCs are a good choice. The program in question keeps ~50MB allocated on the heap but does close to no short-lived allocations in between the idle GCs. Some relevant data from the summary: {{{ 930,569,008 bytes allocated in the heap 4,847,446,928 bytes copied during GC 46,255,056 bytes maximum residency (111 sample(s)) 378,200 bytes maximum slop 94 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 1784 colls, 0 par 0.123s 0.100s 0.0001s 0.0006s Gen 1 111 colls, 0 par 9.377s 9.418s 0.0848s 0.1099s INIT time 0.000s ( 0.003s elapsed) MUT time 2.250s ( 95.029s elapsed) GC time 9.500s ( 9.518s elapsed) RP time 0.000s ( 0.000s elapsed) PROF time 0.000s ( 0.000s elapsed) EXIT time 0.007s ( 0.008s elapsed) Total time 11.760s (104.557s elapsed) }}} The detailed GC statistics of the idle GCs look like {{{ Alloc Copied Live GC GC TOT TOT Page Flts bytes bytes bytes user elap user elap 13304 45958728 46210536 0.070 0.068 0.920 1.550 0 0 (Gen: 1) 11568 45958720 46210536 0.100 0.102 1.043 2.584 0 0 (Gen: 1) 15432 45958720 46210536 0.103 0.105 1.167 3.588 0 0 (Gen: 1) 11568 45958720 46210536 0.100 0.102 1.287 4.586 0 0 (Gen: 1) 11568 45958720 46210536 0.107 0.107 1.413 5.592 0 0 (Gen: 1) 11568 45958720 46210536 0.073 0.080 1.503 6.566 0 0 (Gen: 1) 11568 45958720 46210536 0.073 0.073 1.593 7.560 0 0 (Gen: 1) 11568 45958720 46210536 0.067 0.068 1.677 8.557 0 0 (Gen: 1) }}} I don't really know what exactly "alloc bytes" means, but my guess is that those numbers are indeed fairly low and those GCs are mostly useless. To clarify the intention of this request: '''Tweak the idle GC (default) configuration in some way so that interactive programs with moderate heap use reasonable amount of CPU time while idle.''' And "reasonable" is intentionally vague as I don't know how to weight other usecases. Minimal example: {{{ import Control.Concurrent main :: IO () main = do let large = [1..1000000] print $ length large [1..20] `forM_` \_ -> do threadDelay 400000 print $ sum large }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13164 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13164: idle time full GCs (idle cpu usage) -------------------------------------+------------------------------------- Reporter: lspitzner | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: idle GC Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): I'm open to ideas here. The tradeoff is that without the idle GC you don't get any deadlock detection (`BlockedOnDeadMVar` and friends) when the process is idle, so provided you're OK with that then it's fine to use `-I0`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13164#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13164: idle time full GCs (idle cpu usage) -------------------------------------+------------------------------------- Reporter: lspitzner | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: idle GC Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by lspitzner): I think it would be a good approach to keep the current delay, but to insert some kind of conditional around actually running the full GC. This condition could take into account e.g.: - The time since the last (actually-performed) idle GC. This would reset if a non-idle full GC is performed. - The number of bytes collected by the last idle GC, perhaps in relation to the residency. (If you collect less than 10% of heap as garbage, it is probably not worth to run the GC too often). - The number of bytes allocated since the last full GC (idle or not). I don't know if these statistics are available, but at least the first should be easy to implement. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13164#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13164: idle time full GCs (idle cpu usage) -------------------------------------+------------------------------------- Reporter: lspitzner | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: idle GC Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): The main purpose of the idle GC is not really to collect garbage, it's for deadlock detection. The program may have deadlocked even if it has only allocated a tiny amount since the previous GC, and we have no way of telling that. On the other hand, if you want the idle GC for collecting garbage, and you don't mind about not having prompt deadlock detection, then yes having those heuristics would be a good idea. I'm just not sure how often this is the case. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13164#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13164: idle time full GCs (idle cpu usage) -------------------------------------+------------------------------------- Reporter: lspitzner | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: idle GC Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by lspitzner): I just noticed that this means that you do not get deadlock detection if you forkIO (forever (threadDelay 100000)). That's interesting. You are right, I was focused on GC performance instead of deadlock detection because the latter was not even mentioned when i chatted with rwbarton. Maybe he can comment on this. Even for deadlock detection it makes sense to delay things based on the cost of the full GCs. (I'd rather have to wait 15secs to get the deadlock detected when I manage to create deadlocks than constantly be wasting CPU time - but if full GCs are cheap, I would not mind quicker detection.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13164#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13164: idle time full GCs (idle cpu usage) -------------------------------------+------------------------------------- Reporter: lspitzner | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: idle GC Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): FWIW I've never cared about deadlock detection, and it sounds unreliable in view of lspitzner's most recent comment. I do care that interactive programs I'm not actively using don't consume 12% of my CPU. Doing a GC when the user isn't interacting with the program sounds like a good idea to hide latency. Judging from the detailed GC statistics lspitzner's program isn't totally idle; it must be doing a little bit of work every second, repeatedly triggering a new idle GC. But one would expect it to be cheap to do a little bit of work every second. The heap could be a lot bigger than 50 MB... The user's guide suggests enabling the idle GC for interactive programs, but when there is a little work to do every second it seems basically unusable. And the effects of the idle GC would be lost completely by doing work every 0.2 seconds, which is strange. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13164#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13164: idle time full GCs (idle cpu usage) -------------------------------------+------------------------------------- Reporter: lspitzner | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: idle GC Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar):
I just noticed that this means that you do not get deadlock detection if you forkIO (forever (threadDelay 100000)). That's interesting.
Probably because it never runs a major GC, only minor GCs. Admittedly that's surprising and counter-intuitive.
I do care that interactive programs I'm not actively using don't consume 12% of my CPU.
I'm sympathetic to that. Like I said, I'm open to suggestions here. I do worry that not having deadlock detection by default will be surprising - it's quite useful when developing/debugging, for instance. We have lots of tests in `testsuite/tests/concurrent` that rely on it. How about we give the idle GC a time budget expressed as a % of wall clock time? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13164#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13164: idle time full GCs (idle cpu usage) -------------------------------------+------------------------------------- Reporter: lspitzner | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: idle GC Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by lspitzner): Firstly: I said "I just noticed that this means", i.e. this was a pure deduction, not an observation. Apart from that, I never meant to suggest making -I0 the default, and I think the two goals (deadlock detection and latency reduction) do align here. So between options a) status quo b) default -I0 c) something more clever that both does deadlock detection and uses, lets say.. 3% idle cpu time, I'd choose c) over a) over b). I am not sure how the budget would be enforced, but it sounds like a good idea, especially if that is easier to implement than the heuristics I mentioned above. Another thought: If there was an upper bound on the interval between full GCs (idle or not) you'd get reliable deadlock detection even with "forever threaddelay" cases. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13164#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13164: idle time full GCs (idle cpu usage) -------------------------------------+------------------------------------- Reporter: lspitzner | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: idle GC Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton):
How about we give the idle GC a time budget expressed as a % of wall clock time?
This sounds sensible, e.g., if the idle GC last ran for m milliseconds, wait (at least) 100*m milliseconds before running it again.
Another thought: If there was an upper bound on the interval between full GCs (idle or not) you'd get reliable deadlock detection even with "forever threaddelay" cases.
This makes sense, too. So the 100*m timer should be reset whenever we do a full GC. Then we probably no longer need the "idle" condition. (But when the RTS is ''truly'' idle then we should not do another GC, of course. For example, currently when you leave ghci idle in a terminal it does one idle GC after (GHC's custom idle GC time of) 5 seconds, but then doesn't do another GC after 5 more seconds if you haven't interacted with it at all.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13164#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC