[GHC] #14981: GHC parallel GC is not doing well on modern many-core machine

#14981: GHC parallel GC is not doing well on modern many-core machine
-------------------------------------+-------------------------------------
           Reporter:  varosi         |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Runtime System |           Version:  8.4.1
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 I'm testing a small ray tracer on different many-core machines, such as an
 88-core x64 machine and a 96-core AArch64 machine (on https://packet.net).
 Parallel GC seems to have throughput problems on more than 24-32 cores.

 See this Reddit thread:
 https://www.reddit.com/r/haskell/comments/85vwlq/our_lovely_ghc_parallel_gc_...
 There you can find an .eventlog file and a PNG screenshot.

 Maybe it's time to resurrect the Concurrent GC project?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14981
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

Changes (by osa1):

 * cc: osa1 (added)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14981#comment:1

Comment (by jberryman):

 I only spent a few minutes looking at the eventlog in threadscope, but the
 thing that looked instantly fishy to me was that we appear to stop the
 world at every minor GC (there are only a dozen collections of gen 1).

 Another thing to observe is that the spark creation looks healthy, and all
 work is sparked within the first third or so of program execution (i.e.
 those little pauses aren't yields because no work is ready to be done,
 which is what I thought might be happening at first glance).

 Supposedly the RTS flags used were: `-N -A15m -qb0 -qn8`.

 Attaching the OP's screenshot from the Reddit thread.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14981#comment:2

Changes (by jberryman):

 * Attachment "ARM96coreIssue.png" added.

 threadscope zoomed in

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14981

Comment (by varosi):

 Shouldn't a minor GC avoid stopping the world?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14981#comment:3

Comment (by jberryman):

 varosi, my understanding is that only collecting the oldest generation
 should stop the world. It's possible I'm mistaken or misinterpreting the
 threadscope profile, though.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14981#comment:4

Comment (by varosi):

 Yes, Threadscope shows that the nursery GC is stopping the world. But is
 that behavior okay?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14981#comment:5

Comment (by bgamari):

 Minor GCs do indeed necessarily stop the world. This is a known limitation
 which affects large core counts particularly badly. However, fixing this in
 a copying garbage collector is quite tricky.

 There have been a few attempts at avoiding this stop-the-world pause. The
 most recent attempt is the Simons' "Multicore Garbage Collection with Local
 Heaps". You can still find the prototype implementation (against GHC 6.10,
 IIRC) on the `wip/local-heaps` branch, but it was not merged, as the
 performance improvement brought by this change was outweighed by its
 enormous complexity.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14981#comment:6

Comment (by varosi):

 Since single-core performance will not rise much further and the future is
 in many-core CPUs, are there any plans to reconsider bringing local heaps
 back at some point in the future?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14981#comment:7

Comment (by bgamari):
 > Are there any plans to reconsider bringing local heaps back at some
 > point in the future?
 Not at the moment, at least not with the same approach that was tried in
 the local heaps paper. Unfortunately, maintaining the local heap invariant
 ends up being rather expensive, both in complexity budget and in
 computation (since objects which may be encountered by other capabilities
 must be evacuated out to the global heap).

 GHC is not at all unusual in the stop-the-world nature of its minor GC, and
 there are well-understood ways of dealing with it: simply increase the size
 of the nursery to reduce the frequency of minor GCs (and therefore
 synchronization). I have seen GHC run very well on a few dozen cores with
 `+RTS -A128MB`. Depending upon the allocation load, it may also help to
 reduce `+RTS -qn` to reduce the number of cores which need to synchronize.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14981#comment:8
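As a rough illustration of the nursery-size advice above, the following Python sketch estimates how many minor collections a run would trigger for a given `-A` setting. It is a back-of-the-envelope model and not a description of the RTS: it assumes that each capability has its own allocation area of the `-A` size and that a stop-the-world minor GC happens roughly whenever the combined area has been filled. The function name, the 1 TiB allocation total and the 32-capability count are hypothetical values chosen for the example; only the 15 MB and 128 MB nursery sizes come from the thread.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def estimated_minor_gcs(total_alloc_bytes: int,
                        capabilities: int,
                        nursery_bytes_per_cap: int) -> int:
    """Rough number of minor GCs over a whole run.

    Assumption (not taken from the RTS source): one minor collection
    is triggered each time the combined nursery of all capabilities
    has been filled by allocation.
    """
    if capabilities <= 0 or nursery_bytes_per_cap <= 0:
        raise ValueError("capabilities and nursery size must be positive")
    combined_nursery = capabilities * nursery_bytes_per_cap
    return total_alloc_bytes // combined_nursery


if __name__ == "__main__":
    total_alloc = 1024 * GIB  # hypothetical: the program allocates 1 TiB
    caps = 32                 # hypothetical capability count (-N32)
    for nursery_mib in (15, 128):
        gcs = estimated_minor_gcs(total_alloc, caps, nursery_mib * MIB)
        print(f"-A{nursery_mib}m: about {gcs} minor GCs")
    # -A15m:  about 2184 minor GCs
    # -A128m: about 256 minor GCs
```

Under these assumptions, going from a 15 MB to a 128 MB nursery cuts the number of global synchronizations by roughly a factor of eight. The model says nothing about how long each collection takes or about cache and memory-bandwidth effects, so it cannot predict total run time.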

Comment (by varosi):

 I tried different nursery sizes, even up to 300 MB, but the problem is that
 memory bandwidth has a cost, and even with 128 GB of RAM, using more memory
 just slows down total run time/throughput.

 By the way, I have written more about what I tried here:
 https://www.reddit.com/r/haskell/comments/85vwlq/our_lovely_ghc_parallel_gc_...

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14981#comment:9