
#9221: (super!) linear slowdown of parallel builds on 40 core machine -------------------------------------+------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: high | Milestone: 7.10.1 Component: Compiler | Version: 7.8.2 Resolution: | Keywords: Operating System: | Architecture: Unknown/Multiple Unknown/Multiple | Difficulty: Unknown Type of failure: Compile- | Blocked By: time performance bug | Related Tickets: #910 Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by gintas): I think I know what's going on here. If you look at parUpsweep in compiler/main/GhcMake.js, its argument n_jobs is used in two places: one is the initial value of the par_sem semaphore used to limit parallelization, and the other is a call to setNumCapabilities. The latter seems to be the cause of the slowdown. Note that setNumCapabilities is only invoked if the previous count of capabilities was 1. I used that to control for both settings independently, and it turns out that the runtime overhead is mostly independent of the semaphore value and highly influenced by capability count. I ran some experiments on a 16-CPU VM (picked a larger one deliberately to make the differences more pronounced). Running with jobs=4 & caps=4, a test took 37s walltime, jobs=4 & caps=16 took 51s, jobs=4 & caps=32 took 114s (344s of MUT and 1021s of GC!). The figures are very similar for jobs=16 and jobs=64. See attached log for more details (-sstderr output). It looks like the runtime GC is just inefficient when running with many capabilities, even if many physical cores are available. I'll try a few experiments to verify that this is a general pattern that is not specific to the GhcMake implementation. Logic and a few experiments indicate that it does not help walltime to set the number of jobs (semaphore value) higher than the number of capabilities, so there's not much we can do about those two parameters in the parUpsweep implementation other than capping n_jobs at some constant (probably <= 8). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9221#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler