
#9221: (super!) linear slowdown of parallel builds on 40 core machine
-------------------------------------+-------------------------------------
        Reporter:  carter            |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.2.1
       Component:  Compiler          |              Version:  7.8.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  Compile-time      |            Test Case:
  performance bug                    |
      Blocked By:                    |             Blocking:
 Related Tickets:  #910, #8224       |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by slyfox):

24-core VM (lstopo shows 12 cores with 2 PUs each, i.e. 24 logical CPUs). CPU topology:
{{{
$ lstopo-no-graphics
Machine (118GB)
  Package L#0 + L3 L#0 (30MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#1)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#2)
      PU L#3 (P#3)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#4)
      PU L#5 (P#5)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#6)
      PU L#7 (P#7)
    L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#8)
      PU L#9 (P#9)
    L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#10)
      PU L#11 (P#11)
    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#12)
      PU L#13 (P#13)
    L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#14)
      PU L#15 (P#15)
    L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
      PU L#16 (P#16)
      PU L#17 (P#17)
    L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
      PU L#18 (P#18)
      PU L#19 (P#19)
    L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
      PU L#20 (P#20)
      PU L#21 (P#21)
    L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
      PU L#22 (P#22)
      PU L#23 (P#23)

$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 0 size: 120881 MB
node 0 free: 120192 MB
node distances:
node   0
  0:  10
}}}
(I would not trust the numactl output.)
Separate processes:
{{{
$ make clean; time make -j1
real    1m33.147s
user    1m20.836s
sys     0m11.556s

$ make clean; time make -j10
real    0m11.275s
user    1m29.800s
sys     0m12.856s

$ make clean; time make -j12
real    0m10.537s
user    1m36.276s
sys     0m16.948s

$ make clean; time make -j14
real    0m9.117s
user    1m39.132s
sys     0m18.332s

$ make clean; time make -j20
real    0m8.498s
user    2m7.064s
sys     0m17.912s

$ make clean; time make -j22
real    0m7.468s
user    2m9.808s
sys     0m18.592s

$ make clean; time make -j24
real    0m7.336s
user    2m15.936s
sys     0m19.004s

$ make clean; time make -j26
real    0m7.433s
user    2m17.612s
sys     0m19.648s

$ make clean; time make -j28
real    0m7.554s
user    2m17.760s
sys     0m19.564s

$ make clean; time make -j30
real    0m7.563s
user    2m16.776s
sys     0m21.104s
}}}

The numbers vary slightly from run to run, but the gist is that the best performance is around -j24, not -j12.

Single process:
{{{
$ ./synth.bash -j1 +RTS -sstderr -A256M -qb0 -RTS
real    1m15.214s
user    1m14.060s
sys     0m0.984s

$ ./synth.bash -j8 +RTS -sstderr -A256M -qb0 -RTS
real    0m11.275s
user    1m21.708s
sys     0m2.912s

$ ./synth.bash -j10 +RTS -sstderr -A256M -qb0 -RTS
real    0m10.279s
user    1m25.184s
sys     0m3.664s

$ ./synth.bash -j12 +RTS -sstderr -A256M -qb0 -RTS
real    0m9.605s
user    1m32.688s
sys     0m4.292s

$ ./synth.bash -j14 +RTS -sstderr -A256M -qb0 -RTS
real    0m9.144s
user    1m40.288s
sys     0m4.964s

$ ./synth.bash -j16 +RTS -sstderr -A256M -qb0 -RTS
real    0m10.003s
user    1m51.916s
sys     0m6.604s

$ ./synth.bash -j20 +RTS -sstderr -A256M -qb0 -RTS
real    0m10.215s
user    2m7.924s
sys     0m8.208s

$ ./synth.bash -j22 +RTS -sstderr -A256M -qb0 -RTS
real    0m10.483s
user    2m13.440s
sys     0m10.456s

$ ./synth.bash -j24 +RTS -sstderr -A256M -qb0 -RTS
real    0m10.985s
user    2m18.028s
sys     0m10.780s

$ ./synth.bash -j32 +RTS -sstderr -A256M -qb0 -RTS
real    0m12.636s
user    2m32.312s
sys     0m14.508s
}}}

Here the best numbers are around -j12 to -j14 (the minimum is at -j14), and they are worse than the multi-process run. '''perf record''' does not make it clear what is happening.
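To make the two sweeps easier to compare, here is a small Python sketch that computes speedup (T(1)/T(N)) and parallel efficiency (speedup/N) from the wall-clock ("real") times listed above and picks the fastest -jN. It only restates the numbers from these runs; the dictionary and function names are illustrative, not part of any GHC tooling. Since each -jN was measured once and the numbers vary from run to run, the derived values should be read loosely.

```python
# Wall-clock ("real") times in seconds, copied from the runs above.
# Keys are the -jN value passed to make / synth.bash.
MAKE_REAL = {
    1: 93.147, 10: 11.275, 12: 10.537, 14: 9.117, 20: 8.498,
    22: 7.468, 24: 7.336, 26: 7.433, 28: 7.554, 30: 7.563,
}
SYNTH_REAL = {
    1: 75.214, 8: 11.275, 10: 10.279, 12: 9.605, 14: 9.144,
    16: 10.003, 20: 10.215, 22: 10.483, 24: 10.985, 32: 12.636,
}


def speedups(real_times):
    """Return {jobs: (speedup, efficiency)} relative to the -j1 run.

    speedup    = T(1) / T(N)
    efficiency = speedup / N   (1.0 would mean perfect linear scaling)
    """
    base = real_times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(real_times.items())}


def best_jobs(real_times):
    """Return the -jN value with the lowest wall-clock time."""
    return min(real_times, key=real_times.get)


if __name__ == "__main__":
    for label, data in (("separate processes (make)", MAKE_REAL),
                        ("single process (synth.bash)", SYNTH_REAL)):
        print(label)
        for n, (s, e) in speedups(data).items():
            print(f"  -j{n:<3} speedup {s:5.2f}x  efficiency {e:4.2f}")
        print(f"  best: -j{best_jobs(data)}")
```

With these numbers the multi-process build peaks at -j24 (about 12.7x over -j1), while the single-process build peaks at -j14 (about 8.2x) and slows down again beyond that.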
I'll try to get a 64-core VM next week and see whether the effect is more visible there.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9221#comment:66
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler