[GHC] #12181: Multi-threaded code on ARM64 GHC runtime doesn't use all available cores

#12181: Multi-threaded code on ARM64 GHC runtime doesn't use all available cores
-------------------------------------+-------------------------------------
           Reporter:  varosi         |             Owner:
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Runtime System |           Version:  7.10.3
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:  arm            |   Type of failure:  Runtime performance bug
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
This is the machine:
[http://www.cnx-software.com/2016/04/30/setup-guide-mini-review-of-bq-aquaris-m10-ubuntu-edition-tablet-from-a-developers-perspective/]

A Haskell ray tracer that uses `Control.Parallel.Strategies` and `parBuffer` works well on an x64 machine and uses all available cores, but on this ARM machine it uses only 2 of the 4 cores.

The machine normally runs on two cores only; when it sees that they are heavily loaded, it enables two more, for a total of four.

If I pass "+RTS -N4", it works just fine. So I think the problem is that the runtime checks only for enabled cores, not for all available cores. In the link above you can see that "lscpu" reports 4 cores in total.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12181
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

Comment (by thomie):

varosi: you are reporting an issue with `+RTS -N`, correct?

When compiling the following program with `-threaded`, and running it with `./Main +RTS -N`, it prints `2` instead of `4` on your machine:
{{{
import Control.Concurrent
main = getNumCapabilities >>= print
}}}

Here's the code that gets the number of processors when using `+RTS -N`, from `rts/posix/OSThreads.c`:
{{{#!C
uint32_t
getNumberOfProcessors (void)
{
    static uint32_t nproc = 0;

    if (nproc == 0) {
#if defined(HAVE_SYSCONF) && defined(_SC_NPROCESSORS_ONLN)
        nproc = sysconf(_SC_NPROCESSORS_ONLN);
#elif defined(HAVE_SYSCONF) && defined(_SC_NPROCESSORS_CONF)
        nproc = sysconf(_SC_NPROCESSORS_CONF);
#elif defined(darwin_HOST_OS)
        size_t size = sizeof(uint32_t);
        if(sysctlbyname("hw.logicalcpu",&nproc,&size,NULL,0) != 0) {
            if(sysctlbyname("hw.ncpu",&nproc,&size,NULL,0) != 0)
                nproc = 1;
        }
#elif defined(freebsd_HOST_OS)
        size_t size = sizeof(uint32_t);
        if(sysctlbyname("hw.ncpu",&nproc,&size,NULL,0) != 0)
            nproc = 1;
#else
        nproc = 1;
#endif
    }

    return nproc;
}
}}}

From `man sysconf`:
{{{
- _SC_NPROCESSORS_CONF
       The number of processors configured.
- _SC_NPROCESSORS_ONLN
       The number of processors currently online (available).
}}}

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12181#comment:1

Comment (by rwbarton):

It really doesn't seem sensible to me to have GHC assume by default that CPUs that are offline will magically become available under load. Admittedly, though, I don't know what taking CPUs offline is supposed to achieve.

This seems like a deficiency in the operating system: there isn't a way to ask it "how many CPUs will my program run on". I don't think it's for GHC to work around this. You could do so yourself by starting a thread that periodically checks the number of currently available processors and calls `setNumCapabilities`. Or just run with `-N4`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12181#comment:2
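
The workaround suggested in this comment, a thread that periodically re-reads the processor count and calls `setNumCapabilities`, might look like the sketch below. It assumes Linux with glibc, a program built with `-threaded`, and the `CApiFFI` extension; `onlineProcessors`, `syncCapabilities`, and the one-second interval are illustrative choices, not part of any GHC API. It calls `sysconf` directly because the `getNumberOfProcessors` code quoted in comment 1 stores its first result in a static variable, so repeated queries through the RTS would presumably keep returning the start-up value.

```haskell
{-# LANGUAGE CApiFFI #-}
module Main (main) where

import Control.Concurrent
  (forkIO, getNumCapabilities, setNumCapabilities, threadDelay)
import Control.Monad (forever, when)
import Foreign.C.Types (CInt (..), CLong (..))

-- Direct binding to sysconf(3); the RTS is bypassed on purpose (see lead-in).
foreign import capi unsafe "unistd.h sysconf"
  c_sysconf :: CInt -> IO CLong

-- The platform's value of the _SC_NPROCESSORS_ONLN constant.
foreign import capi "unistd.h value _SC_NPROCESSORS_ONLN"
  scNProcessorsOnln :: CInt

-- | Hypothetical helper: processors currently online, or 1 if sysconf fails.
onlineProcessors :: IO Int
onlineProcessors = do
  n <- c_sysconf scNProcessorsOnln
  return (if n < 1 then 1 else fromIntegral n)

-- | Hypothetical helper: resize the capability set when the count changes.
syncCapabilities :: IO ()
syncCapabilities = do
  online <- onlineProcessors
  caps <- getNumCapabilities
  when (online /= caps) (setNumCapabilities online)

main :: IO ()
main = do
  -- Poll once per second in a background thread (interval is an assumption).
  _ <- forkIO (forever (syncCapabilities >> threadDelay 1000000))
  -- Placeholder for the real parallel workload.
  threadDelay 3000000
  getNumCapabilities >>= print
```

Without `-threaded`, `setNumCapabilities` cannot raise the count above 1, so the final line would print `1` regardless of what the watcher observes.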

Comment (by varosi):

The problem seems to go deeper. We are currently running a matrix-multiplication profiling program written in C, and it runs on all cores. When we run the Haskell program with "+RTS -N8" (fixing the number of cores), it runs 8 OS threads, but they occupy only half of the available cores, and the program runs much slower than with "+RTS -N4".

This is the program for reference: https://bitbucket.org/varosi/cgraytrace/overview

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12181#comment:3

Comment (by varosi):

@thomie, yes, initially this 4-core machine reports only 1 active core. And as @rwbarton said, it is not a GHC problem. So `-N` will not work correctly on that machine, and we have to tell it explicitly to use 4 cores. It is actually a tablet, so it saves power by turning off cores.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12181#comment:4

Comment (by dobenour):

What about having a Haskell API to tell the RTS to re-detect the number of CPUs, looking for the number of available processors?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12181#comment:5

Comment (by simonmar):

You can already do this, with `GHC.Conc.getNumProcessors` followed by `GHC.Conc.setNumCapabilities`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12181#comment:6
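
A minimal sketch of that sequence follows, assuming a program built with `-threaded`. `getNumProcessors` is the name `GHC.Conc` exports for the processor query; `matchCapabilitiesToProcessors` is a hypothetical helper introduced only for this example. Since the RTS function quoted in comment 1 caches its result, this presumably reflects the count seen at the first query, which makes it better suited to a one-off adjustment than to repeated polling.

```haskell
module Main (main) where

import Control.Concurrent (getNumCapabilities, setNumCapabilities)
import GHC.Conc (getNumProcessors)

-- | Hypothetical helper: set the number of capabilities to the processor
-- count the RTS reports, and return the count that was requested.
matchCapabilitiesToProcessors :: IO Int
matchCapabilitiesToProcessors = do
  procs <- getNumProcessors
  let n = max 1 procs -- guard against a non-positive count
  setNumCapabilities n
  return n

main :: IO ()
main = do
  requested <- matchCapabilitiesToProcessors
  actual <- getNumCapabilities
  putStrLn ("requested " ++ show requested ++ ", now running on " ++ show actual)
```

Calling the helper once the machine is already under load, rather than at program start, gives the extra cores a chance to be online before the count is read.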

Comment (by varosi):

Wouldn't it be a good idea for the GHC runtime to do this once per second or so? That way it would work automatically on similar machines.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12181#comment:7

Comment (by simonmar):

Yes, perhaps `+RTS -N` should automatically readjust at some regular interval.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12181#comment:8

Comment (by varosi):

Or maybe operating systems already have mechanisms to signal such changes, without polling for them regularly.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12181#comment:9

Comment (by varosi):

Will a fix for this issue land in 8.2 or 8.4?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12181#comment:10

Comment (by bgamari):

At the moment there is no milestone, meaning we aren't targeting any particular release for a fix.

I will say that I'm not terribly keen on the idea of polling to get CPU information. I tend to agree with Reid that this is a distribution issue: bringing CPUs entirely offline for the sake of power management seems a bit crazy. Don't the Linux `cpufreq`, `cpuidle`, clock management, and runtime PM subsystems exist precisely to avoid this sort of heavy-weight power management?

Why not just implement the RTS's `-N` logic yourself, but using `_SC_NPROCESSORS_CONF` instead of `_SC_NPROCESSORS_ONLN`?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12181#comment:11
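
One way to try this suggestion from application code is sketched below: query `_SC_NPROCESSORS_CONF` through the FFI at start-up and pass the result to `setNumCapabilities`, instead of relying on `+RTS -N`. It assumes a POSIX system whose `unistd.h` defines `_SC_NPROCESSORS_CONF`, a program built with `-threaded`, and the `CApiFFI` extension; `configuredProcessors` is a hypothetical helper, not an existing GHC function.

```haskell
{-# LANGUAGE CApiFFI #-}
module Main (main) where

import Control.Concurrent (getNumCapabilities, setNumCapabilities)
import Foreign.C.Types (CInt (..), CLong (..))

-- Direct binding to sysconf(3).
foreign import capi unsafe "unistd.h sysconf"
  c_sysconf :: CInt -> IO CLong

-- The platform's value of the _SC_NPROCESSORS_CONF constant.
foreign import capi "unistd.h value _SC_NPROCESSORS_CONF"
  scNProcessorsConf :: CInt

-- | Hypothetical helper: processors configured on the system (online or
-- not), or 1 if sysconf reports an error.
configuredProcessors :: IO Int
configuredProcessors = do
  n <- c_sysconf scNProcessorsConf
  return (if n < 1 then 1 else fromIntegral n)

main :: IO ()
main = do
  -- Size the capability set from the configured count, not the online count.
  configuredProcessors >>= setNumCapabilities
  caps <- getNumCapabilities
  putStrLn ("capabilities: " ++ show caps)
```

The program is then run without `-N`, since the count is chosen in `main`. Whether the extra capabilities actually get scheduled onto cores that are offline at start-up still depends on the operating system bringing them online under load.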