[GHC] #10229: setThreadAffinity assumes a certain CPU virtual core layout

#10229: setThreadAffinity assumes a certain CPU virtual core layout
-------------------------------------+-------------------------------------
Reporter: nh2                        | Owner: simonmar
Type: bug                            | Status: new
Priority: normal                     | Milestone:
Component: Runtime System            | Version: 7.10.1
Keywords:                            | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple       | Type of failure: Runtime performance bug
Test Case:                           | Blocked By:
Blocking:                            | Related Tickets:
Differential Revisions:              |
-------------------------------------+-------------------------------------
The {{{+RTS -qa}}} option, which sets thread affinity, was implemented in
https://git.haskell.org/ghc.git/commitdiff/31caec794c3978d55d79f715f21fb7294...

{{{
// Schedules the thread to run on CPU n of m.  m may be less than the
// number of physical CPUs, in which case, the thread will be allowed
// to run on CPU n, n+m, n+2m etc.
void setThreadAffinity (nat n, nat m)
}}}

Today I discovered that on some machines this option helps parallel performance (e.g. with {{{+RTS -N4}}}) a lot, while on others it doesn't. Together with thomie on #ghc, I found out the reason.

Let's assume I have 4 real cores with hyperthreading, so 8 virtual cores. The mapping of hyperthreading cores to physical cores differs across machines.

On one of my machines (Intel i5), the layout is 11223344, meaning that the first two vCPUs (hyperthreads) that the OS announces (visible e.g. in htop) map to the first physical core in the system, and so on.

On my other machine (Intel Xeon), the layout is 12341234; here the 1st and the 5th vCPU map to the same physical core.

On Linux, this layout can be observed by running:

{{{
cat /proc/cpuinfo | egrep "processor|physical id|core id" | sed 's/^processor/\nprocessor/g'
}}}

I do not know whether this layout is dictated by the processor, chosen by the OS, or even changes across reboots; what is clear is that it can vary across machines.
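For readers who want the layout as a compact string such as 11223344 rather than raw /proc/cpuinfo output, here is a minimal Python sketch. It assumes the x86 Linux field names used in the command above ("processor", "physical id", "core id"). The function names `parse_cpuinfo` and `layout_string` and the `SAMPLE` text are made up for illustration; they are not part of GHC or of any tool mentioned in this ticket.

```python
# Hypothetical helper: turn /proc/cpuinfo-style text into a layout string
# in the notation of this ticket (one digit per vCPU, naming its physical core).

def parse_cpuinfo(text):
    """Return {vcpu: (physical_id, core_id)} from /proc/cpuinfo-style text."""
    fields = {}
    cpu = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key : value" shape
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
            fields[cpu] = {}
        elif key in ("physical id", "core id") and cpu is not None:
            fields[cpu][key] = int(value)
    return {c: (f.get("physical id"), f.get("core id")) for c, f in fields.items()}


def layout_string(topology):
    """Label physical cores 1, 2, ... in order of first appearance.

    With more than 9 physical cores the labels become multi-digit, so the
    resulting string is only unambiguous on small machines.
    """
    labels = {}
    out = []
    for cpu in sorted(topology):
        core = topology[cpu]  # (physical id, core id) identifies a physical core
        if core not in labels:
            labels[core] = len(labels) + 1
        out.append(str(labels[core]))
    return "".join(out)


# Made-up sample: 2 physical cores, 2 hyperthreads each, siblings adjacent.
SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 0

processor   : 2
physical id : 0
core id     : 1

processor   : 3
physical id : 0
core id     : 1
"""

print(layout_string(parse_cpuinfo(SAMPLE)))  # prints 1122
```

On a real Linux machine, the same functions could be applied to the contents of /proc/cpuinfo instead of `SAMPLE`; on architectures that do not expose these fields, every vCPU would collapse onto a single label.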
Now, as explained by thomie:

{{{
-qa will set your 4 capabilities to cores [(1,5), (2,6), (3,7), (4,8)], and then the os randomly chooses out of those tuples
}}}

This strategy is optimal for the 12341234 layout: when running with -N4, for example, it ensures that no two threads are scheduled onto vCPUs that share a physical core. The possible {{{+RTS -qa}}} choice {{{1__4_23_}}} is a great assignment in this case, as is {{{1234____}}} ({{{_}}} means the vCPU is not chosen).

But for the 11223344 layout, the choice {{{1234____}}} isn't good, because it uses only 2 of our 4 physical cores; our program now takes twice as long to run.

----

It seems likely to me that {{{setThreadAffinity}}} was written on a machine with the 12341234 layout, under the assumption that all machines have this layout. It would be great if we could change it to take the actual layout into account.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10229
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
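The effect of the "n, n+m, n+2m" rule on the two layouts described in the ticket can be checked with a small simulation. This is only a sketch in Python, not the RTS code: it uses 0-based vCPU indices (the ticket counts from 1), represents a layout as the string notation from the ticket, and enumerates every placement the OS could pick. The names `qa_affinity` and `core_usage_range` are hypothetical.

```python
from itertools import product


def qa_affinity(n, m, num_vcpus):
    """vCPUs that capability n of m may run on under the n, n+m, n+2m rule (0-based)."""
    return list(range(n, num_vcpus, m))


def core_usage_range(layout, m):
    """Return (fewest, most) distinct physical cores used by m capabilities.

    `layout` is a string such as "11223344": position i names the physical
    core behind vCPU i. Every combination of per-capability vCPU choices is
    enumerated, since the OS is free to pick any vCPU in each affinity set.
    """
    sets = [qa_affinity(n, m, len(layout)) for n in range(m)]
    counts = [len({layout[v] for v in choice}) for choice in product(*sets)]
    return min(counts), max(counts)


print(qa_affinity(0, 4, 8))               # prints [0, 4]
print(core_usage_range("12341234", 4))    # prints (4, 4)
print(core_usage_range("11223344", 4))    # prints (2, 4)
```

Under these assumptions, the 12341234 layout always ends up on 4 physical cores, while on 11223344 the outcome depends on what the OS picks and can drop to 2 physical cores, which matches the {{{1234____}}} case described above.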

Comment (by fryguybob):

I have a version of GHC that allows the thread affinity of GHC capabilities to be set explicitly:

https://github.com/fryguybob/ghc/commit/09f8abd6e89eb2c830b1dc0702ce9a0a0c4f...

For a real patch we would want to think about the details of the RTS flag and file format, as well as allowing some friendlier command-line options for common settings and low core counts (things that, at the moment, I don't have time to do).

I needed explicit setting to get consistent results on a Xeon E5-2699v3 machine with 72 threads, where we wanted to consider not only hyperthreads and sockets, but also the proximity of particular cores on the same die. Without setting thread affinity for capabilities, results were quite scattered.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10229#comment:1
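As an illustration of what taking the layout into account could mean for an explicit mapping, the Python sketch below orders vCPUs so that every physical core is used once before any hyperthread sibling is reused. It is not the patch linked above and says nothing about its flag or file format; the function name `spread_order` is hypothetical, and the sketch deliberately ignores sockets and on-die proximity, which the comment above notes also matter on large machines.

```python
def spread_order(layout):
    """Order vCPU indices (0-based) so distinct physical cores come first.

    `layout` uses the ticket's notation, e.g. "11223344": position i names
    the physical core behind vCPU i. Capability k would then be pinned to
    the k-th vCPU in the returned list.
    """
    by_core = {}
    for vcpu, core in enumerate(layout):
        by_core.setdefault(core, []).append(vcpu)

    order = []
    rnd = 0
    while len(order) < len(layout):
        # Round 0 takes the first hyperthread of every core, round 1 the
        # second, and so on; dicts preserve insertion order in Python 3.7+.
        for siblings in by_core.values():
            if rnd < len(siblings):
                order.append(siblings[rnd])
        rnd += 1
    return order


print(spread_order("11223344"))  # prints [0, 2, 4, 6, 1, 3, 5, 7]
print(spread_order("12341234"))  # prints [0, 1, 2, 3, 4, 5, 6, 7]
```

With -N4, taking the first four entries places the capabilities on four different physical cores under either layout. Whether spreading or packing is preferable still depends on the workload.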

Comment (by fryguybob):

I'll also note that the optimal mapping of capabilities to threads is very workload-dependent. It also seems likely that, in the near future at least, the gains from finding the best mapping over what the OS gives you will continue to increase.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10229#comment:2

Changes (by thomie):

 * related: => #1741

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10229#comment:3

Comment (by fryguybob):

Phab:D800

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10229#comment:4

Changes (by maoe):

 * cc: maoe (added)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10229#comment:5

Changes (by pacak):

 * cc: pacak (added)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10229#comment:6