
#8578: Improvements to SpinLock implementation
-------------------------------------+------------------------------------
        Reporter:  parcs             |            Owner:  parcs
            Type:  task              |           Status:  patch
        Priority:  normal            |        Milestone:
       Component:  Runtime System    |          Version:  7.7
      Resolution:                    |         Keywords:
Operating System:  Unknown/Multiple  |     Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |       Difficulty:  Unknown
       Test Case:                    |       Blocked By:
        Blocking:                    |  Related Tickets:
-------------------------------------+------------------------------------

Comment (by simonmar):

 Here are my results with `-N4` on an Intel Core i7-3770 (4 cores, 8
 threads).

 {{{
 --------------------------------------------------------------------------------
         Program          Size    Allocs   Runtime   Elapsed  TotalMem
 --------------------------------------------------------------------------------
    blackscholes         +0.0%     +0.0%     -1.7%     -2.4%     -0.3%
           coins         +0.0%     -0.0%     +0.4%     +1.0%     -8.6%
            gray         +0.0%     +0.0%    +15.1%    +14.3%     +0.0%
          mandel         +0.0%     +0.0%     +3.3%     +3.3%     -0.8%
         matmult         +0.0%     +8.1%     -2.4%     -2.6%     +0.0%
         minimax         +0.0%     +0.0%     -1.3%     -1.1%     +0.0%
           nbody         +0.0%     -6.0%     -1.9%      0.06     +0.0%
          parfib         +0.0%     +0.1%    +16.2%    +16.2%     +0.0%
         partree         +0.0%     -0.0%     +1.0%     +0.5%     -3.0%
            prsa         +0.0%     -0.1%     +1.1%     +0.9%     +0.0%
          queens         +0.0%     -0.5%     -1.3%     -0.5%     +7.1%
             ray         +0.0%     -0.3%     -0.4%     -0.5%     +0.0%
        sumeuler         +0.0%     +0.0%     +1.0%     +1.0%     +0.0%
       transclos         +0.0%     +0.0%     +1.2%     +1.4%     +0.0%
 --------------------------------------------------------------------------------
             Min         +0.0%     -6.0%     -2.4%     -2.6%     -8.6%
             Max         +0.0%     +8.1%    +16.2%    +16.2%     +7.1%
  Geometric Mean         +0.0%     +0.1%     +2.0%     +2.3%     -0.4%
 }}}

 Not good! Two programs (gray and parfib) are significantly worse.
 The effect is real; here is the timing info for parfib before and after:

 {{{
 5.70user 0.00system 0:01.43elapsed 397%CPU (0avgtext+0avgdata 20816maxresident)k
 6.52user 0.00system 0:01.64elapsed 397%CPU (0avgtext+0avgdata 21568maxresident)k
 }}}

 I wonder whether not using a locked instruction in the spinlock might cause
 the loop to spin for longer, because it takes longer for the memory write
 to reach the core that is waiting for it? Someone could probably dig into
 this further with perf.

 But the lesson here, as usual, is to always benchmark, and not to assume
 that a change will work well in practice just because it looks good!

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8578#comment:7
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
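
 For readers unfamiliar with the distinction raised in the comment above, here is a minimal sketch of what "locked" versus "non-locked" spinlock operations can look like. It is not the code from the patch on this ticket, which is not shown here; the type `demo_spinlock` and all function names are hypothetical, and it assumes C11 atomics on an x86 target, where an atomic exchange typically compiles to an implicitly locked `xchg` and a release store to a plain `mov`.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration only: 0 = free, 1 = held. */
typedef struct { atomic_uint held; } demo_spinlock;

/* Variant A: every spin iteration issues an atomic exchange
 * (typically a locked instruction on x86). */
static void acquire_xchg(demo_spinlock *l) {
    while (atomic_exchange_explicit(&l->held, 1u, memory_order_acquire) != 0u) {
        /* busy-wait */
    }
}

/* Variant B: spin on a plain atomic load and attempt the exchange
 * only once the lock looks free (test-and-test-and-set). */
static void acquire_ttas(demo_spinlock *l) {
    for (;;) {
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0u) {
            /* busy-wait without a locked instruction */
        }
        if (atomic_exchange_explicit(&l->held, 1u, memory_order_acquire) == 0u) {
            return; /* we observed "free" and took the lock */
        }
    }
}

/* Release with an ordinary release store (no locked instruction on x86). */
static void release_store(demo_spinlock *l) {
    atomic_store_explicit(&l->held, 0u, memory_order_release);
}

/* Release with an atomic exchange (typically a locked instruction on x86). */
static void release_xchg(demo_spinlock *l) {
    (void)atomic_exchange_explicit(&l->held, 0u, memory_order_release);
}
```

 Which combination performs better under contention is exactly the kind of question the benchmark numbers above are meant to settle; the sketch only names the variants and makes no claim about which one the patch uses.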