[GHC] #14330: Sparks are not started promptly

#14330: Sparks are not started promptly
-------------------------------------+-------------------------------------
        Reporter:  andrewthad        |               Owner:  (none)
            Type:  feature request   |              Status:  new
        Priority:  normal            |           Milestone:
       Component:  Compiler          |             Version:  8.2.1
        Keywords:  sparks            |    Operating System:  Unknown/Multiple
    Architecture:  Unknown/Multiple  |     Type of failure:  None/Unknown
       Test Case:                    |          Blocked By:
        Blocking:                    |     Related Tickets:
Differential Rev(s):                 |           Wiki Page:
-------------------------------------+-------------------------------------
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14330

This was a question on StackOverflow. With some prompting from Yuras, I've
decided to open this as an issue. Here is the original question (which has
been satisfactorily answered):
https://stackoverflow.com/questions/46586941/why-are-ghc-sparks-fizzling/46603680?noredirect=1#comment80163830_46603680

Here is a more narrowly tailored version of the code I have posted there:

{{{#!hs
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O2 -Wall -threaded -fforce-recomp #-}

import Criterion.Main
import Control.Parallel.Strategies (runEval,rpar,rseq)
import qualified Data.Vector.Primitive as PV

main :: IO ()
main = do
  let fewNumbers  = PV.replicate 10000000 1.00000001
      manyNumbers = PV.replicate 100000000 1.00000001
  defaultMain
    [ bgroup "serial"
        [ bench "few" $ whnf serialProduct fewNumbers
        , bench "many" $ whnf serialProduct manyNumbers
        ]
    , bgroup "parallel"
        [ bench "few" $ whnf parallelProduct fewNumbers
        , bench "many" $ whnf parallelProduct manyNumbers
        ]
    ]

serialProduct :: PV.Vector Double -> Double
serialProduct v =
  let !len = PV.length v
      go :: Double -> Int -> Double
      go !d !ix = if ix < len then go (d * PV.unsafeIndex v ix) (ix + 1) else d
   in go 1.0 0

-- | This only works when the vector length is a multiple of 4.
parallelProduct :: PV.Vector Double -> Double
parallelProduct v = runEval $ do
  let chunk = div (PV.length v) 4
  p2 <- rpar (serialProduct (PV.slice (chunk * 1) chunk v))
  p3 <- rpar (serialProduct (PV.slice (chunk * 2) chunk v))
  p4 <- rpar (serialProduct (PV.slice (chunk * 3) chunk v))
  p1 <- rseq (serialProduct (PV.slice (chunk * 0) chunk v))
  rseq (p1 * p2 * p3 * p4)
}}}

We can build and run this with:
{{{
ghc -threaded parallel_compute.hs
./parallel_compute +RTS -N6
}}}
On my eight-core laptop, here are the results we get:
{{{
benchmarking serial/few
time                 11.46 ms   (11.29 ms .. 11.61 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 11.52 ms   (11.44 ms .. 11.62 ms)
std dev              222.8 μs   (140.9 μs .. 299.6 μs)

benchmarking serial/many
time                 118.1 ms   (116.1 ms .. 120.0 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 117.2 ms   (116.6 ms .. 117.9 ms)
std dev              920.3 μs   (550.5 μs .. 1.360 ms)
variance introduced by outliers: 11% (moderately inflated)

benchmarking parallel/few
time                 10.04 ms   (9.968 ms .. 10.14 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 9.970 ms   (9.891 ms .. 10.03 ms)
std dev              172.9 μs   (114.5 μs .. 282.9 μs)

benchmarking parallel/many
time                 45.32 ms   (43.55 ms .. 47.17 ms)
                     0.996 R²   (0.993 R² .. 0.999 R²)
mean                 45.93 ms   (44.71 ms .. 48.10 ms)
std dev              3.041 ms   (1.611 ms .. 4.654 ms)
variance introduced by outliers: 20% (moderately inflated)
}}}

Interestingly, in the benchmark with the smaller 10,000,000-element vector,
we see almost no performance improvement from the sparks. But in the one
with the larger 100,000,000-element vector, we see a considerable speedup:
it runs 2.5-3.0x faster. The reason for this is that sparks are not started
between scheduling intervals, and by default scheduling only happens every
20ms. We can see the fizzling like this:
{{{
./parallel_compute 'parallel/few' +RTS -N6 -s

benchmarking parallel/few
...
SPARKS: 1536 (613 converted, 0 overflowed, 0 dud, 42 GC'd, 881 fizzled)
...

./parallel_compute 'parallel/many' +RTS -N6 -s

benchmarking parallel/many
...
SPARKS: 411 (411 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
...
}}}
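As a rough, hedged illustration of sparks sitting unconverted in the
capability-local pool (this program is not part of the ticket; the `work`
function, the 64 sparks, and the 100 ms delay are arbitrary choices),
`GHC.Conc.numSparks` reports the size of the local spark pool. Build it
with `-threaded` and run with `+RTS -N` so that idle capabilities have a
chance to pick some of the sparks up:

{{{#!hs
import Control.Concurrent (threadDelay)
import Control.Exception (evaluate)
import Control.Parallel (par)
import GHC.Conc (numSparks)

main :: IO ()
main = do
  let work :: Int -> Double
      work n = sum (map fromIntegral [1 .. n * 100000])
      thunks = map work [1 .. 64]
  -- `par` merely records each thunk in this capability's spark pool;
  -- nothing has been evaluated in parallel yet.
  _ <- evaluate (foldr par () thunks)
  n0 <- numSparks
  putStrLn ("sparks in the local pool right after sparking: " ++ show n0)
  -- Give the scheduler a moment to convert (or let other capabilities
  -- steal) some of the sparks while the main thread is idle.
  threadDelay 100000
  n1 <- numSparks
  putStrLn ("sparks in the local pool after 100 ms:         " ++ show n1)
  -- Finally force the work on the main thread; any sparks that were not
  -- converted by now will fizzle.
  print (sum thunks)
}}}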
For application developers, it's possible to work around this by tweaking
the scheduling interval:
{{{
ghc -threaded -rtsopts parallel_compute.hs
./parallel_compute 'parallel/few' +RTS -N6 -s -C0.001

benchmarking parallel/few
time                 4.158 ms   (4.013 ms .. 4.302 ms)
                     0.993 R²   (0.987 R² .. 0.998 R²)
mean                 4.094 ms   (4.054 ms .. 4.164 ms)
std dev              178.5 μs   (131.5 μs .. 243.7 μs)
variance introduced by outliers: 24% (moderately inflated)
...
SPARKS: 3687 (3687 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
}}}
Much better. But there are two problems with this:

1. This may negatively impact the overall performance of an application.
2. It doesn't work at all for library developers. It isn't practical to
   tell the end users of your library to use certain runtime flags.

I don't know enough about the RTS to suggest a way to improve this.
However, intuitively, I would expect that if I spark something and there's
an idle capability, the idle capability could immediately be given the
spark instead of having it placed in the local queue. This may not be
possible, though, or may not be compatible with the minimal use of locks
in the implementation of sparks.

Here is a comment I made in the StackOverflow thread: I suppose that in
the normal case, if you're going to be sparking things, you should ensure
that the work done by all the sparks plus the main thread takes well over
20ms. Otherwise, nearly everything will fizzle unless scheduling happens
to be coming soon. I've always wondered about the threshold for how
fine-grained sparks should be, and my understanding is now that this is
roughly it.

In short, I'd like for it to be possible to realize some of the benefits
of parallelism for computations that take under 20ms without resorting to
`forkIO` and `MVar`.
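For concreteness, here is a minimal sketch of the kind of `forkIO`/`MVar`
workaround mentioned in the closing sentence above. It is not part of the
ticket: `parallelProductIO` and its `main` driver are hypothetical names,
it simply reuses `serialProduct` from the benchmark, and, like
`parallelProduct`, it assumes the vector length is a multiple of 4.

{{{#!hs
{-# LANGUAGE BangPatterns #-}

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import qualified Data.Vector.Primitive as PV

-- The same strict sequential product used in the benchmark above.
serialProduct :: PV.Vector Double -> Double
serialProduct v =
  let !len = PV.length v
      go :: Double -> Int -> Double
      go !d !ix = if ix < len then go (d * PV.unsafeIndex v ix) (ix + 1) else d
   in go 1.0 0

-- Force each quarter of the vector on its own thread and collect the
-- partial products through MVars. The forked threads are runnable right
-- away instead of sitting in a spark pool waiting to be converted.
parallelProductIO :: PV.Vector Double -> IO Double
parallelProductIO v = do
  let chunk = div (PV.length v) 4
  vars <- mapM
    (\i -> do
       m <- newEmptyMVar
       _ <- forkIO (putMVar m =<< evaluate (serialProduct (PV.slice (chunk * i) chunk v)))
       pure m)
    [0 .. 3]
  partials <- mapM takeMVar vars
  pure (product partials)

main :: IO ()
main = do
  let numbers = PV.replicate 10000000 (1.00000001 :: Double)
  print =<< parallelProductIO numbers
}}}

This is roughly the behaviour the ticket would like `rpar` to deliver
without the extra ceremony of spawning threads and plumbing `MVar`s by
hand.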

Comment 1 (by jberryman):

Changes:
 * type:  feature request => bug

I hope it's all right that I've modified this from "feature request" to
"bug". This seems like not particularly well-understood or documented
behavior. At least this user (me) still doesn't really understand the
implications.

Comment 2 (by bgamari):

Changes:
 * cc: Jaffacake (added)

Considering this to be a bug is perhaps reasonable, although perhaps only
a documentation bug. I agree that the current behavior is a bit
surprising.

Comment 3 (by winter):

The current behavior is strange. From the GHC Commentary, Rts/Scheduler:

  So how does the spark turn into a thread? When the scheduler spots that
  the current capability has no runnable threads, it checks the spark
  pool, and if there is a valid spark (a spark that points to a THUNK),
  then the spark is turned into a real thread and placed on the run
  queue: see createSparkThread in rts/Sparks.c. Also, the scheduler
  attempts to share its available sparks with any other idle
  capabilities: see schedulePushWork in rts/Schedule.c.

Why do we have to wait until there are no runnable threads? A new spark
should have at least the same priority as a new thread created with
`forkIO`. The right way to do this, IMHO, is to ''always'' check the spark
pool within each scheduling loop; this could be done by starting a
dedicated spark thread that checks for sparks, just like the I/O or timer
manager.

Comment 4 (by lelf):

Changes:
 * cc: lelf (added)