
Greetings,

Jitter vs Throughput
====================

Scenario
--------

I have the following scenario:

  CPU with [C] cores
  concurrent program
    the 1 main thread uses OpenGL for animated visual output
    [W] worker threads use FFI for lengthy numerical computations

with the following desires:

  (J) minimize jitter    : the 1 main thread needs to be responsive
  (T) maximize throughput: idle CPU time is a waste of time

The problem is, I want both (J) and (T)!

Benchmarks
----------

Some rough benchmarks from my 'production' implementation[1] where:

  jitter = stddev of actual period (target period = 40ms)
  idle   = C - (user+sys time) / (real time)
           (cores used elsewhere)
           (but for example Xorg will use some time for the
            OpenGL output, etc)

  C  W  jitter  idle
  2  0   1.5ms  168%   threaded RTS
  2  1   1.9ms   65%   "        "
  2  2   9.4ms   46%   "        "
  2  3  13.3ms   37%   "        "

For comparison, a (very) minimal GLUT program gives:

         0.3ms         threaded RTS
         0.5ms         non-threaded RTS

Picking (J) over (T), then setting W=1 when C=2 gives best jitter
Picking (T) over (J), then setting W=2 when C=2 gives best throughput

Question
--------

What is the best way to re-structure the program to have both low
jitter and high throughput?

Options I am considering:

  1. worker threads estimate the time needed to complete each job,
     and don't start a job if it is likely to break the deadline
     (bonus points if just those worker Haskell threads running on
     the same OS thread as the main Haskell thread need to pause)

  2. change the foreign code to break jobs into smaller pieces,
     for example perhaps something like:

       worker :: IO (IO Bool) -> IO ()
       worker getJob = forever $ do
         job <- getJob
         whileM_ job yield -- [2]

     instead of

       worker :: IO (IO ()) -> IO ()
       worker = forever . join

  3. re-implement the foreign code in Haskell instead of C and hope
     that GHC makes the Haskell go as fast as GCC makes the C go

  5. wait (for a long time) for a new RTS with:
       full pre-emption (including interrupting/resuming foreign code)
       user settable thread priorities

(1) is "a fun challenge" (there may be PhDs awarded for less, I imagine)
(2) isn't quite trivial, some scope for tuning subjob chunk size
(3) is boring translation but could lead to interesting benchmarks
    even (especially?) if it fails to be as fast as C

Which would you pick?

Links
-----

[1] http://hackage.haskell.org/package/mandulia
[2] http://hackage.haskell.org/packages/archive/monad-loops/latest/doc/html/Cont...

Thanks for any insight and advice,

Claude
-- 
http://claudiusmaximus.goto10.org
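
Editorial sketch of option 2 (breaking jobs into smaller pieces), using only base. Everything here is illustrative and not taken from the mandulia source: `mkSumJob` is a hypothetical stand-in for a foreign computation that has been split into steps, `runJob` is an assumed helper name, and `whileM_` is redefined locally so the example does not depend on monad-loops [2]. Each step does a bounded amount of work and returns True while more work remains, and the worker yields between steps. Whether this actually lowers jitter would still need measuring.

```haskell
import Control.Concurrent (yield)
import Control.Monad (forever, when)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Local stand-in for whileM_ from monad-loops [2]:
-- run the body for as long as the condition action returns True.
whileM_ :: Monad m => m Bool -> m a -> m ()
whileM_ cond body = go
  where
    go = do
      continue <- cond
      when continue (body >> go)

-- Hypothetical chunked job: adds the integers 1..n to 'result',
-- at most 'chunk' numbers per step (assumes chunk >= 1).
-- In the real program each step would be one short foreign call.
mkSumJob :: Int -> Int -> IORef Int -> IO (IO Bool)
mkSumJob chunk n result = do
  next <- newIORef 1
  return $ do
    lo <- readIORef next
    let hi = min n (lo + chunk - 1)
    modifyIORef' result (+ sum [lo .. hi])
    writeIORef next (hi + 1)
    return (hi < n)  -- True while work remains

-- Run one job to completion, yielding after every unfinished step.
runJob :: IO Bool -> IO ()
runJob job = whileM_ job yield

-- The worker from option 2: fetch a job, run it in steps, repeat.
worker :: IO (IO Bool) -> IO ()
worker getJob = forever (getJob >>= runJob)
```

The chunk size is the tuning knob mentioned under (2): smaller chunks give the main thread more chances to run, at the cost of more per-step overhead.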
participants (1)
-
Claude Heiland-Allen