I wrote a program that uses a timed thread to collect data from a C producer (via the FFI). The number of threads in the C producer is fixed (they are created at init). One Haskell timer thread uses threadDelay to run itself at a fixed interval. When I look at the RTS output after killing the program following a couple of timer iterations, I see the number of worker tasks increasing over time.

For example, below is the output after 20 iterations of the timer event:

                      MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
  ....... tasks 2 to 37 snipped; each is identical to Task 1 .......
  Task 38 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
  Task 39 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
  Task 40 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
  Task 41 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
  Task 42 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
  Task 43 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
  Task 44 (worker) :    0.52s    ( 10.74s)       0.00s    (  0.00s)
  Task 45 (worker) :    0.52s    ( 10.75s)       0.00s    (  0.00s)
  Task 46 (worker) :    0.52s    ( 10.75s)       0.00s    (  0.00s)
  Task 47 (bound)  :    0.00s    (  0.00s)       0.00s    (  0.00s)


After two iterations of the timer event:

                       MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
  Task  2 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
  Task  3 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
  Task  4 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
  Task  5 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
  Task  6 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
  Task  7 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
  Task  8 (worker) :    0.48s    (  1.80s)       0.00s    (  0.00s)
  Task  9 (worker) :    0.48s    (  1.81s)       0.00s    (  0.00s)
  Task 10 (worker) :    0.48s    (  1.81s)       0.00s    (  0.00s)
  Task 11 (bound)  :    0.00s    (  0.00s)       0.00s    (  0.00s)


The Haskell code has one forkIO call to kick off the C FFI code, which creates 8 threads. The runtime options are "-N3 +RTS -s". The timer event is started after the forkIO. It is of the form (pseudo-code):

  timerevent <other arguments> time = run
    where
      run = do
        threadDelay time
        -- do some work
        run
      -- <other variables defined for run function>
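A runnable version of the pseudo-code above might look like the sketch below. `doSomeWork` and the `iterations` limit are hypothetical additions: the real loop recurses forever and its other arguments are elided, so this version stops after a fixed count to let the program exit.

```haskell
import Control.Concurrent (threadDelay)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Runnable sketch of the timerevent pseudo-code.
-- Hypothetical: 'doSomeWork' stands in for the real work step, and
-- 'iterations' bounds the loop (the original recurses forever).
timerevent :: IO () -> Int -> Int -> IO ()
timerevent doSomeWork iterations time = run iterations
  where
    run :: Int -> IO ()
    run n
      | n <= 0 = pure ()
      | otherwise = do
          threadDelay time -- wait 'time' microseconds
          doSomeWork
          run (n - 1)

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  -- Three iterations, 0.1 s apart; each one bumps the counter.
  timerevent (modifyIORef' counter (+ 1)) 3 100000
  readIORef counter >>= print
```

This prints `3`. The recursion is in tail position, so the loop runs in constant stack just like the original unbounded `run`.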

I also wrote a simpler program using just the timer event (fork one timer event, then run another timer event after it), but didn't see any tasks in the RTS output.
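For comparison, the simpler test can be sketched as follows. The names `timerLoop` and `twoTimers` are hypothetical, and both loops are bounded so the program exits. Building it with `-threaded -rtsopts` and running it with the same RTS options as the real program makes its `-s` task list directly comparable.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import Control.Monad (replicateM_)

-- One timer loop: sleep 'time' microseconds, then bump the shared counter.
timerLoop :: MVar Int -> Int -> Int -> IO ()
timerLoop counter iterations time =
  replicateM_ iterations $ do
    threadDelay time
    modifyMVar_ counter (pure . (+ 1))

-- Fork one timer loop, run a second on the calling thread after the fork,
-- and return the total number of ticks from both loops.
twoTimers :: Int -> Int -> IO Int
twoTimers iterations time = do
  counter <- newMVar 0
  done <- newEmptyMVar
  _ <- forkIO (timerLoop counter iterations time >> putMVar done ())
  timerLoop counter iterations time
  takeMVar done -- wait for the forked loop to finish
  readMVar counter

main :: IO ()
main = do
  total <- twoTimers 5 100000
  putStrLn ("total ticks: " ++ show total)
```

This prints `total ticks: 10`. The counter lives in an `MVar` so both loops can update it safely.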

I searched the GHC documentation for an explanation of the RTS output but didn't find anything that helps me debug the issue above. I suspect the timer event is the root cause of the increasing number of tasks (all but the last 9 tasks are idle; I guess 8 tasks belong to the C FFI code and one to the timerevent thread), and hence of a memory leak.
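To check whether the growing task count actually comes with growing memory, one option is to log the live heap once per timer iteration. This is a sketch assuming GHC 8.2 or later (where `GHC.Stats.getRTSStats` exists), a program compiled with `-rtsopts`, and a run with `+RTS -T`; `liveBytesNow` is a hypothetical helper name.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (forM_)
import Data.Word (Word64)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)

-- Live heap bytes after a forced major GC, or Nothing when RTS stats
-- collection is off (assumption: enabled by running with +RTS -T).
liveBytesNow :: IO (Maybe Word64)
liveBytesNow = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then pure Nothing
    else do
      performGC -- major GC so the live-bytes figure reflects this moment
      stats <- getRTSStats
      pure (Just (gcdetails_live_bytes (gc stats)))

main :: IO ()
main =
  forM_ [1 .. 5 :: Int] $ \i -> do
    threadDelay 100000 -- stand-in for one timer interval
    live <- liveBytesNow
    putStrLn ("iteration " ++ show i ++ ": live bytes = "
              ++ maybe "n/a (run with +RTS -T)" show live)
```

If the live-bytes figure stays flat across iterations while the task count grows, the extra tasks are probably not retaining heap data, although each OS thread still has its own non-heap cost.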

I would appreciate pointers on how to debug this. The timerevent thread uses forkIO to send the data collected from the C FFI code to a database server, but disabling that fork still leaves the number of tasks growing. So the growth seems strongly correlated with the timer event, though I am unable to reproduce it with a simpler version of the timer event (one without the MVar synchronization and callback from the C FFI code).
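Since the simpler version drops the MVar synchronization and callback from the C FFI code, one way to narrow the problem down might be to add those pieces back one at a time. The sketch below is a hypothetical middle step. libc's `usleep` stands in for the real C producer (an assumption, since the actual C code isn't shown), and the callback is replaced by a Haskell thread that makes a blocking safe foreign call and then hands a value over through an `MVar`. In GHC's threaded RTS, a safe foreign call releases its capability while it runs, and the RTS may start another OS thread (a worker task) to keep running Haskell code, so blocking foreign calls are a plausible place to look.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)
import Foreign.C.Types (CInt (..), CUInt (..))

-- Hypothetical stand-in for the C producer: usleep(3) from libc blocks
-- the calling OS thread. Declared 'safe', so with -threaded the RTS can
-- keep running other Haskell threads while the call is in progress.
foreign import ccall safe "unistd.h usleep"
  c_usleep :: CUInt -> IO CInt

-- A producer thread blocks in C, then hands each value over through an
-- MVar (stand-in for the MVar sync/callback); the timer loop takes one
-- value per tick. Returns the sum of everything the timer received.
runRepro :: Int -> IO Int
runRepro n = do
  box <- newEmptyMVar
  _ <- forkIO $ forM_ [1 .. n] $ \i -> do
    _ <- c_usleep 50000 -- block in C for 50 ms
    putMVar box i
  received <- forM [1 .. n] $ \_ -> do
    threadDelay 10000 -- timer interval (10 ms)
    takeMVar box
  pure (sum received)

main :: IO ()
main = do
  s <- runRepro 20
  putStrLn ("sum of received values: " ++ show s)
```

This prints `sum of received values: 210`. Built with `-threaded -rtsopts` and run with the same RTS options as the real program, comparing its `-s` task list against the timer-only version may help show whether the blocking call, rather than threadDelay, is what adds tasks.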