Terminate unused worker threads

I finally got some spare time to do some GHC hacking, and after feeling my way around http://hackage.haskell.org/trac/ghc/ticket/4262 I came up with the following patch, which appears to work (that is, bound the number of worker threads hanging around after FFI calls): diff -rN -u old-ghc-clean/rts/Capability.c new-ghc-clean/rts/Capability.c --- old-ghc-clean/rts/Capability.c 2010-11-07 14:21:53.000000000 +0000 +++ new-ghc-clean/rts/Capability.c 2010-11-07 14:21:54.000000000 +0000 @@ -461,6 +461,16 @@ // This happens when the system is shutting down (see // Schedule.c:workerStart()). if (!isBoundTask(task) && !task->stopped) { + int i; + Task *t = cap->spare_workers; + // arbitrarily picked six spare workers as a good number + for (i = 1; t != NULL && i < 6; t = t->next, i++) {} + if (i >= 6) { + debugTrace(DEBUG_sched, "Lots of spare workers hanging around, terminating this thread"); + releaseCapability_(cap,rtsFalse); + RELEASE_LOCK(&cap->lock); + pthread_exit(NULL); + } task->next = cap->spare_workers; cap->spare_workers = task; } There are a few obvious questions with this patch: - We're doing a walk (albeit constant bounded) of the spare workers list; if we're actually not going to ever exceed X number maybe we should use a circular buffer or store the number of items in the queue. - It's not 100% clear to me if I've done enough cleanup (i.e. maybe other locks to release, items to free, etc?) - Of course, POSIX only, for now. Fortunately Windows is nice enough to give us ExitThread() so porting should be not a problem. Edward

Hi Edward, Sorry for taking so long to get around to reviewing this one. On 07/11/2010 14:29, Edward Z. Yang wrote:
I finally got some spare time to do some GHC hacking, and after feeling my way around http://hackage.haskell.org/trac/ghc/ticket/4262 I came up with the following patch, which appears to work (that is, bound the number of worker threads hanging around after FFI calls):
diff -rN -u old-ghc-clean/rts/Capability.c new-ghc-clean/rts/Capability.c --- old-ghc-clean/rts/Capability.c 2010-11-07 14:21:53.000000000 +0000 +++ new-ghc-clean/rts/Capability.c 2010-11-07 14:21:54.000000000 +0000 @@ -461,6 +461,16 @@ // This happens when the system is shutting down (see // Schedule.c:workerStart()). if (!isBoundTask(task)&& !task->stopped) { + int i; + Task *t = cap->spare_workers; + // arbitrarily picked six spare workers as a good number + for (i = 1; t != NULL&& i< 6; t = t->next, i++) {} + if (i>= 6) { + debugTrace(DEBUG_sched, "Lots of spare workers hanging around, terminating this thread"); + releaseCapability_(cap,rtsFalse); + RELEASE_LOCK(&cap->lock); + pthread_exit(NULL); + } task->next = cap->spare_workers; cap->spare_workers = task; }
There are a few obvious questions with this patch:
- We're doing a walk (albeit constant bounded) of the spare workers list; if we're actually not going to ever exceed X number maybe we should use a circular buffer or store the number of items in the queue.
I suggest keeping track of the number of items in the queue.
- It's not 100% clear to me if I've done enough cleanup (i.e. maybe other locks to release, items to free, etc?)
- Of course, POSIX only, for now. Fortunately Windows is nice enough to give us ExitThread() so porting should be not a problem.
So a worker thread is normally expected to exit by - returning from schedule() (Schedule.c line 291) - back to scheduleWorker(), which calls releaseCapability_() and workerTaskStop() So I think the main thing missing is a call to workerTaskStop(). It would be really nice if we could arrange that in the case where we have too many spare workers, the extra workers exit via the same route, but I suspect that might not be easy. Cheers, Simon

Excerpts from Simon Marlow's message of Wed Nov 17 06:15:46 -0500 2010:
I suggest keeping track of the number of items in the queue.
Ok, I added a spare_workers_no field to the Capability struct.
So I think the main thing missing is a call to workerTaskStop().
Added.
It would be really nice if we could arrange that in the case where we have too many spare workers, the extra workers exit via the same route, but I suspect that might not be easy.
Do you mean, spare workers that have useful tasks to do? Edward

Hi all,
Apologies for commenting before understanding Capability.c very well. But
it seems that this file uses locking quite heavily. Has there been an
analysis of whether atomic memory ops and lock free algorithms could play
any role here?
Simon mentioned keeping track of the number of items in the queue so as not
to have to traverse it while holding the lock. That, for example, seems
that it could be accomplished with an atomically modified counter.
Cheers,
-Ryan
On Wed, Nov 17, 2010 at 10:25 AM, Edward Z. Yang
Excerpts from Simon Marlow's message of Wed Nov 17 06:15:46 -0500 2010:
I suggest keeping track of the number of items in the queue.
Ok, I added a spare_workers_no field to the Capability struct.
So I think the main thing missing is a call to workerTaskStop().
Added.
It would be really nice if we could arrange that in the case where we have too many spare workers, the extra workers exit via the same route, but I suspect that might not be easy.
Do you mean, spare workers that have useful tasks to do?
Edward _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Hello Ryan, Adding the extra spare_workers_no field bookkeeping doesn't add very much overhead, since whenever spare_worker's is changing we already have taken out the lock. As for using a lock-free data structures, I can't say I have a good feel for how performance might change in that case. :-) Edward

On 17/11/2010 18:55, Ryan Newton wrote:
Hi all,
Apologies for commenting before understanding Capability.c very well. But it seems that this file uses locking quite heavily. Has there been an analysis of whether atomic memory ops and lock free algorithms could play any role here?
The locks here are mostly used when handing over the Capability between OS threads, or when an OS thread is going idle, and neither of these are going to be made faster by using lock-free data structures. In fact, the lock on the Capability is only used to protect a very few fields - most of the fields are private to the Capability and require no locking. However, there is one place that I think a better implementation could be used. Each Capability has a message queue, which needs to behave like a multiple-writer single-reader buffer (order is unimportant). Currently it's a linked list protected by a lock, and this is bad - multiple writers contend for the lock with the reader. Having the reader take the whole buffer in one go was an improvement, but doesn't completely solve the problem. Something I need to be careful of is that the Capability should never go idle when the queue is non-empty (we don't want a lost wakeup), so presumably the lock is needed when the queue goes from empty to non-empty, but otherwise a lock-free operation to add items should suffice. I haven't trawled the literature yet - is there a known data structure that would be good here? Cheers, Simon
Simon mentioned keeping track of the number of items in the queue so as not to have to traverse it while holding the lock. That, for example, seems that it could be accomplished with an atomically modified counter.
Cheers, -Ryan
On Wed, Nov 17, 2010 at 10:25 AM, Edward Z. Yang
mailto:ezyang@mit.edu> wrote: Excerpts from Simon Marlow's message of Wed Nov 17 06:15:46 -0500 2010: > I suggest keeping track of the number of items in the queue.
Ok, I added a spare_workers_no field to the Capability struct.
> So I think the main thing missing is a call to workerTaskStop().
Added.
> It would be really nice if we could arrange that in the case where we > have too many spare workers, the extra workers exit via the same route, > but I suspect that might not be easy.
Do you mean, spare workers that have useful tasks to do?
Edward _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org mailto:Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
participants (3)
-
Edward Z. Yang
-
Ryan Newton
-
Simon Marlow