
On Wed, Jul 18, 2012 at 4:10 AM, Bardur Arantsson wrote:
The most robust way is probably to use a completely independent supervisor program, e.g. "upstart", "systemd", "runit", etc. These usually have facilities for restarting the supervised program, along with a rate limit on how often to try that within a given period of time.
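The restart-with-rate-limit policy these supervisors provide is simple to sketch. In systemd, for example, it is configured with `Restart=` together with `StartLimitIntervalSec=` and `StartLimitBurst=`. The Python below is a minimal illustration of the idea, not how upstart, systemd or runit implement it. `RestartLimiter` and `supervise` are hypothetical names. The program being supervised is modeled as a callable that runs it once and returns its exit code.

```python
import time
from collections import deque
from typing import Callable


class RestartLimiter:
    """Hypothetical helper: allow at most `max_restarts` restarts
    within any sliding window of `period` seconds."""

    def __init__(self, max_restarts: int, period: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_restarts = max_restarts
        self.period = period
        self.clock = clock          # injectable so the policy can be tested
        self._stamps = deque()      # timestamps of recent restarts

    def allow(self) -> bool:
        now = self.clock()
        # Forget restarts that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False            # too many restarts recently: give up
        self._stamps.append(now)
        return True


def supervise(run_once: Callable[[], int], limiter: RestartLimiter) -> int:
    """Hypothetical supervisor loop: run the program, restart it after a
    non-zero exit, and stop once the limiter refuses another restart."""
    while True:
        code = run_once()
        if code == 0:
            return 0                # clean exit: nothing to restart
        if not limiter.allow():
            return code             # rate limit hit: report the last failure
```

To supervise a real process you might pass something like `lambda: subprocess.run(["./myserver"]).returncode`, where `./myserver` is a placeholder. Injecting the clock keeps the sliding-window logic testable without sleeping.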
These *won't* work for a program that's deadlocked because an important thread has died. For that you'll need either an external watchdog or an in-program mechanism for "supervised threads" that can catch any and all exceptions and restart threads as necessary. This tends to be very domain-specific, but you might take some inspiration from the way supervisor hierarchies work in the actor model.
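The following is a minimal sketch of the "supervised threads" idea. It is loosely modeled on the one-for-one restart strategy of Erlang/OTP supervisors and assumes Python threads. `ThreadSupervisor` and its methods are hypothetical names. When a worker raises, the sketch re-runs the worker's body inside the same OS thread rather than creating a new `Thread` object. After `max_restarts` failures it gives up on that worker and records the failure so the rest of the program can react, instead of silently deadlocking.

```python
import threading
import traceback
from typing import Callable


class ThreadSupervisor:
    """Hypothetical one-for-one supervisor: each worker body that raises
    is restarted independently, up to `max_restarts` times."""

    def __init__(self, max_restarts: int) -> None:
        self.max_restarts = max_restarts
        self.restarts = {}          # worker name -> restarts so far
        self.failed = []            # workers that exhausted their restarts
        self._threads = []
        self._lock = threading.Lock()

    def start_child(self, name: str, body: Callable[[], None]) -> None:
        self.restarts[name] = 0
        t = threading.Thread(target=self._run, args=(name, body), name=name)
        self._threads.append(t)
        t.start()

    def _run(self, name: str, body: Callable[[], None]) -> None:
        while True:
            try:
                body()
                return              # normal return: no restart needed
            except Exception:       # catches all ordinary exceptions
                traceback.print_exc()
                with self._lock:
                    if self.restarts[name] >= self.max_restarts:
                        self.failed.append(name)  # escalate instead of retrying
                        return
                    self.restarts[name] += 1

    def join(self) -> None:
        for t in self._threads:
            t.join()
```

Catching `Exception` deliberately lets `KeyboardInterrupt` and `SystemExit` through. In a real supervisor hierarchy, the `failed` list is where you would escalate to a parent supervisor, for example by restarting a whole group of related threads.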
Hi Bardur, the "supervised threads" sounds like a good approach for me. Thanks!