On Wed, Jul 18, 2012 at 4:10 AM, Bardur Arantsson <spam@scientician.net> wrote:
The most robust way is probably to use a completely independent
supervisor program, e.g. "upstart", "systemd", "runit", etc. These
usually have facilities for restarting the supervised program, and a
rate limit on exactly how often to try that (over a given period of time).

These *won't* work for a program that's deadlocked because an important
thread has died. For that you'll need either a watchdog (external) or an
in-program mechanism for "supervised threads" which can catch any and
all exceptions and restart threads as necessary. This tends to be very
domain-specific, but you might take some inspiration from the way
supervisor hierarchies work in the actor model.

Hi Bardur, the "supervised threads" idea sounds like a good approach for me. Thanks!
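
For reference, here is a minimal sketch of the "supervised threads" idea, assuming Python and its standard threading module. The name supervise and the parameters max_restarts and window are hypothetical, chosen only for illustration. The wrapper catches any exception from the worker function, reruns it in the same thread, and gives up once more than max_restarts failures happen within window seconds (the same kind of rate limit the external supervisors apply). It does not detect a worker that hangs without raising; that case still needs a watchdog.

```python
import threading
import time


def supervise(target, max_restarts=3, window=60.0, name="worker"):
    """Run target() in a thread and rerun it whenever it raises.

    Gives up once more than max_restarts failures occur within
    window seconds. All names here are illustrative assumptions.
    """
    def run():
        failures = []  # monotonic timestamps of recent failures
        while True:
            try:
                target()
                return  # normal exit: nothing to restart
            except Exception as exc:  # catch whatever the worker raises
                now = time.monotonic()
                # Keep only the failures that fall inside the window.
                failures = [t for t in failures if now - t < window]
                failures.append(now)
                if len(failures) > max_restarts:
                    print(f"{name}: giving up after {exc!r}")
                    return
                print(f"{name}: restarting after {exc!r}")

    thread = threading.Thread(target=run, name=name, daemon=True)
    thread.start()
    return thread
```

Calling supervise(my_worker) returns the thread object, so the caller can join() it or check is_alive(). A fuller design would probably report the final failure to a parent supervisor instead of printing, which is roughly how actor-style supervisor hierarchies escalate errors.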