
David Leimbach
As some of you may know, I've been writing commercial Haskell code for a little bit here (about a year and a half) and I've just recently had to write some code that was going to run have to run for a really long time before being restarted, possibly months or years if all other parts of the system cooperate as it's part of a server infrastructure management system.
I recently ran into some serious space leak difficulties that would ultimately cause this program to crash some time after startup (my simulator is also written in Haskell, and runs a LOT faster than the real application ever could, this has enabled me to fast forward a bit the data growth issues and crash in minutes instead of days!)
Anyway, rather than try to paste it all here with images and such I thought I'd stick it up on my blog so others could maybe benefit from the anecdote. It's difficult to disclose enough useful information as it is commercial code not under an open source license, but there's neat diagrams and stuff there so hopefully the colors are at least amusing :-)
http://leimy9.blogspot.com/2009/11/long-running-haskell-applications.html Can you copy you blog at here? http://leimy9.blogspot.com/2009/11/long-running-haskell-applications.html is filter by GFW (http://en.wikipedia.org/wiki/Golden_Shield_Project) i can't see it.
About crash free program, you can consider multi-process design, keep Simple and Stable core running in RootProcess, and Core module won't crash, and make unstable module running in ChildProcess, if you occur some un-catch exception from ChildProcess, just reboot those sub-module. I'm research some Haskell/Gtk+ program, sometimes un-catch exception is unavoidable, multi-process is good chose to avoid some exception crash all program. Cheers, -- Andy