
On 2010-03-01 19:37 +0000 (Mon), Thomas Schilling wrote:
A possible workaround would be to sprinkle lots of 'rnf's around your code....
As I learned rather to my chagrin on a large project, you generally don't want to do that. I spent a couple of days writing instances of NFData and loading up my application with rnfs, and then watched performance fall into a sinkhole. I believe the problem is that rnf traverses the entirety of a large data structure even if it's already strict and doesn't need traversal. My guess is that doing this frequently on data structures (such as Maps) of more than tiny size was blowing out my cache.

I switched strategies to forcing a deep(ish) evaluation of only newly constructed data instead. For example, after inserting a newly constructed object into a Map, I would look it up and force evaluation only of the result of that lookup (sketched below). That solved my space-leak problem and made things chug along quite nicely.

Understanding the general techniques for this sort of thing, and seeing where you're likely to need to apply them, isn't all that difficult once you understand the problem. (It's probably much easier if you don't have to work it all out for yourself, as I did. Someone needs to write the "how to manage laziness in Haskell" guide.) The difficult part is that you've really got to stay on top of it, because if you don't, the space leaks come back and you have to go find them again. It feels a little like dealing with buffers and their lengths in C.
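Concretely, the insert-then-lookup-and-force trick looks something like the following. This is a minimal sketch, not the project's actual code: insertForced is my own illustrative name, and I'm assuming the containers and deepseq packages.

    import qualified Data.Map as Map
    import Data.Map (Map)
    import Control.DeepSeq (NFData, deepseq)

    -- Insert a newly built value into a (lazy) Map, then look it up
    -- and deep-force only that one value, so the rest of the Map is
    -- never traversed the way a whole-structure rnf would traverse it.
    insertForced :: (Ord k, NFData v) => k -> v -> Map k v -> Map k v
    insertForced k v m =
      let m' = Map.insert k v m
      in case Map.lookup k m' of
           Just v' -> v' `deepseq` m'  -- force only the fresh entry
           Nothing -> m'               -- unreachable: k was just inserted

The point is that the cost of the deepseq is proportional to the size of the one new value, not to the size of the whole Map.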
On 2010-03-01 16:06 -0500 (Mon), Job Vranish wrote:
All of our toplevel inputs will be strict, and if we keep our frame-to-frame state strict, our variances in runtimes, given the same inputs, should be quite low modulo the GC.
This is exactly the approach I need to take for the trading system. I basically have various (concurrent) loops that process input, update state, and possibly generate output. The system runs for about six hours, processing five million or so input messages, with other loops running anywhere from hundreds of thousands to millions of times.

The trick is to make sure that I never, ever start a new loop with an unevaluated thunk referring to data needed only by the previous loop, because otherwise I just grow and grow and grow.... Some tool to help with this would be wonderful. There's something for y'all to think about.
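A bare-bones skeleton of that discipline, again assuming the deepseq package (readMsg, step, and emit are hypothetical stand-ins, not the trading system's actual code):

    import Control.DeepSeq (NFData, deepseq)

    -- One such loop: the new state is fully forced before the next
    -- iteration starts, so no thunk can drag along data that only the
    -- previous iteration needed.
    processLoop :: NFData s
                => IO msg                    -- read the next input message
                -> (s -> msg -> (s, [out]))  -- pure state transition
                -> (out -> IO ())            -- emit any output
                -> s -> IO ()
    processLoop readMsg step emit = go
      where
        go st = do
          msg <- readMsg
          let (st', outs) = step st msg
          mapM_ emit outs
          st' `deepseq` go st'  -- force new state; old state is now GC-able

The deepseq before the recursive call is the whole point: without it, st' can be a growing chain of thunks that keeps every previous message alive.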
On 2010-03-01 22:01 +0000 (Mon), Thomas Schilling wrote:
As Job and John have pointed out, though, laziness per se doesn't seem to be an issue, which is good to hear. Space leaks might be, but there is no clear evidence that they are particularly harder to avoid than in strict languages.
As I mentioned above, overall I do find them harder to avoid. Any individual space leak you're looking at is easy to fix, but the constant vigilance is difficult.
cjs
--
Curt Sampson