Re: [Haskell-cafe] Is Haskell a Good Choice for Web Applications? (ANN: Vocabulink)

7 May 2009

Jason Dagit wrote:
> On Wed, May 6, 2009 at 3:54 PM, Anton van Straaten wrote:
>> FWIW, I have an internal HAppS application that's been running continuously
>> since November last year, used daily, with stable memory usage.
>
> Do you have advice about the way you wrote your app?  Things you
> knowingly did to avoid space leaks?  Maybe a blog about your HAppS
> app?

The app is written for a client under NDA, so a blog about it would have 
to be annoyingly vague.  But I don't think there's much mystery about 
why it doesn't leak:

The app does simulations.  Each simulation uses at least about 10MB of 
memory, more depending on parameters.  Typically a few thousand 
simulations are run successively, and the results are aggregated and 
analyzed.  The computation itself is purely functional - it takes some 
input parameters and produces results.  The results are written to a 
file.  Since each run of a set of simulations is essentially 
independent, there's not much risk of space leaks persisting across runs.
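
As a rough illustration of that shape (not the actual application, which is under NDA), here is a minimal Haskell sketch.  The names Params, Result, simulate and runBatch are hypothetical, and the arithmetic inside simulate is a placeholder.  The point is only that each run is a pure function of its parameters and that its result is fully evaluated and written out before the next run starts.

```haskell
import Data.List (foldl')
import System.IO (IOMode (WriteMode), hPrint, withFile)

-- Hypothetical inputs and outputs; the real application's types are not public.
data Params = Params { seed :: !Int, steps :: !Int }

newtype Result = Result Double
  deriving (Eq, Show)

-- A pure stand-in for one simulation: the result depends only on the
-- parameters. The strict fold keeps the running state evaluated at each step.
simulate :: Params -> Result
simulate (Params s n) = Result (foldl' step (fromIntegral s) [1 .. n])
  where
    step acc i = acc * 0.5 + fromIntegral i

-- Run the simulations one after another and write each result straight
-- to the file. hPrint has to evaluate a Result completely in order to
-- print it, so no unevaluated work from an earlier run is carried forward.
runBatch :: FilePath -> [Params] -> IO ()
runBatch path paramSets =
  withFile path WriteMode $ \h ->
    mapM_ (hPrint h . simulate) paramSets
```

Something like runBatch "results.txt" [Params 1 1000, Params 2 1000] would write one line per run.  Aggregation and analysis are left out of the sketch.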

No doubt the potential for encountering space leaks goes up as one 
writes less pure code, persists more things in memory, and depends on more 
libraries.  My main point in mentioning my app is that "long-running" 
isn't really the issue - that's just a way of saying that an app has 
space leaks that are small enough not to be noticed until it's stressed.

>> In my experience, it's not hard to write stable long-running code
>> in good implementations of languages like Haskell, Scheme, Common Lisp, or
>> Java.
>
> There are certainly cases where no automatic garbage collector could
> know when it is safe to collect certain things.

If there are bugs in the user's program, sure - but that still doesn't 
make it "hard" to write applications that don't leak, given a decent GC.
On the contrary, I'd say it's very easy, in the great majority of cases.

> A quick google search
> for java space leaks turned up this article:
> http://www.ibm.com/developerworks/java/library/j-leaks/
> I think wikipedia uses the term "logical leak" for the type of space
> leak I'm thinking of.  The garbage collector thinks you care about an
> object but in fact, you want it to be freed.  Yes, it's because of a
> bug, but these are bugs that tend to be subtle and tedious.

The example given in the IBM article is quite typical, but isn't subtle 
at all - it was simply an object being added to a table and never being 
removed.  You can often find such bugs quite easily by searching the 
source tree, without touching a debugging tool.  It's also possible to 
prevent them quite easily, with good coding practices (e.g. centralize 
uses of long-lived tables) and some simple code auditing practices.
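
To make "centralize uses of long-lived tables" concrete, here is a small sketch in Haskell rather than the Java of the IBM article.  Registry, withEntry and entryCount are hypothetical names, and the Int keys and String values are arbitrary.  The idea is that the table is hidden behind one function that pairs every insertion with a removal, so an entry cannot be added and then forgotten.

```haskell
import Control.Exception (bracket_)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical long-lived table, e.g. of active sessions keyed by id.
newtype Registry = Registry (IORef (Map.Map Int String))

newRegistry :: IO Registry
newRegistry = Registry <$> newIORef Map.empty

-- The only way to add an entry: it is removed again when the action
-- finishes, whether the action returns normally or throws.
withEntry :: Registry -> Int -> String -> IO a -> IO a
withEntry (Registry ref) key val action =
  bracket_
    (atomicModifyIORef' ref (\m -> (Map.insert key val m, ())))
    (atomicModifyIORef' ref (\m -> (Map.delete key m, ())))
    action

-- Number of live entries; handy for an audit or a test.
entryCount :: Registry -> IO Int
entryCount (Registry ref) = Map.size <$> readIORef ref
```

If the module exports only newRegistry, withEntry and entryCount, auditing for this kind of leak comes down to reading one function.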

If you're dealing with code that's complex enough to involve the kinds 
of non-trivial mutually dependent references that you need in order to 
encounter truly subtle instances of these bugs, the increased difficulty 
of memory management comes with the territory, i.e. it's harder because 
the application is harder.

> The ambiguity is me thinking of relative cost of finding/fixing these
> bugs.

To put this back into context, I was objecting to your having extended 
the space leak worrying to all GC'd languages.  I'm saying that it isn't 
hard, using most decent language implementations, to avoid space leaks.
For trivial cases such as the IBM example, it should be no harder in 
Haskell, either - possibly easier, since use of things like mutable 
tables is more controlled, and may be rarer.

However, Haskell does theoretically introduce a new class of dangers for 
space leaks; I'm not denying that.  Being pure and lazy introduces its 
own set of space leak risks.  But on that front, I was disturbed by the 
vagueness of the claims about long-running apps.  I haven't seen any 
solid justification for scaring people off about writing long-running 
apps in Haskell.  If there is such a justification, it needs to be more 
clearly identified.
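
For readers who haven't met the laziness-related risk, the textbook case is a lazily accumulated fold.  The sketch below is a standard illustration, not something taken from any application discussed in this thread, and meanLazy and meanStrict are made-up names.  Both functions compute the same value.  The first can build a chain of unevaluated additions as long as the list, while the second forces the running sum and count at every step.  An optimizing compiler may rescue the lazy version in simple cases like this one, so treat the difference as illustrative.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- Lazy accumulator: the sum and count inside the tuple are not evaluated
-- until the very end, so thunks can pile up in proportion to the list length.
meanLazy :: [Double] -> Double
meanLazy xs = s / fromIntegral n
  where
    (s, n) = foldl (\(a, c) x -> (a + x, c + 1)) (0, 0 :: Int) xs

-- Strict accumulator: foldl' forces the tuple at each step and the bang
-- patterns force its two components, so the state stays a pair of numbers.
meanStrict :: [Double] -> Double
meanStrict xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (!a, !c) x = (a + x, c + 1)
```

Leaks of this kind depend on how much data flows through the accumulator, not on how long the process has been running.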

> Testing for correctness is something we tend to automate very
> well.

How do you automate testing for performance under load?  Space usage is 
a similar kind of dynamic issue, in general.

> So then, at some point we must have a bag of tricks for dealing with
> these space leaks.  I want to talk about those tricks.  I'm not
> talking about bugs in a specific program, but instead about techniques
> and styles that are known to work well in practice.

OK.  That's a bit different from FFT's original contention, "hard to 
contain a long-running Haskell application in a finite amount of 
memory."  For my own part, I'm at least as non-strict as Haskell, and 
that bag of tricks, for me, is a thunk that hasn't yet been forced.

Anton