
I'm constantly surprised hearing from so many people about their space problems. I cannot remember having space problems with my programs. I don't know what everybody else is doing wrong :-)
At least two common cases.
Extracting compact data structures from large files. The contents of the large file is read as a linked list (ugh) of pointers (double ugh) to 32-bit Chars (triple ugh) -- twelve times the size of the file, if my calculations are correct. The contents can't be GC'ed before the extracted data is fully evaluated. (Now if the file was an mmap'ed array, it wouldn't be so bad, perhaps in the next generation IO that people are discussing this will be easier?)
Naive use of foldl. I tend to think the default foldl should be strict (ie. replaced by foldl') -- are there important cases where it needs to be lazy?
Indeed, extracting a compact data structure from a large data structure can easily cost much space because of lazy evaluation. "foldl" is probably used mostly for that purpose. Me not having space problems is probably related to the kind of programs I write. Most of my programs are program transformations that take an abstract syntax tree as input and produce a different abstract syntax tree (again a large structure). Trying to be lazy then means trying to produce as much output as possible with processing as little output as possible. More formally that means if there is some partial input for a function such that for all completions of this partial input to fully defined inputs all outputs of the function have a common prefix, then the function should already yield this prefix as output for the partial input. In the example that I mentioned in my previous posting I did actually originally compute size information for a large data structure, so did extract something compact from something large. However, I then realised that I only did some very basic arithmetic with the size information before generating another large data structure of this computed size. So then I decided to not to compute integers at all but do the arithmetic directly on the algebraic data type. Gone was the space problem, without using seq. You might also want to look at Colin Runciman's paper "What about the Natural Numbers?" in the Journal of Functional Programming in which he argues for a type of lazy natural numbers, with the same semantics as data Nat = Zero | Succ Nat. It fits much better for computing the size of a lazy data structure. I don't claim that all space problems can easily be dealt with. Olaf