
On Thu, 2008-05-08 at 23:29 +0100, Edsko de Vries wrote:
Hi,
How can I know whether something will be stack or heap allocated? For example, in the standard example of why
foldl (+) 0
will fail to evaluate a long list of integers due to a stack overflow, but foldl' won't, it is pointed out that foldl starts building up unevaluated thunks. So, apparently, these thunks are allocated on the stack rather than on the heap with a pointer to the thunk on the stack. (I understand that foldl' is asymptotically better than foldl space-wise; that is irrelevant to my question.)
On the other hand, in this version that sums all the values in a tree
data Tree = Leaf Int | Node Tree Tree
sum :: Tree -> Int sum t = sum' [t] 0 where sum' [] acc = acc sum' (Leaf i : ts) acc = sum' ts $! (i + acc) sum' (Node l r : ts) acc = sum' (l : r : ts) acc
we are building up a (potentially) large lists of trees yet to be processed, but never run out of stack space. Apparently, the list is built up on the heap rather than on the stack.
What is the fundamental difference between these two examples? Why is the list of trees yet to be processed allocated on the heap (with a pointer on the stack, presumably) but the unevaluated thunks in the foldl example allocated on the stack?
No, the thunks are (usually) stored on the heap. You don't get the stack overflow until you actually force the computation at which point you have an expression like: (...(((1+2)+3)+4) ... + 10000000) which requires stack in proportion to the number of nested parentheses (effectively)