
Hi, How can I know whether something will be stack or heap allocated? For example, in the standard example of why foldl (+) 0 will fail to evaluate a long list of integers due to a stack overflow, but foldl' won't, it is pointed out that foldl starts building up unevaluated thunks. So, apparently, these thunks are allocated on the stack rather than on the heap with a pointer to the thunk on the stack. (I understand that foldl' is asymptotically better than foldl space-wise; that is irrelevant to my question.) On the other hand, in this version that sums all the values in a tree data Tree = Leaf Int | Node Tree Tree sum :: Tree -> Int sum t = sum' [t] 0 where sum' [] acc = acc sum' (Leaf i : ts) acc = sum' ts $! (i + acc) sum' (Node l r : ts) acc = sum' (l : r : ts) acc we are building up a (potentially) large lists of trees yet to be processed, but never run out of stack space. Apparently, the list is built up on the heap rather than on the stack. What is the fundamental difference between these two examples? Why is the list of trees yet to be processed allocated on the heap (with a pointer on the stack, presumably) but the unevaluated thunks in the foldl example allocated on the stack? Thanks, Edsko

On Thu, 2008-05-08 at 23:29 +0100, Edsko de Vries wrote:
Hi,
How can I know whether something will be stack or heap allocated? For example, in the standard example of why
foldl (+) 0
will fail to evaluate a long list of integers due to a stack overflow, but foldl' won't, it is pointed out that foldl starts building up unevaluated thunks. So, apparently, these thunks are allocated on the stack rather than on the heap with a pointer to the thunk on the stack. (I understand that foldl' is asymptotically better than foldl space-wise; that is irrelevant to my question.)
On the other hand, in this version that sums all the values in a tree
data Tree = Leaf Int | Node Tree Tree
sum :: Tree -> Int sum t = sum' [t] 0 where sum' [] acc = acc sum' (Leaf i : ts) acc = sum' ts $! (i + acc) sum' (Node l r : ts) acc = sum' (l : r : ts) acc
we are building up a (potentially) large lists of trees yet to be processed, but never run out of stack space. Apparently, the list is built up on the heap rather than on the stack.
What is the fundamental difference between these two examples? Why is the list of trees yet to be processed allocated on the heap (with a pointer on the stack, presumably) but the unevaluated thunks in the foldl example allocated on the stack?
No, the thunks are (usually) stored on the heap. You don't get the stack overflow until you actually force the computation at which point you have an expression like: (...(((1+2)+3)+4) ... + 10000000) which requires stack in proportion to the number of nested parentheses (effectively)

Hi,
No, the thunks are (usually) stored on the heap. You don't get the stack overflow until you actually force the computation at which point you have an expression like: (...(((1+2)+3)+4) ... + 10000000) which requires stack in proportion to the number of nested parentheses (effectively)
Ah, that makes! So does it make sense to talk about "tail recursive thunks"? Or does the evaluation of thunks always take stack space proportional to the "nesting level"? Edsko

Edsko de Vries
(...(((1+2)+3)+4) ... + 10000000) which requires stack in proportion to the number of nested parentheses
Ah, that makes! So does it make sense to talk about "tail recursive thunks"? Or does the evaluation of thunks always take stack space proportional to the "nesting level"?
The key reason why nested additions take stack space, is because (+) on Integers is *strict* in both arguments. If it were somehow non-strict instead, then the unevaluated parts of the number would be heap-allocated rather than stack-allocated. Regards, Malcolm

Hi,
(...(((1+2)+3)+4) ... + 10000000) which requires stack in proportion to the number of nested parentheses
Ah, that makes! So does it make sense to talk about "tail recursive thunks"? Or does the evaluation of thunks always take stack space proportional to the "nesting level"?
The key reason why nested additions take stack space, is because (+) on Integers is *strict* in both arguments. If it were somehow non-strict instead, then the unevaluated parts of the number would be heap-allocated rather than stack-allocated.
I don't understand. Could you elaborate? Thanks, Edsko

2008/5/10 Edsko de Vries
The key reason why nested additions take stack space, is because (+) on Integers is *strict* in both arguments. If it were somehow non-strict instead, then the unevaluated parts of the number would be heap-allocated rather than stack-allocated.
I don't understand. Could you elaborate?
The key problem here is not that a huge thunk is built : it is allocated on the heap. But when it is forced, the fact that (+) is strict means that all the calls of (+) that had been created must go on the stack... So in the example of "fold (+) 0 [long list]", the problem is not in the evaluation of foldl which only build a big thunk (with a tail recursion), the problem intervene when you must evaluate the big thunk since then you need to stack all the (+)... If (+) was lazy it wouldn't be a problem since you would only need to call one (+) to get a value (which will go in the heap). -- Jedaï

Edsko de Vries wrote:
sum :: Tree -> Int sum t = sum' [t] 0 where sum' [] acc = acc sum' (Leaf i : ts) acc = sum' ts $! (i + acc) sum' (Node l r : ts) acc = sum' (l : r : ts) acc
Because of $!, you should compare the Leaf case to foldl', not foldl. The Node case can be said to mimic a stack using heap resource.
participants (5)
-
Albert Y. C. Lai
-
Chaddaï Fouché
-
Derek Elkins
-
Edsko de Vries
-
Malcolm Wallace