
Hello Folks,

Just wondering about this. Please understand I'm not asking why programs sometimes use a lot of stack, but specifically why using a lot of stack (vs. using a lot of heap) is generally regarded as "bad". Or at least it seems that way, given that the GHC runtime makes a distinction between the two and sets separate limits for them (the default max stack size is relatively small, whereas the default max heap size is unlimited). So programs can fail with a stack overflow despite having bucket loads of heap available?

Frankly, I don't care whether my program fails because it used a lot of stack or a lot of heap. I would rather set some common memory budget and have it fail if that budget is exceeded. The current policy seems to have unfortunate consequences: sometimes you end up rewriting stuff in a manner that just trades stack use for heap use (i.e. does nothing to reduce overall memory consumption). Given that the cost of reclaiming heap is rather high (compared to stack), this seems like a bad idea (the version that used a lot of stack would be better IMO, if only it didn't risk stack overflow). Example:

-- Strict version of take
stackGobbler :: Int -> [x] -> [x]
stackGobbler 0 _      = []
stackGobbler _ []     = []
stackGobbler n (x:xs) = let xs' = stackGobbler (n-1) xs in xs' `seq` (x:xs')

-- Another strict version of take
heapGobbler :: Int -> [x] -> [x]
heapGobbler = heapGobbler' []
 where heapGobbler' rxs 0 _      = reverse rxs
       heapGobbler' rxs _ []     = reverse rxs
       heapGobbler' rxs n (x:xs) = heapGobbler' (x:rxs) (n-1) xs

But I guess everyone here is already aware of this, hence the question (the current GHC memory system design seems a bit odd, but maybe there's a good reason why the RTS can't work the way I would like).

Thanks
--
Adrian Hey
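(A small driver, added for illustration and not part of the original message: with a large enough n and GHC's default stack limit, the first print is likely to die with a stack overflow, while the second merely allocates a comparable amount of heap.)

-- Hypothetical test harness, assuming the gobbler definitions above.
main :: IO ()
main = do
  let n = 1000000 :: Int
  print (length (stackGobbler n [1..]))  -- roughly n nested frames before the outermost cons is returned
  print (length (heapGobbler  n [1..]))  -- roughly n cons cells in the accumulator instead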

On Thu, 2007-05-03 at 16:24 +0100, Adrian Hey wrote:
Hello Folks,
Just wondering about this. Please understand I'm not asking why programs sometimes use a lot of stack, but specifically why using a lot of stack (vs. using a lot of heap) is generally regarded as "bad". Or at least it seems that way, given that the GHC runtime makes a distinction between the two and sets separate limits for them (the default max stack size is relatively small, whereas the default max heap size is unlimited). So programs can fail with a stack overflow despite having bucket loads of heap available?
Perhaps it's there to help people who write simple non-terminating recursion. They'll get an error message fairly soon rather than using all memory on the machine and invoking the wrath of the OOM killer. Duncan
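(An added illustration of Duncan's point, not from the thread: the classic accident is a recursion with a missing base case. With GHC's bounded stack it fails fast instead of slowly consuming all memory on the machine.)

-- Hypothetical example: non-terminating, non-tail recursion.
sumFrom :: Int -> Int
sumFrom n = n + sumFrom (n + 1)   -- oops: no base case, and the pending (+) keeps the stack growing

main :: IO ()
main = print (sumFrom 1)          -- likely to stop quickly with "stack overflow" under the default limit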

Duncan Coutts wrote:
On Thu, 2007-05-03 at 16:24 +0100, Adrian Hey wrote:
Hello Folks,
Just wondering about this. Please understand I'm not asking why programs sometimes use a lot of stack, but specifically why using a lot of stack (vs. using a lot of heap) is generally regarded as "bad". Or at least it seems that way, given that the GHC runtime makes a distinction between the two and sets separate limits for them (the default max stack size is relatively small, whereas the default max heap size is unlimited). So programs can fail with a stack overflow despite having bucket loads of heap available?
Perhaps it's there to help people who write simple non-terminating recursion. They'll get an error message fairly soon rather than using all memory on the machine and invoking the wrath of the OOM killer.
Hmm, I still don't see why a "stack leak" should be treated differently from a "heap leak". They'll both kill your program in the end.

Regards
--
Adrian Hey

On Thu, May 03, 2007 at 05:40:24PM +0100, Adrian Hey wrote:
Duncan Coutts wrote:
On Thu, 2007-05-03 at 16:24 +0100, Adrian Hey wrote:
Hello Folks,
Just wondering about this. Please understand I'm not asking why programs sometimes use a lot of stack, but specifically why using a lot of stack (vs. using a lot of heap) is generally regarded as "bad". Or at least it seems that way, given that the GHC runtime makes a distinction between the two and sets separate limits for them (the default max stack size is relatively small, whereas the default max heap size is unlimited). So programs can fail with a stack overflow despite having bucket loads of heap available?
Perhaps it's there to help people who write simple non-terminating recursion. They'll get an error message fairly soon rather than using all memory on the machine and invoking the wrath of the OOM killer.
Hmm, I still don't see why a "stack leak" should be treated differently from a "heap leak". They'll both kill your program in the end.
I believe it is because a stack cannot be garbage collected, and must be traversed as roots for every garbage collection. I don't think there are any issues with a huge stack per se, but it does not play nice with garbage collection so may hurt your performance and memory usage in unforeseen ways. John -- John Meacham - ⑆repetae.net⑆john⑈

On Thu, May 03, 2007 at 04:59:58PM -0700, John Meacham wrote:
I believe it is because a stack cannot be garbage collected, and must be traversed as roots for every garbage collection. I don't think there are any issues with a huge stack per se, but it does not play nice with garbage collection so may hurt your performance and memory usage in unforeseen ways.
Isn't it just the top of the stack that has to be treated as a root? (Maybe you need to walk the stack to find exception handlers and so on.) Maybe it shouldn't be so much worse than a heap. The Chicken Scheme system allocates everything on the C stack, and runs some sort of compacting collector when it is about to fill. Brandon

On Thu, May 03, 2007 at 05:36:45PM -0700, Brandon Michael Moore wrote:
On Thu, May 03, 2007 at 04:59:58PM -0700, John Meacham wrote:
I believe it is because a stack cannot be garbage collected, and must be traversed as roots for every garbage collection. I don't think there are any issues with a huge stack per se, but it does not play nice with garbage collection so may hurt your performance and memory usage in unforeseen ways.
Isn't it just the top of the stack that has to be treated as a root? (Maybe you need to walk the stack to find exception handlers and so on.) Maybe it shouldn't be so much worse than a heap. The Chicken Scheme system allocates everything on the C stack, and runs some sort of compacting collector when it is about to fill.
GHC uses a simple exponential-backoff algorithm for handling stacks. When the stack pointer passes the stack limit, the thread yields to the scheduler, where the stack size is doubled, and the old stack is moved. Perhaps instead we could modify the algorithm such that up to 16K stack size the behavior is the same, but use linked lists for larger?

1. Allocate a new chunk, of size 16KB.

2. Copy the topmost 1KB of stack to the new block. This decreases storage efficiency slightly (but not much in time - memcpy is several times faster than anything the current ghc code generator can generate), but it avoids a nasty corner case (stack size fluctuating across 0 mod 16K) by acting as a form of hysteresis.

3. Create a special frame at the bottom of the new stack chunk that returns into a stack underflow handler, thus avoiding the need for yet another conditional.

Yes, 16KB unrolled linked lists are virtually as fast as flat byte arrays; see Data.ByteString.Lazy if you want proof. The only hard part appears to be step 3, as it requires finding an appropriate place to insert the trampoline frame; it seems plausible that stack frames are sufficiently self-describing to make this a simple exercise of loops, but it could be much, much harder.

With this change GHC stacks could fill the whole heap with little more performance degradation than ordinary objects already give.

Stefan
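(An added sketch, not from the thread: a toy Haskell model of the chunked-stack idea Stefan describes. It only captures the chunking itself; the 1KB hysteresis copy and the real underflow frame are elided, and the names and chunk size are made up for illustration.)

-- A stack is a list of bounded chunks, newest chunk first.
chunkCapacity :: Int
chunkCapacity = 4               -- tiny for illustration; Stefan suggests 16KB chunks

newtype ChunkedStack a = ChunkedStack [[a]]

emptyStack :: ChunkedStack a
emptyStack = ChunkedStack []

push :: a -> ChunkedStack a -> ChunkedStack a
push x (ChunkedStack (c:cs))
  | length c < chunkCapacity = ChunkedStack ((x:c) : cs)     -- room left in the current chunk
  | otherwise                = ChunkedStack ([x] : c : cs)   -- "overflow": start a new chunk
push x (ChunkedStack [])     = ChunkedStack [[x]]

pop :: ChunkedStack a -> Maybe (a, ChunkedStack a)
pop (ChunkedStack ((x:c) : cs)) = Just (x, ChunkedStack (c : cs))
pop (ChunkedStack ([] : cs))    = pop (ChunkedStack cs)      -- "underflow": fall back to the previous chunk
pop (ChunkedStack [])           = Nothing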

Stefan O'Rear wrote:
On Thu, May 03, 2007 at 05:36:45PM -0700, Brandon Michael Moore wrote:
On Thu, May 03, 2007 at 04:59:58PM -0700, John Meacham wrote:
I believe it is because a stack cannot be garbage collected, and must be traversed as roots for every garbage collection. I don't think there are any issues with a huge stack per se, but it does not play nice with garbage collection so may hurt your performance and memory usage in unforeseen ways. Isn't it just the top of the stack that has to be treated as a root? (Maybe you need to walk the stack to find exception handlers and so on.) Maybe it shouldn't be so much worse than a heap. The Chicken Scheme system allocates everything on the C stack, and runs some sort of compacting collector when it is about to fill.
GHC uses a simple exponential-backoff algorithm for handling stacks. When the stack pointer passes the stack limit, the thread yields to the scheduler, where the stack size is doubled, and the old stack is moved. Perhaps instead we could modify the algorithm such that up to 16K stack size the behavior is the same, but use linked lists for larger?
1. Allocate a new chunk, of size 16KB.
2. Copy the topmost 1KB of stack to the new block. This decreases storage efficiency slightly (but not much in time - memcpy is several times faster than anything the current ghc code generator can generate), but it avoids a nasty corner case (stack size fluctuating across 0 mod 16K) by acting as a form of hysteresis.
3. Create a special frame at the bottom of the new stack chunk that returns into a stack underflow handler, thus avoiding the need for yet another conditional.
GHC in the distant past (pre-4.0) had linked stack chunks, but we discarded the idea when the RTS was rewritten, mainly for simplicity. It was before my time, so I don't know all the war stories, but I think it turned out to be hairy enough to avoid it in the redesign.

At the least it would require separate TSO and STACK objects, the underflow frame, code to avoid performance degradation at the boundary (e.g. as you say, copying the top 1k of stack to the new chunk), and more complicated stack walking (several places in the RTS do this).

My impression is that the performance of stack growth isn't usually a limiting factor, so I'm not sure it's worth adding this complexity. Benchmarks can be convincing, though...

Cheers, Simon

Brandon Michael Moore wrote:
On Thu, May 03, 2007 at 04:59:58PM -0700, John Meacham wrote:
I believe it is because a stack cannot be garbage collected, and must be traversed as roots for every garbage collection. I don't think there are any issues with a huge stack per se, but it does not play nice with garbage collection so may hurt your performance and memory usage in unforeseen ways.
Isn't it just the top of the stack that has to be treated as a root?
No, because the functions below the top are still active and would usually be referencing other heap locations not visible to the top function.

On another note, why use stacks in the first place? Couldn't all activation records just be heap allocated?

Regards, Brian.
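(An added aside, not from the thread: one way to see what heap-allocated activation records look like is continuation-passing style, where each pending "frame" becomes a heap-allocated closure and every call is a tail call.)

-- Hypothetical CPS version of take: the continuation chain lives on the heap,
-- so deep recursion costs heap rather than stack.
takeK :: Int -> [x] -> ([x] -> r) -> r
takeK 0 _      k = k []
takeK _ []     k = k []
takeK n (x:xs) k = takeK (n-1) xs (\xs' -> k (x:xs'))

-- e.g. takeK 3 "abcdef" id == "abc"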

John Meacham wrote:
I believe it is because a stack cannot be garbage collected, and must be traversed as roots for every garbage collection. I don't think there are any issues with a huge stack per se, but it does not play nice with garbage collection so may hurt your performance and memory usage in unforeseen ways.
I'm still not convinced :-(

I also don't believe it's in anybody's interest to have programs failing for no good reason. A good reason to fail is if overall memory demands are getting stupid. Failing because the stack has grown beyond some arbitrary (and typically small) size seems bad to me.

I know that to a certain extent this is controllable using RTS options, but this is no use to me as a library writer trying to choose between stackGobbler and heapGobbler. The stuff should "just work" and not be dependent on the right RTS incantations being used when the final program is run.

Regards
--
Adrian Hey

Adrian Hey
Failing because the stack has grown beyond some arbitrary (and typically small) size seems bad to me.
Just FYI, nhc98 has a single memory area in which the stack and heap grow towards each other. Memory exhaustion only happens when the stack and heap meet in the middle and GC fails to reclaim any space. However, it can only do this because it is single-threaded. As soon as you need a separate stack for each thread in a multi-threaded system, this nice one-dimensional model breaks down. Regards, Malcolm

Malcolm Wallace wrote:
Just FYI, nhc98 has a single memory area in which the stack and heap grow towards each other. Memory exhaustion only happens when the stack and heap meet in the middle and GC fails to reclaim any space.
However, it can only do this because it is single-threaded. As soon as you need a separate stack for each thread in a multi-threaded system, this nice one-dimensional model breaks down.
Yes. A while ago I did the same thing with a toy FPL I was tinkering with. But like nhc98, it was single-threaded. Even so, I don't believe this is a serious extra complication. GHC does seem to have the capability to grow stacks effectively without bound (and presumably to shrink them too), but it doesn't do this by default, for reasons I don't understand.

My preference would be to have an upper limit on total (stack+heap) memory used. Also, as Stefan has suggested, I think the stack should grow linearly, not exponentially. But I don't really know enough about the innards of the GHC RTS to know what might or might not be possible/easy/difficult.

Regards
--
Adrian Hey

Adrian Hey wrote:
John Meacham wrote:
I believe it is because a stack cannot be garbage collected, and must be traversed as roots for every garbage collection. I don't think there are any issues with a huge stack per se, but it does not play nice with garbage collection so may hurt your performance and memory usage in unforeseen ways.
I'm still not convinced :-(
I also don't believe it's in anybody's interest to have programs failing for no good reason. A good reason to fail is if overall memory demands are getting stupid. Failing because the stack has grown beyond some arbitrary (and typically small) size seems bad to me.
I know that to a certain extent this is controllable using RTS options, but this is no use to me as a library writer trying to choose between stackGobbler and heapGobbler.
The stuff should "just work" and not be dependent on the right RTS incantations being used when the final program is run.
I'm more than happy to change the defaults, if there's some agreement on what the defaults should be. The current choice is somewhat historical - we used to have a bound on both heap size and stack size, but the heap size bound was removed because we felt that on balance it made life easier for more people, at the expense of a bit more pain when you write a leaky program.

Also, it used to be the case that the OOM killer could be a bit unpredictable, killing vital system processes instead of the leaky Haskell program. I'm not sure if this is still true for current OS incarnations.

Cheers, Simon

Simon Marlow wrote:
I'm more than happy to change the defaults, if there's some agreement on what the defaults should be. The current choice is somewhat historical - we used to have a bound on both heap size and stack size, but the heap size bound was removed because we felt that on balance it made life easier for more people, at the expense of a bit more pain when you write a leaky program.
Well, in the light of what Stefan says about exponentially increasing stack size, I'm not sure that increasing (or removing) the default is the answer. Going from 16M to 32M to 64M stacks is a bit drastic. It seems to me that going up in sane-sized linear increments would be better. But since we also want to avoid frequent copying of an already oversized stack, I guess some linked-list representation is what's needed. In fact I'd think what Stefan suggests, or something very similar, would be the way to go. But I have no idea how much work that would be.

But to give programs the best possible chance of running successfully, I think an (optional) overall limit on total memory use would be preferable (better than trying to guess how much stack space will be needed in advance).

Regards
--
Adrian Hey
participants (8)
- Adrian Hey
- Brandon Michael Moore
- Brian Hulley
- Duncan Coutts
- John Meacham
- Malcolm Wallace
- Simon Marlow
- Stefan O'Rear