Stack, Heap and GHC

Hello everyone, I have been trying to run a Haskell program of mine that does an extensive computation with very large amounts of data. I compiled the program with ghc --make. When I run it it terminates after some time with the message: Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it. The program isn't that well written so the overflow did not surprise me, I expected that it might run out of memory. What did surprise me was the *stack* overflow. I do not use recursion in my program except for a couple of fold operations over very large lists. So I have a number of questions: 1) Which Haskell operations cost space on the stack, which cost space on the heap? I guess this is implementation dependent, so I looked into the GHC manual but did not find an answer. Where can I look these things up? 2) What could be possible sources of a stack overflow? (Apart from a recursive but not tail-recursive function.) 3) I tried using +RTS -K<size> as suggested, but these options do not seem to be passed through if I use --make. How can I use both, these compilation flags and --make? Thanks, Felix

On Dec 14, 2006, at 10:00 , Felix Breuer wrote:
3) I tried using +RTS -K<size> as suggested, but these options do not seem to be passed through if I use --make. How can I use both, these compilation flags and --make?
They aren't compile options; they're runtime options. The GHC runtime intercepts +RTS options and processes them before passing the remaining arguments (if any) to your program. -- brandon s. allbery [linux,solaris,freebsd,perl] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

On Thu, Dec 14, 2006 at 04:00:53PM +0100, Felix Breuer wrote:
Hello everyone,
I have been trying to run a Haskell program of mine that does an extensive computation with very large amounts of data. I compiled the program with ghc --make. When I run it it terminates after some time with the message:
Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it.
The program isn't that well written so the overflow did not surprise me, I expected that it might run out of memory. What did surprise me was the *stack* overflow. I do not use recursion in my program except for a couple of fold operations over very large lists. So I have a number of questions:
1) Which Haskell operations cost space on the stack, which cost space on the heap? I guess this is implementation dependent, so I looked into the GHC manual but did not find an answer. Where can I look these things up?
Lazily evaluated functions seem to get stuck on the stack., so space on the stack tends to get used up by over-lazy programs, which take a long time before they actually evaluate anything. But I'm not quite clear myself when exactly things go on the heap or the stack.
2) What could be possible sources of a stack overflow? (Apart from a recursive but not tail-recursive function.)
It's probably your folds. I can never keep them straight, but quite likely switching to a stricter variant will help you, which are named with a "'" at the end, e.g. foldl'. If you post your program here, I'd guess someone will take a look at it and give you a better suggestion where the trouble is. It can be hard to track down, I'm afraid.
3) I tried using +RTS -K<size> as suggested, but these options do not seem to be passed through if I use --make. How can I use both, these compilation flags and --make?
You pass +RTS -K<size> to your executable, not when compiling (which would affect the stack of ghc). :) -- David Roundy Department of Physics Oregon State University

On Thu, Dec 14, 2006 at 04:00:53PM +0100, Felix Breuer wrote:
Hello everyone,
I have been trying to run a Haskell program of mine that does an extensive computation with very large amounts of data. I compiled the program with ghc --make. When I run it it terminates after some time with the message:
Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it.
The program isn't that well written so the overflow did not surprise me, I expected that it might run out of memory. What did surprise me was the *stack* overflow. I do not use recursion in my program except for a couple of fold operations over very large lists. So I have a number of questions:
Here's a little program that can illustrate this issue: import Data.List largenum = 1000000 main = do putStrLn "strict foldl1" print $ foldl1' (\a b -> a + 1) $ [1..largenum] putStrLn "lazy foldl1" print $ foldl1 (\a b -> a + 1) $ [1..largenum] It gets through the first one, but not the second call, which differs only in the strictness of the foldl. You can make it use up more memory by making largenum a hundred times bigger, in which case for some reason it doesn't seem to have a stack error (although it hasn't completed on my computer, and uses something like 2G of memory). Perhaps the thunks are placed on the heap, and only when they are actually evaluated does anything go onto the stack? -- David Roundy Department of Physics Oregon State University

David Roundy wrote:
On Thu, Dec 14, 2006 at 04:00:53PM +0100, Felix Breuer wrote:
Hello everyone,
I have been trying to run a Haskell program of mine that does an extensive computation with very large amounts of data. I compiled the program with ghc --make. When I run it it terminates after some time with the message:
Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it.
The program isn't that well written so the overflow did not surprise me, I expected that it might run out of memory. What did surprise me was the *stack* overflow. I do not use recursion in my program except for a couple of fold operations over very large lists. So I have a number of questions:
Here's a little program that can illustrate this issue:
import Data.List
largenum = 1000000
main = do putStrLn "strict foldl1" print $ foldl1' (\a b -> a + 1) $ [1..largenum] putStrLn "lazy foldl1" print $ foldl1 (\a b -> a + 1) $ [1..largenum]
It gets through the first one, but not the second call, which differs only in the strictness of the foldl. You can make it use up more memory by making largenum a hundred times bigger, in which case for some reason it doesn't seem to have a stack error (although it hasn't completed on my computer, and uses something like 2G of memory). Perhaps the thunks are placed on the heap, and only when they are actually evaluated does anything go onto the stack?
As I understand, that's exactly how it works. If you enter a thunk, all the computation you need to do to force it uses the stack. In particular, if your thunk looks like (1 + (1 + (1 + ...))) then it's wrapping up a huge chunk of evaluation that has to be done all at once, and probably blows out the stack when it evaluates 1, pushes it and starts evaluating the right argument, which means evaluating a 1, pushing it, and starting the right argument ... Brandon

On Thu, 14 Dec 2006 15:31:54 -0800, David Roundy
import Data.List
largenum = 1000000
main = do putStrLn "strict foldl1" print $ foldl1' (\a b -> a + 1) $ [1..largenum] putStrLn "lazy foldl1" print $ foldl1 (\a b -> a + 1) $ [1..largenum]
It gets through the first one, but not the second call, which differs only in the strictness of the foldl. You can make it use up more memory by making largenum a hundred times bigger, in which case for some reason it doesn't seem to have a stack error (although it hasn't completed on my computer, and uses something like 2G of memory). Perhaps the thunks are placed on the heap, and only when they are actually evaluated does anything go onto the stack?
1) What precisely is a thunk? 2) Let me see if I get this right. The strict version runs in constant space because in the expression (((1 + 1) + 1) ... + 1) the innermost (1 + 1) is reduced to 2 right away. The lazy version first uses a huge amount of heap space by building the entire expression (((1 + 1) + 1) ... + 1) on the heap. The evaluation of this expression starts by placing the outermost + 1 on the stack and continues inward, not actually reducing anything, before everything has been placed on the stack, which causes the overflow. Correct? Thanks for your help! Felix

Felix Breuer schrieb:
1) What precisely is a thunk?
That depends on the abstraction level. At the evaluation level, it is an expression that has at least one unevaluated subexpression. At the implementation level, it could be a direct representation of an expression graph (partly evaluated). Or it could be a record containing a function pointer and a possibly empty series of parameter values, some of which may be thunks again. Regards, Jo

On 15/12/06, Felix Breuer
1) What precisely is a thunk?
It's a memory cell that says 'I'm an unevaluated value, to evaluate me do X'. For example, consider the differences between the following programs: (common for all that follows) myFunc :: [Int] -> [Int] (1) myFunc xs = ... (2) myFunc (x:xs) = ... (3) myFunc (5:x:[]) = ... (Assume all the pattern matches are successful.) In the RHS of (1), xs refers to a thunk that is the unevaluated list. In (2), you evaluate the first level (to weak head normal form, or whnf) of the list to reveal a cons cell, where the head and tail are both thunks (and can be accessed by x and xs respectively). In (3), you evaluate the first level of the list, revealing a cons cell. You then evaluate the head of this cell, revealing a 5 (so no thunks here; everything's fully evaluated). The tail gets evaluated to a further cons cell where the head is an unevaluated thunk and the tail is evaluated to the empty list (so again, no thunks). Hopefully that gives you some kind of intuition regarding thunks and the implementation of laziness in Haskell. -- -David House, dmhouse@gmail.com

On Fri, Dec 15, 2006 at 10:05:38AM +0000, Felix Breuer wrote:
On Thu, 14 Dec 2006 15:31:54 -0800, David Roundy
wrote: main = do putStrLn "strict foldl1" print $ foldl1' (\a b -> a + 1) $ [1..largenum] putStrLn "lazy foldl1" print $ foldl1 (\a b -> a + 1) $ [1..largenum]
2) Let me see if I get this right. The strict version runs in constant space because in the expression
(((1 + 1) + 1) ... + 1)
the innermost (1 + 1) is reduced to 2 right away.
The strict version never creates the expression (((1 + 1) + 1) ... + 1). It's easier to see with foldl': foldl' (\a b -> a + 1) 0 [1..3] { evaluates 0+1 = 1 } -> foldl' (\a b -> a + 1) 1 [2..3] { evaluates 1+1 = 2 } -> foldl' (\a b -> a + 1) 2 [3..3] { evaluates 2+1 = 3 } -> foldl' (\a b -> a + 1) 3 [] -> 3
The lazy version first uses a huge amount of heap space by building the entire expression
(((1 + 1) + 1) ... + 1)
on the heap. The evaluation of this expression starts by placing the outermost + 1 on the stack and continues inward, not actually reducing anything, before everything has been placed on the stack, which causes the overflow. Correct?
Right, foldl doesn't evaluate its argument as it goes, so it builds (((0+1)+1)+1) (on the heap): foldl (\a b -> a + 1) 0 [1..3] -> foldl (\a b -> a + 1) (0+1) [2..3] -> foldl (\a b -> a + 1) ((0+1)+1) [3..3] -> foldl (\a b -> a + 1) (((0+1)+1)+1) [] -> (((0+1)+1)+1) Now we need to evaluate (((0+1)+1)+1) to get the final answer. You can imagine a simple recursive evaluation function which, in the call evaluate (((0+1)+1)+1) recursively calls evaluate ((0+1)+1) which recursively calls evaluate (0+1) and it is this recursion that has a stack that overflows. Thanks Ian

felix:
Hello everyone,
I have been trying to run a Haskell program of mine that does an extensive computation with very large amounts of data. I compiled the program with ghc --make. When I run it it terminates after some time
Did you compile with -O (optimisations). Sometimes this fixes things, and its just good practice. -- Don

Hi
Did you compile with -O (optimisations). Sometimes this fixes things, and its just good practice.
It's slower to compile, and might fix things in GHC Haskell, but other compilers don't all have -O flags, so its generally best to make your program at least have the right sort of time/space behaviour using Haskell. If your code doesn't work without -O, then it probably won't work in Hugs or Yhc. Thanks Neil

Hello Felix, Thursday, December 14, 2006, 6:00:53 PM, you wrote:
The program isn't that well written so the overflow did not surprise me, I expected that it might run out of memory. What did surprise me was the *stack* overflow. I do not use recursion in my program except for a couple of fold operations over very large lists.
surprise: some fold variants are not tail recursive :)
3) I tried using +RTS -K<size> as suggested, but these options do not seem to be passed through if I use --make. How can I use both, these compilation flags and --make?
you should use it as argument to your compiled program. RTS means *run-time* system, after all :) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
participants (10)
-
Brandon Moore
-
Brandon S. Allbery KF8NH
-
Bulat Ziganshin
-
David House
-
David Roundy
-
dons@cse.unsw.edu.au
-
Felix Breuer
-
Ian Lynagh
-
Joachim Durchholz
-
Neil Mitchell