
I am trying to determine why my stack overflows in my medium sized program (it has several modules but maybe only 1000 LOC total). On Windows, at least, the ghcprof visualization tool doesn't work. Any suggestions besides an output trace? It may be the function below, which tries to determine if a list of strict bytestrings is longer than the count given. I have tried to make it strict but am not sure if it's too lazy. Any hints are appreciated. -- Determines if the length of the strings in the list is longer than the given -- count. If not, amount the list falls short is returned. Otherwise, -- -1 indicates the prefix list is at least that long. If the count is zero and -- the list is empty or just null strings, -1 is also returned. prefixesAtLeast :: Int -> [S.ByteString] -> Int prefixesAtLeast !0 !ss | null ss = 0 | all S.null ss = 0 | otherwise = -1 prefixesAtLeast !n !ss = prefixesAtLeast' n ss where prefixesAtLeast' !n ss | n < 0 = -1 | null ss = n | otherwise = let (!s : (!rest)) = ss in prefixesAtLeast' (n - (S.length s)) rest

Justin Bailey:
I am trying to determine why my stack overflows in my medium sized program [...snip...]
prefixesAtLeast :: Int -> [S.ByteString] -> Int prefixesAtLeast !0 !ss | null ss = 0 | all S.null ss = 0 | otherwise = -1 prefixesAtLeast !n !ss = prefixesAtLeast' n ss where prefixesAtLeast' !n ss | n < 0 = -1 | null ss = n | otherwise = let (!s : (!rest)) = ss in prefixesAtLeast' (n - (S.length s)) rest
Stack overflows often occur when you evaluate a "thunk" (unevaluated computation) which you have previously allowed to become deeply nested. The usual example is something like: print $ foldl (+) 0 [1..1000000] => print $ foldl (+) (0+1) [2..1000000] => print $ foldl (+) ((0+1)+2) [3..1000000] => print $ foldl (+) (((0+1)+2)+3) [4..1000000] => ... => print (...(((0+1)+2)+3)+...+1000000) => stack overflow The key point of the example is that foldl itself doesn't need any of the intermediate values of the accumulator, so these just build up into a deeply-nested unevaluated thunk. When print finally demands an integer, the run-time pushes a stack frame for each level of parentheses it enters as it tries to evaluate the thunk. Too many parentheses leads to a stack overflow. Of course, the solution to the example is to use foldl', whose implementation uses strictness annotations to force evaluation of the accumulator at each iteration. In your function, all the thunks you create will be evaluated in the next recursive call, so it's unlikely that prefixesAtLeast is *in itself* causing a stack overflow. However, it's possible that your use of this function is forcing evaluation of a deeply-nested thunk you've created somewhere else (as print does in the foldl example). So if your initial indications are that the stack overflow is occurring somewhere during the evaluation of prefixesAtLeast, you might want to try following the food-chain back through its suppliers (parameters) until you find something like the foldl example. By they way, your sprinkling of strictness annotations (a sure sign of stack-overflow desperation!) should not be necessary. Everything you have annotated should be forced anyway, either immediately or within one recursive call. In particular, !0 is entirely unnecessary, since matching against 0 does just as good a job of forcing the parameter value as the strictness annotation.

Matthew Brecknell wrote:
The key point of the example is that foldl itself doesn't need any of the intermediate values of the accumulator, so these just build up into a deeply-nested unevaluated thunk. When print finally demands an integer, the run-time pushes a stack frame for each level of parentheses it enters as it tries to evaluate the thunk. Too many parentheses leads to a stack overflow. Of course, the solution to the example is to use
What is the point in building this huge thunk if it can't be evaluated without a stack overflow? Could the runtime do partial evaluation to keep the thunk size down or would that cause semantic breakage? Joe Buehler

Joe Buehler wrote:
What is the point in building this huge thunk if it can't be evaluated without a stack overflow?
It's not that there's a point to it, it's just the behaviour of foldl. Hence you shouldn't be using foldl. GHC's strictness analyser can sometimes save you from yourself if you're compiling with -O, but it's better to just avoid foldl and use foldr or Data.List.foldl' instead.

On 8/16/07, Matthew Brecknell
However, it's possible that your use of this function is forcing evaluation of a deeply-nested thunk you've created somewhere else (as print does in the foldl example).
Thank you for the detailed and helpful reply. I was led to this function by the vanilla profiling, which showed that it had the highest percentage of allocations. Now I'm thinking I'll look at the functions up the call stack to see which might be building up the thunk. Would "retainer profiling" help me see what was building up this large thunk/closure? Justin

Justin Bailey:
Would "retainer profiling" help me see what was building up this large thunk/closure?
I'm not really familiar enough with GHC's profiling to answer that, but I'll take a guess. My guess is that profiling will only sometimes be useful in diagnosing stack overflows, because I suspect that memory stats reported by the profiler will usually be dominated by heap usage. So profiling *might* point you towards some big thunks on the heap which might cause a stack overflow on evaluation. If so, then you're in luck. But the problem is that you don't actually *need* a huge unevaluated thunk to cause a stack overflow. Sure, the foldl example had one, but consider what happens if we use foldr instead: print (foldr (+) 0 [1..]) => print (1+(foldr (+) 0 [2..])) => print (1+(2+(foldr (+) 0 [3..]))) => print (1+(2+(3+(foldr (+) 0 [4..])))) => ... => print (1+(2+(3+(...+(foldr (+) 0 [...])))) => stack overflow It's a bit more tricky to explain what's going on here, which may be one reason why foldr is not the usual stack overflow example. While the nested additions in the foldl example represented a long chain of unevaluated thunks on the heap, here they represent partially executed computations on the stack. There is no big thunk! But there are still many nested contexts on the stack, so we still get an overflow. Another way of contrasting the foldl and foldr examples is to realise that foldl always consumes its entire input list, while foldr only consumes as much as its asked to. In the former, foldl drives the process of thunk building. In the latter, it is the evaluation of the innermost (+) function that drives foldr to generate the next iteration. I suspect that explanation is not very clear, so I give a small experiment which will at least show that I'm not lying. :-) Run a basic GHC profile (without optimisations) on each of the following, and observe the total memory usage. With foldl, memory usage is very high, because the entire list is consumed to produce a huge thunk on the heap. With foldr, memory usage is only about 16M, just enough to blow the stack. -- trial 1: stack overflow, lots of memory consumed main = print (foldl (+) 0 [1..10000000] :: Int) -- trial 2: stack overflow, minimal memory consumption main = print (foldr (+) 0 [1..10000000] :: Int) In fact, we could give foldr an infinite list, and get exactly the same result. Curiously, if we give foldl an infinite list, we don't get a stack overflow, because we never get to the point of evaluating the thunk. Instead, we get heap exhaustion, because we just keep building thunks. -- trial 4: heap exhaustion, nasty main = print (foldl (+) 0 [1..] :: Int) -- trial 5: stack overflow, minimal memory consumption main = print (foldr (+) 0 [1..] :: Int) It's also instructive to run these tests with optimisations (no profiling), to see how they are affected by strictness analysis. Note that strictness analysis doesn't work for the default Integer type, so the Int type annotations are necessary.

On 8/18/07, Matthew Brecknell
Justin Bailey:
Would "retainer profiling" help me see what was building up this large thunk/closure?
I'm not really familiar enough with GHC's profiling to answer that, but I'll take a guess.
You're experimental programs have given me an idea - I can use them to test if the profiling tools can show me where a stack overflow might be occurring. Thanks for the clear explanation of the difference. I also found the wiki page http://www.haskell.org/haskellwiki/Stack_overflow to be helpful. Justin

Justin Bailey wrote:
I am trying to determine why my stack overflows in my medium sized program (it has several modules but maybe only 1000 LOC total). On Windows, at least, the ghcprof visualization tool doesn't work. Any suggestions besides an output trace?
You shouldn't need ghcprof. Just compiling with -prof -auto-all will be enough to get you able to use allocation profiling, then running with +RTS -p -RTS will generate an allocation profile as a fairly readable text file.
It may be the function below, which tries to determine if a list of strict bytestrings is longer than the count given.
Taking stabs in the dark is not a good idea, and sprinkling strictness annotations around in an undirected manner won't help, either, however much it might feel like doing something concrete. Start with looking at the profile output. You'll probably find it's a different part of your code entirely that's causing the problem.

Justin Bailey wrote:
-- Determines if the length of the strings in the list is longer than the given -- count. If not, amount the list falls short is returned. Otherwise, -- -1 indicates the prefix list is at least that long. If the count is zero and -- the list is empty or just null strings, -1 is also returned.
prefixesAtLeast :: Int -> [S.ByteString] -> Int
While that doesn't help your stack overflow problem, it's not very haskellish to return magic numbers. A Maybe type is more appropriate here.
prefixesAtLeast !0 !ss | null ss = 0 | all S.null ss = 0 | otherwise = -1 prefixesAtLeast !n !ss = prefixesAtLeast' n ss where prefixesAtLeast' !n ss | n < 0 = -1 | null ss = n | otherwise = let (!s : (!rest)) = ss in prefixesAtLeast' (n - (S.length s)) rest
Extracting the head and tail of ss with a let statement could lead to a huge unevaluated expression like rest = tail (tail (tail (...))) but the null test are likely to force it. Also note that the case n = 0 is quite rare. In any case, I'd write the function as lengthExcess :: Int -> [S.ByteString] -> Maybe Int lengthExcess n ss | n <= 0 = Nothing | otherwise = case ss of [] -> Just n (s:ss) -> lengthExcess (n - S.length s) ss Note the that the function name is chosen to mnemonically match the result type Maybe Int, i.e. "the excess is Just 5 characters" or "the excess is Nothing". Regards, apfelmus

On 8/17/07, apfelmus
Extracting the head and tail of ss with a let statement could lead to a huge unevaluated expression like
rest = tail (tail (tail (...)))
Even though they are probably forced, would breaking the head and tail apart via pattern-matching or a case statement avoid building up that unevaluated expression? Justin

Justin Bailey wrote:
apfelmus wrote:
Extracting the head and tail of ss with a let statement could lead to a huge unevaluated expression like
rest = tail (tail (tail (...)))
Even though they are probably forced, would breaking the head and tail apart via pattern-matching or a case statement avoid building up that unevaluated expression?
Yes, absolutely, since pattern matching has to force the scrutinee in order to choose the matching case. In contrast, a let statement let (x:xs) = expr in ... simply assumes that expr is of the form (x:xs) but does not force it and check whether that's really the case. Of course, this may turn out as pattern match later on as soon as x is demanded but expr was initially the empty list. In your case, the test null ss forces ss and checks whether the let-pattern is ok. So, you basically end up doing what a case expression would do. In other words, the situation is more "they are most likely forced" than "they are probably forced" and it's just a matter of convenience to choose one over the other. But there are certain situations where you can check/prove differently that the let pattern never fails and where such a lazy pattern is wanted. Regards, apfelmus
participants (5)
-
apfelmus
-
Bryan O'Sullivan
-
Joe Buehler
-
Justin Bailey
-
Matthew Brecknell