How can we detect and fix memory leaks due to laziness?

Recently, I'm facing the dark side of laziness -- memory leaks caused by laziness. The typical pattern in which I encounter the problem is like this. My code was working fine and I was happy. I just wanted to inspect some properties of my code, so I made a slight change to the code, such as adding a counter argument or attaching an auxiliary data field to the original data for tracing how the data has been constructed. All of a sudden the program runs out of memory or overflows the stack. One problem is that it comes up unexpectedly. Another, even worse problem is that sometimes I have no idea of the exact location causing the leak! It is really frightening to face such darkness of lazy evaluation. Just a small, innocent-looking fix for inspection or tracing blows things up, sometimes with no clue as to the reason. When we implement a debugging or tracing option in the software and let the user toggle those features, how can we be sure that turning on those features won't crash software written in Haskell? Are there standardized approaches for detecting and fixing these kinds of problems? Haskell may be type safe, but it is not at all safe from unexpected divergence, which is not because of the programmer's mistake but just because of laziness.

I have posted a wiki article including one example of adding a counter to count the number of basic operations in a sorting algorithm:

http://www.haskell.org/haskellwiki/Physical_equality

This was a rather simple situation and we figured out how to cure it with a self-equality check (x==x) forcing evaluation. There are worse cases where I am not able to figure out the cure. I wrote a function for analyzing some property of a graph, which worked fine.

fixOnBy t p f x = if t x' `p` t x then x else fixOnBy t p f x'
  where x' = f x

fixSize f x = fixOnBy Set.size (==) f x

sctAnal gs = null cgs || all (not . null) dcs
  where
    gs' = fixSize compose $ Set.fromList [(x,y,cs) | To _ x y cs <- Set.toList gs]
    cgs = [z | z@(x,y,cs) <- Set.toList gs', x==y]
    dcs = [[c | c@(a,D,b) <- Set.toList cs, a==b] | (_,_,cs) <- cgs]
    compose gs = trace ("## "++show (Set.size gs)) $ foldr Set.insert gs $ do
        (x1,y1,cs1) <- Set.toList gs
        (_,y2,cs2) <- takeWhileFst y1 $ Set.toList $ setGT (y1,Al "" (-1),Set.empty) gs
        return (x1,y2,cs1 `comp` cs2)
    takeWhileFst y = takeWhile (\(y',_,_) -> y==y')

This function computes the transitive closure of the given set of relations by fixpoint iteration on the size of the set of weighted edges. Sample output is like this:

*Main> main
## 170
## 400
## 1167
## 2249
## 2314
False

When I add an extra data field for tracing how the new relation was constructed (e.g. tag [a,b,c] on a->c if it came from a->b and b->c), it suddenly overflows the stack even before printing out the trace. The following is the code that leaks memory.

sctAnal gs = null cgs || all (not . null) dcs
  where
    gs' = fixSize compose $ Set.fromList [TT (x,y,cs) [] | To _ x y cs <- Set.toList gs]
    cgs = [z | z@(TT (x,y,cs) _) <- Set.toList gs', x==y]
    dcs = [[c | c@(a,D,b) <- Set.toList cs, a==b] | TT (_,_,cs) _ <- cgs]
    compose gs = trace ("## "++show (Set.size gs)) $ foldr checkInsert gs $ do
        TT (x1,y1,cs1) l1 <- Set.toList gs
        TT (_,y2,cs2) l2 <- takeWhileTTfrom y1 . Set.toList $
                              setGT (TT (y1,Al "" (-1),Set.empty) []) gs
        return $ TT (x1,y2,cs1 `comp` cs2) (l1++y1:l2)
    takeWhileTTfrom y = takeWhile (\(TT (y',_,_) _) -> y==y')
    checkInsert x s
      | Set.member x s = s
      | otherwise      = Set.insert x s

data TT a b = TT a b deriving (Show)
instance (Eq a, Eq b) => Eq (TT a b) where
  (TT x lx) == (TT y ly) = lx==lx && ly==ly && x == y
instance (Ord a, Ord b) => Ord (TT a b) where
  (TT x lx) < (TT y ly) = lx==lx && ly==ly && x < y

The really interesting thing happens when I just make the Ord instance derived: the stack does not overflow and it starts to print out the trace. (It is not the result that I want, though; my intention is to ignore the tags in the set operations.)

data TT a b = TT a b deriving (Show,Eq,Ord)

I believe my Eq and Ord instances defined above are even stricter than the derived ones. Is there some magic in "deriving" that prevents memory leaks? I've even tried an instance declaration like the following, which should behave the same as the derived ones, but it still leaks memory.

data TT a b = TT a b deriving (Show)
instance (Eq a, Eq b) => Eq (TT a b) where
  (TT x lx) == (TT y ly) = x == y && lx == ly
instance (Ord a, Ord b) => Ord (TT a b) where
  (TT x lx) < (TT y ly) = x < y || x == y && lx < ly

This is really a panic.

-- Ahn, Ki Yung
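To see the fixpoint-on-size idiom in isolation, here is a toy version (illustrative code, not the poster's; it closes a plain binary relation instead of the weighted edges above):

import qualified Data.Set as Set

fixOnBy t p f x = if t x' `p` t x then x else fixOnBy t p f x'
  where x' = f x

fixSize f x = fixOnBy Set.size (==) f x

-- One composition step: add (x,z) whenever (x,y) and (y,z) are present.
closeStep :: Ord a => Set.Set (a, a) -> Set.Set (a, a)
closeStep rel = Set.union rel $ Set.fromList
  [ (x, z) | (x, y) <- Set.toList rel, (y', z) <- Set.toList rel, y == y' ]

-- Iterating closeStep until the size stops growing yields the
-- transitive closure.
main :: IO ()
main = print (fixSize closeStep (Set.fromList [(1,2),(2,3),(3,4 :: Int)]))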

On 8/7/06, Ahn, Ki Yung wrote:
I have posted a wiki article including one example of adding a counter to count the number of basic operations in a sorting algorithm.
http://www.haskell.org/haskellwiki/Physical_equality
This was a rather simple situation and we figured out how to cure it with a self-equality check (x==x) forcing evaluation.
Forcing evaluation using (==) is a bit of a hack. Luckily, we have a better function to force evaluation: seq (which has type a -> b -> b). "seq x y" evaluates "x" to weak head normal form before returning "y".

Let's try another feature of Haskell to force evaluation: strict data fields. A ! in front of a field in a data declaration signifies strictness. In the example below, whenever we construct a value with TT, the second argument is evaluated.

\begin{code}
data TT a b = TT a !b
\end{code}

Perhaps your instances will work correctly with this data declaration?

Cheers,
Spencer Janssen
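A minimal self-contained sketch (illustrative code, assuming a made-up Counted type) of what seq and a strict field actually force -- both stop at weak head normal form, which is full evaluation for a flat Int:

import Data.List (foldl')

-- The ! makes the counter field strict: it is forced to WHNF
-- whenever a Counted value is constructed.
data Counted a = Counted a !Int deriving Show

tick :: Counted a -> Counted a
tick (Counted x n) = Counted x (n + 1)  -- n + 1 is evaluated here, thanks to !

main :: IO ()
main = do
  -- seq evaluates its first argument to WHNF before returning the second
  let x = 1 + 2 :: Int
  x `seq` print x
  -- foldl' keeps the constructor in WHNF; the ! keeps the counter
  -- inside evaluated too, so no chain of (+1) thunks builds up
  print (foldl' (\c _ -> tick c) (Counted () 0) [1 .. 1000000 :: Int])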

Perhaps your instances will work correctly with this data declaration?
Perhaps it might. But that misses an important point. The biggest impediment to developing large robust applications with Haskell is the opacity of its performance model. Haskell is fantastic in very many ways, but this is a really serious difficulty. I can make a seemingly slight change to my program and the performance changes dramatically. What's worse, the connection between the cause of the blowup and the place where it is observed can often be quite subtle[*].

There's a classic example of two one-line Haskell programs, one of which uses O(1) stack space and the other O(n) stack space, even though they compute the same result, and which are so similar you have to stare at them for five minutes before you can spot the difference.

Hughes' "Why Functional Programming Matters" argues [rightly] that lazy FP provides a better "glue", allowing greater abstraction at the semantic level. The flip side, which IIRC he doesn't mention, is the opacity of the performance model.

Here's a question for the experts: what generalizations can I make about the performance of lazy functions under composition? In particular, if all my individual functions are well behaved, will the program as a whole be well behaved?

cheers,
Tom

[*] Gosh, this is beginning to sound like a diatribe on the evils of pointers and manual memory management in C. Interesting....
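Tom doesn't name the pair, but one classic example of this flavour (an assumption here, not necessarily the programs he has in mind) is foldl versus foldl' for summing a list:

import Data.List (foldl')

-- Both compute the same result, but foldl builds a nested chain of
-- (+) thunks that is collapsed only at the end, needing O(n) stack,
-- while foldl' forces the accumulator at each step and runs in O(1).
lazySum, strictSum :: [Int] -> Int
lazySum   = foldl  (+) 0   -- can overflow the stack on large inputs
strictSum = foldl' (+) 0   -- constant stack space

main :: IO ()
main = print (strictSum [1 .. 10000000])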

On 8/7/06, Spencer Janssen wrote:
Forcing evaluation using (==) is a bit of a hack. Luckily, we have a better function to force evaluation: seq (which has type a -> b -> b). "seq x y" evaluates "x" to weak head normal form before returning "y".
Let's try another feature of Haskell to force evaluation: strict data fields. A ! in front of a field in a data declaration signifies strictness. In the example below, whenever we construct a value with TT, the second argument is evaluated.
\begin{code}
data TT a b = TT a !b
\end{code}
Perhaps your instances will work correctly with this data declaration?
Surely I've tried that.

Unfortunately, seq and strict data declarations are not helpful in general. They are only helpful on base values such as Int or Bool. What they do is just make sure the value is not a thunk: if it is a list, they evaluate only as far as the outermost cons cell and no further.

Someone wrote a deepSeq module for forcing deep evaluation, which is like applying the self-equality strictness hack x==x. However, we need to be able to locate the source of the memory leak before we can apply such strictness tricks. I've tried plugging in the x==x hack almost everywhere I could, but it is still hard to find the right one.

I think this is one of the most frustrating drawbacks of developing software in lazy languages like Haskell. I am a fan of lazy languages; I like laziness and infinite data structures and clean semantics. But this is really painful. It is hard to have confidence that Haskell programs are robust when it is this easy to blow up the memory or overflow the stack without intending to.

-- Ahn, Ki Yung
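For reference, a minimal sketch of what such a deep-forcing class can look like (illustrative names; today one would use NFData and deepseq from Control.DeepSeq):

class DeepSeq a where
  deepSeq :: a -> b -> b

instance DeepSeq Int where
  deepSeq = seq   -- WHNF is already full evaluation for flat types

instance DeepSeq a => DeepSeq [a] where
  deepSeq []     b = b
  deepSeq (x:xs) b = x `deepSeq` (xs `deepSeq` b)  -- spine and elements

instance (DeepSeq a, DeepSeq b) => DeepSeq (a, b) where
  deepSeq (x, y) b = x `deepSeq` (y `deepSeq` b)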

kyagrd:
On 8/7/06, Spencer Janssen wrote:
Forcing evaluation using (==) is a bit of a hack. Luckily, we have a better function to force evaluation: seq (which has type a -> b -> b). "seq x y" evaluates "x" to weak head normal form before returning "y".
Let's try another feature of Haskell to force evaluation: strict data fields. A ! in front of a field in a data declaration signifies strictness. In the example below, whenever we construct a value with TT, the second argument is evaluated.
\begin{code}
data TT a b = TT a !b
\end{code}
Perhaps your instances will work correctly with this data declaration?
Surely I've tried that.
Unfortunately, seq and strict data declarations are not helpful in general. They are only helpful on base values such as Int or Bool. What they do is just make sure the value is not a thunk: if it is a list, they evaluate only as far as the outermost cons cell and no further.
Someone wrote a deepSeq module for forcing deep evaluation, which is like applying the self-equality strictness hack x==x. However, we need to be able to locate the source of the memory leak before we can apply such strictness tricks.
The key is to profile. Compile the code, with optimisations on, with -prof -auto-all, then run the resulting program with +RTS -p -RTS. This will identify which functions are costly in time and allocation. You can then refine the search further with {-# SCC "line1" #-} pragmas, placed next to expressions whose cost you want to check. -- Don
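Concretely, the build and run steps might look like this (Leak.hs and the cost-centre name "line1" are made-up examples):

-- Shell commands, shown as comments:
--   ghc -O2 -prof -auto-all Leak.hs -o leak
--   ./leak +RTS -p -RTS     -- time/allocation report written to leak.prof
--   ./leak +RTS -hc -RTS    -- heap profile by cost centre in leak.hp
--
-- An SCC pragma attaches a named cost centre to a single expression:
suspect :: [Int] -> Int
suspect xs = {-# SCC "line1" #-} foldr (+) 0 xs

main :: IO ()
main = print (suspect [1 .. 100000])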

Hello Ki,

Tuesday, August 8, 2006, 6:34:51 AM, you wrote:
Unfortunately, seq and strict data declarations are not helpful in general. They are only helpful on base values such as Int or Bool. What they do is just make sure the value is not a thunk: if it is a list, they evaluate only as far as the outermost cons cell and no further.
Someone wrote a deepSeq module for forcing deep evaluation, which is
It was a proposal to add deepSeq to the language itself (for example, by just allowing the compiler to derive it automatically). We can add another proposal: implementing deep strict fields:

data T = C !![Int]

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
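Until such a proposal lands, the same effect can be had in standard Haskell by combining an ordinary strict field with a smart constructor that walks the list first (a sketch; mkC is an illustrative name, not part of the proposal):

data T = C ![Int]

-- mkC forces the spine and every element before building C,
-- giving the field the proposed "deep strict" behaviour.
mkC :: [Int] -> T
mkC xs = forceList xs `seq` C xs
  where
    forceList []     = ()
    forceList (y:ys) = y `seq` forceList ys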

Ahn, Ki Yung wrote:
Recently, I'm facing the dark side of laziness -- memory leaks caused by laziness.
The following is the code that leaks memory.
sctAnal gs = null cgs || all (not . null) dcs
  where
    gs' = fixSize compose $ Set.fromList [TT (x,y,cs) [] | To _ x y cs <- Set.toList gs]
    cgs = [z | z@(TT (x,y,cs) _) <- Set.toList gs', x==y]
    dcs = [[c | c@(a,D,b) <- Set.toList cs, a==b] | TT (_,_,cs) _ <- cgs]
    compose gs = trace ("## "++show (Set.size gs)) $ foldr checkInsert
One thing is that foldr is not recommended if you can at all avoid it. I think you may be able to use foldl' here (if you also swap the argument order of checkInsert), which behaves as a strict fold operation, so it doesn't waste space building up thunks.
      gs $ do
        TT (x1,y1,cs1) l1 <- Set.toList gs
        TT (_,y2,cs2) l2 <- takeWhileTTfrom y1 . Set.toList $
                              setGT (TT (y1,Al "" (-1),Set.empty) []) gs
        return $ TT (x1,y2,cs1 `comp` cs2) (l1++y1:l2)
    takeWhileTTfrom y = takeWhile (\(TT (y',_,_) _) -> y==y')
    checkInsert x s
-- checkInsert s x -- if you use foldl'
      | Set.member x s = s
      | otherwise      = Set.insert x s
Sorry I can't help more at the moment - Regards, Brian.

-- Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us. http://www.metamilk.com
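Brian's suggestion spelled out (a sketch, not his code): a strict left fold with the accumulator first, so the set is forced at every step instead of a tower of pending inserts building up.

import Data.List (foldl')
import qualified Data.Set as Set

-- checkInsert with its arguments swapped, ready for foldl'
checkInsert' :: Ord a => Set.Set a -> a -> Set.Set a
checkInsert' s x
  | Set.member x s = s
  | otherwise      = Set.insert x s

insertAll :: Ord a => Set.Set a -> [a] -> Set.Set a
insertAll = foldl' checkInsert'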

Ahn, Ki Yung wrote:
Recently, I'm facing the dark side of laziness -- memory leaks caused by laziness.
The typical pattern in which I encounter the problem is like this.
My code was working fine and I was happy. I just wanted to inspect some properties of my code, so I made a slight change to the code, such as adding a counter argument or attaching an auxiliary data field to the original data for tracing how the data has been constructed. All of a sudden the program runs out of memory or overflows the stack.
One problem is that it comes up unexpectedly. Another, even worse problem is that sometimes I have no idea of the exact location causing the leak!
It is really frightening to face such darkness of lazy evaluation. Just a small, innocent-looking fix for inspection or tracing blows things up, sometimes with no clue as to the reason.
When we implement a debugging or tracing option in the software and let the user toggle those features, how can we be sure that turning on those features won't crash software written in Haskell?
Are there standardized approaches for detecting and fixing these kinds of problems?
Haskell may be type safe, but it is not at all safe from unexpected divergence, which is not because of the programmer's mistake but just because of laziness.
I have posted a wiki article including one example of adding a counter to count the number of basic operations in a sorting algorithm.
http://www.haskell.org/haskellwiki/Physical_equality
This was a rather simple situation and we figured out how to cure it with a self-equality check (x==x) forcing evaluation.
There are worse cases where I am not able to figure out the cure. I wrote a function for analyzing some property of a graph, which worked fine.
fixOnBy t p f x = if t x' `p` t x then x else fixOnBy t p f x'
  where x' = f x
fixSize f x = fixOnBy Set.size (==) f x
sctAnal gs = null cgs || all (not . null) dcs
  where
    gs' = fixSize compose $ Set.fromList [(x,y,cs) | To _ x y cs <- Set.toList gs]
    cgs = [z | z@(x,y,cs) <- Set.toList gs', x==y]
    dcs = [[c | c@(a,D,b) <- Set.toList cs, a==b] | (_,_,cs) <- cgs]
    compose gs = trace ("## "++show (Set.size gs)) $ foldr Set.insert gs $ do
        (x1,y1,cs1) <- Set.toList gs
        (_,y2,cs2) <- takeWhileFst y1 $ Set.toList $ setGT (y1,Al "" (-1),Set.empty) gs
        return (x1,y2,cs1 `comp` cs2)
    takeWhileFst y = takeWhile (\(y',_,_) -> y==y')
This function computes the transitive closure of the given set of relations by fixpoint iteration on the size of the set of weighted edges.
Sample output is like this.
*Main> main
## 170
## 400
## 1167
## 2249
## 2314
False
When I add an extra data field for tracing how the new relation was constructed (e.g. tag [a,b,c] on a->c if it came from a->b and b->c), it suddenly overflows the stack even before printing out the trace.
I find that overflow a bit odd. What is the ghc command line? Are you using optimization flags?
The following is the code that leaks memory.
sctAnal gs = null cgs || all (not . null) dcs
  where
    gs' = fixSize compose $ Set.fromList [TT (x,y,cs) [] | To _ x y cs <- Set.toList gs]
    cgs = [z | z@(TT (x,y,cs) _) <- Set.toList gs', x==y]
    dcs = [[c | c@(a,D,b) <- Set.toList cs, a==b] | TT (_,_,cs) _ <- cgs]
    compose gs = trace ("## "++show (Set.size gs)) $ foldr checkInsert gs $ do
        TT (x1,y1,cs1) l1 <- Set.toList gs
        TT (_,y2,cs2) l2 <- takeWhileTTfrom y1 . Set.toList $
                              setGT (TT (y1,Al "" (-1),Set.empty) []) gs
        return $ TT (x1,y2,cs1 `comp` cs2) (l1++y1:l2)
    takeWhileTTfrom y = takeWhile (\(TT (y',_,_) _) -> y==y')
    checkInsert x s
      | Set.member x s = s
      | otherwise      = Set.insert x s
data TT a b = TT a b deriving (Show)
instance (Eq a, Eq b) => Eq (TT a b) where
  (TT x lx) == (TT y ly) = lx==lx && ly==ly && x == y
instance (Ord a, Ord b) => Ord (TT a b) where
  (TT x lx) < (TT y ly) = lx==lx && ly==ly && x < y
Tracing by eye:

sctAnal gs
=> null cgs
=> Set.toList gs'
=> let long = Set.fromList [TT (x,y,cs) [] | To _ x y cs <- Set.toList gs]
   in fixOnBy Set.size (==) compose long
=> if (Set.size (compose long)) == (Set.size long) then long else (compose long)
=> Set.size (compose long)
=> compose long
=> trace ("##"++show (Set.size long)) <rest>
=> Set.size long
=> long
=> Set.fromList [TT (x,y,cs) [] | To _ x y cs <- Set.toList gs]
=> Set.toList gs

Which does not look like it will blow the stack. So I cannot see why the tracing function does not get to print the size. I would try simplifying the string the trace function prints to a literal instead of a computation of the size. Then I would add many, many more (trace "literal" $) calls to the code until I get some that print before it crashes. But I suspect you have done most of that.
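As an illustration of that tactic (not Chris's code): literal traces cost nothing to compute, so the last one printed brackets the crash site.

import Debug.Trace (trace)

main :: IO ()
main = do
  -- if "got here 1" prints but "got here 2" never does, the crash
  -- happens before 'bad' is even entered; if both print, the crash
  -- is inside the fold itself
  let bad = trace "got here 2" (foldr (+) 0 [1 .. 10000000 :: Integer])
  trace "got here 1" (print bad)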

Ahn, Ki Yung wrote:
Recently, I'm facing the dark side of laziness -- memory leaks caused by laziness.
Are there standardized approaches for detecting and fixing these kinds of problems?
Not really. As Don S. already said, try heap profiling. The function that is too lazy will show up as a producer. Other than that, you'll just have to learn to look for the typical patterns. Understanding Haskell's evaluation model and being able to simulate it in your head also helps.
sctAnal gs = null cgs || all (not . null) dcs
  where
    gs' = fixSize compose $ Set.fromList [TT (x,y,cs) [] | To _ x y cs <- Set.toList gs]
    cgs = [z | z@(TT (x,y,cs) _) <- Set.toList gs', x==y]
    dcs = [[c | c@(a,D,b) <- Set.toList cs, a==b] | TT (_,_,cs) _ <- cgs]
    compose gs = trace ("## "++show (Set.size gs)) $ foldr checkInsert gs $ do
                                                     ^^^^^ point 1
        TT (x1,y1,cs1) l1 <- Set.toList gs
        TT (_,y2,cs2) l2 <- takeWhileTTfrom y1 . Set.toList $
                              setGT (TT (y1,Al "" (-1),Set.empty) []) gs
        return $ TT (x1,y2,cs1 `comp` cs2) (l1++y1:l2)
                                           ^^^^^^^^^^^ point 2
    takeWhileTTfrom y = takeWhile (\(TT (y',_,_) _) -> y==y')
    checkInsert x s
      | Set.member x s = s
      | otherwise      = Set.insert x s
I can see two sources of problems. Point 2 seems to be the cause of your immediate problem: this builds nested applications of (++) and never evaluates them. When the result is finally demanded, (++) calls itself recursively, and if the list is too long, the stack gets exhausted. 'seq' doesn't help; that would only let the (++) accumulate in the list's tail. But 'foldr seq' should help, and so would deepSeq. I wonder why
instance (Ord a, Ord b) => Ord (TT a b) where
  (TT x lx) < (TT y ly) = lx==lx && ly==ly && x < y
doesn't. Does the (lx == lx) get optimized away?

The easiest solution would be to use a data structure that directly supports concatenation. Any implementation of a deque is good (FingerTrees? Having them around can never hurt...) and so is a function: replace the list [a] by a function ([a] -> [a]), replace [] by id, and replace (l1++y1:l2) by (l1 . (y1:) . l2). This also helps with the quadratic runtime, by the way.

At point 1, there lurks another problem. You may find that some graphs will blow your stack or even your heap. That's because the repeated application of checkInsert is not evaluated, and this thunk may get too deep or need more space than the Set it would build. I think you want foldl' (note the prime) here.

Udo.
-- Q: What is contagious and commutes? A: An abelian flu. (A German pun: Gruppe/Grippe, "group"/"flu".)
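Udo's replacement made concrete (a sketch with illustrative names):

-- A difference list: concatenation becomes function composition,
-- so no nested (++) thunks pile up and appending is O(1).
type DList a = [a] -> [a]

emptyD :: DList a                      -- replaces []
emptyD = id

appendD :: DList a -> a -> DList a -> DList a
appendD l1 y1 l2 = l1 . (y1 :) . l2    -- replaces l1 ++ y1 : l2

toList :: DList a -> [a]               -- convert back when needed
toList d = d []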
participants (8):
- Ahn, Ki Yung
- Brian Hulley
- Bulat Ziganshin
- Chris Kuklewicz
- dons@cse.unsw.edu.au
- Spencer Janssen
- Thomas Conway
- Udo Stenzel