
Ahn, Ki Yung wrote:
Recently, I'm facing the dark side of laziness -- the memory leak because of laziness.
Typical pattern that I encounter the problem is like this.
My code was working fine and I was happy. I just wanted to inspect some properties of my code so I made a slight chage go the code such as adding counter argument or attaching auxiliary data filed to original data for tracing how the data has been constructed. All of a sudden the program runs out of memory or overflows the stack.
One problem is that it comes up unexpectedly. Another even worse problem is that sometimes I get no idea for the exact location causing the leak!
It really panics facing such darkness of lazy evaluation. Just a small innocent looking fix for inspection or tracing blow things up, sometime with no clue for its reason.
When we implement a debugging or tracing option in the software and let the user toggle those features, how could we be sure that turning on those features won't crash the software written in Haskell?
Are there standardized approaches for detecting and fixing these kind of problems?
Haskell may be type safe but not safe at all from unexpanded diversion, which is not because of the programmers' mistake but just because of the laziness.
I have posted an wiki article including one example of adding a counter to count the number of basic operations in sorting algorithm.
http://www.haskell.org/haskellwiki/Physical_equality
This was a rather simple situation and we figured out how to cure this by self equality check ( x==x ) forcing evaluation.
There are worse cases not being able to figure out the cure. I wrote a function for analyzing some property of a graph, which worked fine.
fixOnBy t p f x = if t x' `p` t x then x else fixOnBy t p f x' where x' = f x
fixSize f x = fixOnBy Set.size (==) f x
sctAnal gs = null cgs || all (not . null) dcs where gs' = fixSize compose $ Set.fromList [(x,y,cs) | To _ x y cs<-Set.toList gs] cgs = [z | z@(x,y,cs)<-Set.toList gs', x==y] dcs = [ [c| c@(a,D,b)<-Set.toList cs , a==b] | (_,_,cs)<-cgs] compose gs = trace ("## "++show (Set.size gs)) $ foldr Set.insert gs $ do (x1,y1,cs1) <- Set.toList gs (_,y2,cs2) <- takeWhileFst y1 $ Set.toList $ setGT (y1,Al""(-1),Set.empty) gs return (x1,y2,cs1 `comp` cs2) takeWhileFst y = takeWhile (\(y',_,_) -> y==y')
This function makes a transitive closure of the given set of relations by fixpoint iteration on the size of the set of weighted edges.
Sample output is like this.
*Main> main ## 170 ## 400 ## 1167 ## 2249 ## 2314 False
When I add an extra data field for tracing how the new relation was constructed, (e.g. tag [a,b,c] on a->c if it came from a->b and b->c) it suddenly overflows the stack even before printing out the trace.
I find that overflow a bit odd. What is the ghc command line? Are you using optimization flags?
The following is the code that leaks memory.
sctAnal gs = null cgs || all (not . null) dcs where gs' = fixSize compose $ Set.fromList [TT (x,y,cs) [] | To _ x y cs<-Set.toList gs] cgs = [z | z@(TT (x,y,cs) _)<-Set.toList gs', x==y] dcs = [[c| c@(a,D,b)<-Set.toList cs , a==b] | TT (_,_,cs) _<-cgs] compose gs = trace ("## "++show (Set.size gs)) $ foldr checkInsert gs $ do TT (x1,y1,cs1) l1 <- Set.toList gs TT (_,y2,cs2) l2 <- takeWhileTTfrom y1 . Set.toList $ setGT (TT (y1,Al""(-1),Set.empty) []) gs return $ TT (x1,y2,cs1 `comp` cs2) (l1++y1:l2) takeWhileTTfrom y = takeWhile (\(TT (y',_,_) _) -> y==y') checkInsert x s | Set.member x s = s | otherwise = Set.insert x s
data TT a b = TT a b deriving (Show) instance (Eq a, Eq b) => Eq (TT a b) where (TT x lx) == (TT y ly) = lx==lx && ly==ly && x == y instance (Ord a, Ord b) => Ord (TT a b) where (TT x lx) < (TT y ly) = lx==lx && ly==ly && x < y
Tracing by eye: sctAnal gc => null cgs => Set.toList gs' => let long = (Set.fromList [TT (x,y,cs) [] | To _ x y cs<-Set.toList gs]) in fixOnBy Set.size (==) compose (long) => if (Set.size (compose long)) == (Set.size long) then long else (compose long) => Set.size (compose long) => compose long => trace ("##"++show (Set.size long)) <rest> => Set.size long => long => Set.fromList [TT (x,y,cs) [] | To _ x y cs<-Set.toList gs] => Set.toList gs Which does not look like it will blow stack space. So I cannot see why the tracing function does not get to print the size. I would try to simplify the string the trace function prints into a literal instead of a calculation on the size. Then I would add many many more (trace "literal" $) functions to the code until I get some that print before it crashes. But I suspect you have done most of that.