Re: [Haskell-cafe] How can we detect and fix memory leak due to lazyness?

8 Aug 2006

      Ahn, Ki Yung wrote:
...
Recently, I'm facing the dark side of laziness
-- the memory leak because of laziness.
Typical pattern that I encounter the problem is like this.
My code was working fine and I was happy.
I just wanted to inspect some properties of my code so
I made a slight chage go the code such as adding counter
argument or attaching auxiliary data filed to original data for
tracing how the data has been constructed.
All of a sudden the program runs out of memory or overflows
the stack.
One problem is that it comes up unexpectedly. Another even
worse problem is that sometimes I get no idea for the exact
location causing the leak!
It really panics facing such darkness of lazy evaluation.
Just a small innocent looking fix for inspection or tracing
blow things up, sometime with no clue for its reason.
When we implement a debugging or tracing option in the
software and let the user toggle those features, how could
we be sure that turning on those features won't crash the
software written in Haskell?
Are there standardized approaches for detecting and fixing
these kind of problems?
Haskell may be type safe but not safe at all from unexpanded
diversion, which is not because of the programmers' mistake
but just because of the laziness.
I have posted an wiki article including one example of adding
a counter to count the number of basic operations in sorting algorithm.
http://www.haskell.org/haskellwiki/Physical_equality
This was a rather simple situation and we figured out how to
cure this by self equality check ( x==x ) forcing evaluation.
There are worse cases not being able to figure out the cure.
I wrote a function for analyzing some property of a graph,
which worked fine.
fixOnBy t p f x = if t x' `p` t x then x else fixOnBy t p f x' where x' 
= f x
fixSize f x = fixOnBy Set.size (==) f x
sctAnal gs = null cgs || all (not . null) dcs
where
  gs' = fixSize compose $ Set.fromList [(x,y,cs) | To _ x y 
cs<-Set.toList gs]
  cgs = [z | z@(x,y,cs)<-Set.toList gs', x==y]
  dcs = [ [c| c@(a,D,b)<-Set.toList cs , a==b] | (_,_,cs)<-cgs]
  compose gs = trace ("## "++show (Set.size gs)) $ foldr Set.insert gs $ do
    (x1,y1,cs1) <- Set.toList gs
    (_,y2,cs2)  <-  takeWhileFst y1 $ Set.toList $ setGT
(y1,Al""(-1),Set.empty) gs
    return (x1,y2,cs1 `comp` cs2)
  takeWhileFst y = takeWhile (\(y',_,_) -> y==y')
This function makes a transitive closure of the given set of relations
by fixpoint iteration on the size of the set of weighted edges.
Sample output is like this.
*Main> main
## 170
## 400
## 1167
## 2249
## 2314
False
When I add an extra data field for tracing how the new relation was
constructed, (e.g. tag [a,b,c] on a->c if it came from a->b and b->c)
it suddenly overflows the stack even before printing out the trace.
I find that overflow a bit odd.  What is the ghc command line?  Are you using 
optimization flags?
...
The following is the code that leaks memory.
sctAnal gs = null cgs || all (not . null) dcs
where
  gs' = fixSize compose $ Set.fromList [TT (x,y,cs) [] | To _ x y
cs<-Set.toList gs]
  cgs = [z | z@(TT (x,y,cs) _)<-Set.toList gs', x==y]
  dcs = [[c| c@(a,D,b)<-Set.toList cs , a==b] | TT (_,_,cs) _<-cgs]
  compose gs = trace ("## "++show (Set.size gs)) $ foldr checkInsert gs 
$ do
    TT (x1,y1,cs1) l1 <- Set.toList gs
    TT (_,y2,cs2) l2 <- takeWhileTTfrom y1 . Set.toList $ setGT (TT
(y1,Al""(-1),Set.empty) []) gs
    return $ TT (x1,y2,cs1 `comp` cs2) (l1++y1:l2)
  takeWhileTTfrom y = takeWhile (\(TT (y',_,_) _) -> y==y')
  checkInsert x s
                  | Set.member x s = s
                  | otherwise      = Set.insert x s
data TT a b = TT a b deriving (Show)
instance (Eq a, Eq b) => Eq (TT a b) where
 (TT x lx) == (TT y ly) = lx==lx && ly==ly && x == y
instance (Ord a, Ord b) => Ord (TT a b) where
 (TT x lx) < (TT y ly) = lx==lx && ly==ly && x < y
Tracing by eye:

sctAnal gc => null cgs => Set.toList gs' =>

let long = (Set.fromList [TT (x,y,cs) [] | To _ x y cs<-Set.toList gs])
in fixOnBy Set.size (==) compose (long) =>

if (Set.size (compose long)) == (Set.size long) then long else (compose long) =>

Set.size (compose long) => compose long =>

trace ("##"++show (Set.size long)) <rest> => Set.size long => long =>

Set.fromList [TT (x,y,cs) [] | To _ x y cs<-Set.toList gs] => Set.toList gs

Which does not look like it will blow stack space.  So I cannot see why the 
tracing function does not get to print the size.

I would try to simplify the string the trace function prints into a literal 
instead of a calculation on the size.  Then I would add many many more (trace 
"literal" $) functions to the code until I get some that print before it 
crashes.  But I suspect you have done most of that.

Re: [Haskell-cafe] How can we detect and fix memory leak due to lazyness?

Chris Kuklewicz