Re: heap profiling

18 Jun 2010

      On 18/06/2010 08:29, Evan Laforge wrote:
...
...
oh, that happens a lot.  e.g. in  f (g x), if f doesn't demand the value of
its argument for a long time, then the heap closure for (g x) is lagging.
  If the value of (g x) is never demanded, but is nevertheless stored in some
data structure for a while, then it is VOID.
Ok... so a (g x) closure is constructed, and it will keep 'x' alive
when it shouldn't, but what's the actual data constructor here?  If
'x' is [1, 2], will these two cons cells be counted as being in LAG
stage?  Or is it just the closure, which will show up as a
sat_something?
I think it depends whether the [1,2] has been "used" or not.  If it has 
been used, then the closures are in the USE state until the last use, 
otherwise they are in the LAG state.
...
...
The Monad instance for Writer looks like this:
instance (Monoid w) =>  Monad (Writer w) where
    return a = Writer (a, mempty)
    m>>= k  = Writer $ case runWriter m of
                            (a, w) ->  case runWriter (k a) of
                                (b, w') ->  (b, w `mappend` w')
I expect sat_s0a1 is the closure for (w `mappend` w').  If that is causing
your space leak, then maybe you need a Control.Monad.Writer.Stricter in
which the written value is forced strictly by>>=.
So I copied Writer/Strict.hs and made a new one with a slightly
stricter>>=, note the ! on w':
m>>= k  = WriterT $ do
         (a, w)<- runWriterT m
         (b, !w')<- runWriterT (k a)
         let ws = w `mappend` w'
         return (b, ws)
Initially this had no effect.  I put a ! on the first 'w', and now the
StricterWriter.sat_* closure is replaced by the #2 consumer, which is
PAP.  What *is* PAP?  Memory usage is exactly the same, I just have
more PAP now.  So I tried switching from DList and DList.append to []
and (++) and now... the lag is gone!  Only the ! on w' is needed.
The quadratic behaviour of (++) is not a practical worry in my case,
since I log rarely so most of the appends are with []s on one side or
both, but this is a bit troubling.  Writer + DList was specifically
recommended as an efficient way to log things, but if a giant space
leak is the result, then it's not such a great idea after all.  I'm
guessing it has something to do with how DList is implemented in terms
of function compositions, but since I can't see inside PAP I have to
guess.
Right, I wouldn't use DList for this.  Here's an alternative I use:

data AList a = ANil | ASing a | Append (AList a) (AList a)

lenA :: AList a -> Int
lenA ANil          = 0
lenA (ASing _)     = 1
lenA (Append l r)  = lenA l + lenA r

appendA ANil r = r
appendA l ANil = l
appendA l r    = Append l r

Note how appendA is strict(ish) and eliminates ANils, so in a writer 
monad it shouldn't build up a space leak.  I'm sure you can write toList 
(don't use (++) though).

I'd put the bang on ws:

       m>>= k  = WriterT $ do
           (a, w)  <- runWriterT m
           (b, w') <- runWriterT (k a)
           let !ws = w `mappend` w'
           return (b, ws)

The problem with this monad is that >>= isn't tail-recursive, so it will 
cause stack overflows on recursive monadic functions.  I suspect that a 
better alternative to the strict writer monad is the strict state monad 
in most cases, because its bind is tail-recursive.
...
To try to get a better picture of what was going on, I created a
simpler profile with a simplified monad stack: just logging and state
and doing some stylized actions... and now the results are the
reverse!  The stricter version on lists produces an enormous amount of
lag, the stock Writer.Strict is much better, but still produces quite
a bit of garbage (perhaps from the State, though I can't see what I'm
doing wrong there).  The dlist + normal strictness version is so fast
the gc can't even measure it!  But if I bump the 'depth' from 9 to 10
(which means factorially more calls), suddenly I'm getting stack
overflows... but initially only in hb mode.
So the more I look at it, the less I understand.  Clearly something is
different from my production profile and the test profile, but it
seems baffling that it makes such a difference as to reverse the
results.  I will try to reduce the production version element by
element until the results switch around.
I put the simple version at
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=26329#a26329
This one displays much better performance with DList + Writer.Strict
than List + StrictWriter so I guess it's not too surprising.  However,
*something* is still generating a lot of lag and -hb causes a stack
trace, so I'm still doing it wrong somehow.
I also have a situation where -hb shows about 4mb of drag at the most,
but when I run with '-hc -hbdrag', I only see 10k peaks.  Shouldn't
filtering the graph by drag add up to as much drag as when the graph
isn't filtered?
That sounds suspicious.  If you can make a self-contained example that 
demonstrates it and create a ticket, that would be a great help.

Cheers,
	Simon
...
...
You see "*" when the type of a closure is polymorphic (as it would be in the
case of w `mappend` w').  stg_ap_2_upd_info is a generic thunk
implementation in the RTS - perhaps when profiling we should avoid using
these to give you more information.
Well, it's eating a lot of space... so it must be something.  I guess
if it's impossible to tell what it is, then it's just being needlessly
confusing there.
And thanks so much for the responses... space usage gets more
confusing the more I look at it, but I hope it will come clear
eventually given enough time.

Re: heap profiling

Simon Marlow