RE: Looking for advice on profiling

On 09 November 2004 12:54, Duncan Coutts wrote:
[snip]
> When I do time profiling, the big cost centres come up as putByte and putWord. When I profile for space it shows the large FiniteMaps dominating most everything else. I originally guessed from that that the serialisation must be forcing loads of thunks which is why it shows up so highly on the profile. However even after doing the deepSeq before serialisation, it takes a great deal of time, so I'm not sure what's going on.
Let's get the simple things out of the way first: make sure you're compiling Binary with -O -funbox-strict-fields (very important). When compiling for profiling, don't compile Binary with -auto-all, because that will add cost centres to all the small functions and really skew the profile. I find this is a good rule of thumb when profiling: avoid -auto-all on your low-level libraries that you hope to be inlined a lot.
You say your instances are created using DrIFT - I don't think we ever modified DrIFT to generate the right kind of instances for the Binary library in GHC, so are you using the instances designed for the nhc98 binary library? If so, make sure your instances are using put_ rather than put, because the former will allow binary output to run in constant stack space.
Are you using BinMem, or BinIO?
> The retainer profiling again shows that the FiniteMaps are holding on to most stuff.
> A major problem no doubt is space use. For the large gtk/gtk.h, when I run with +RTS -B to get a beep every major garbage collection, the serialisation phase beeps continuously while the file grows. Occasionally it seems to freeze for 10s of seconds, not doing any garbage collection and not doing any file output but using 100% CPU, then it carries on outputting and garbage collecting furiously. I don't know how to work out what's going on when it does that.
I agree with Malcolm's conjecture: it sounds like a very long major GC pause.
> I don't understand how it can be generating so much garbage when it is doing the serialisation stuff on a structure that has already been fully deepSeq'ed.
Yes, binary output *should* do zero allocation, and binary input should only allocate the structure being created. The Binary library is quite heavily tuned so that this is the case (if you compile with profiling and -auto-all, it will almost certainly break this property, though).
Cheers,
Simon
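For readers unfamiliar with the put_ style mentioned above, here is a minimal sketch of its shape. It is not the GHC Binary library: BinHandle, newBinHandle, written and the list encoding are hypothetical stand-ins built on an IORef, and only the put_ half of the class is modelled. The point it tries to show is that every step returns () and the recursive call is the last action, so nothing is left pending on the stack while a long list is written.

```haskell
module Main where

import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import Data.Word (Word8)

-- Hypothetical stand-in for a binary handle: bytes accumulate in reverse.
newtype BinHandle = BinHandle (IORef [Word8])

newBinHandle :: IO BinHandle
newBinHandle = BinHandle <$> newIORef []

-- Append one byte to the (reversed) output.
putByte :: BinHandle -> Word8 -> IO ()
putByte (BinHandle ref) w = modifyIORef' ref (w :)

-- Only the put_ half of the class is modelled here (an assumption of this sketch).
class Binary a where
  put_ :: BinHandle -> a -> IO ()

instance Binary Word8 where
  put_ = putByte

instance Binary Bool where
  put_ bh b = putByte bh (if b then 1 else 0)

-- Toy list encoding: tag byte 1 before each element, tag byte 0 at the end.
-- The recursive put_ is the final action, so no work is pending after it.
instance Binary a => Binary [a] where
  put_ bh []       = putByte bh 0
  put_ bh (x : xs) = putByte bh 1 >> put_ bh x >> put_ bh xs

-- Read back everything written so far, in output order.
written :: BinHandle -> IO [Word8]
written (BinHandle ref) = reverse <$> readIORef ref

main :: IO ()
main = do
  bh <- newBinHandle
  put_ bh [True, False]
  written bh >>= print   -- prints [1,1,1,0,0]
```

A real handle would write to a buffer or file rather than a list; the list is used here only so the result can be inspected.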

On Tue, 2004-11-09 at 14:45, Simon Marlow wrote:
> On 09 November 2004 12:54, Duncan Coutts wrote:
> [snip]
>> When I do time profiling, the big cost centres come up as putByte and putWord. When I profile for space it shows the large FiniteMaps dominating most everything else. I originally guessed from that that the serialisation must be forcing loads of thunks which is why it shows up so highly on the profile. However even after doing the deepSeq before serialisation, it takes a great deal of time, so I'm not sure what's going on.
> Let's get the simple things out of the way first: make sure you're compiling Binary with -O -funbox-strict-fields (very important). When compiling for profiling, don't compile Binary with -auto-all, because that will add cost centres to all the small functions and really skew the profile. I find this is a good rule of thumb when profiling: avoid -auto-all on your low-level libraries that you hope to be inlined a lot.
Ok, I was missing -funbox-strict-fields. I'll try that.
> You say your instances are created using DrIFT - I don't think we ever modified DrIFT to generate the right kind of instances for the Binary library in GHC, so are you using the instances designed for the nhc98 binary library? If so, make sure your instances are using put_ rather than put, because the former will allow binary output to run in constant stack space.
It's using put_
> Are you using BinMem, or BinIO?
BinIO
>> The retainer profiling again shows that the FiniteMaps are holding on to most stuff.
>> A major problem no doubt is space use. For the large gtk/gtk.h, when I run with +RTS -B to get a beep every major garbage collection, the serialisation phase beeps continuously while the file grows. Occasionally it seems to freeze for 10s of seconds, not doing any garbage collection and not doing any file output but using 100% CPU, then it carries on outputting and garbage collecting furiously. I don't know how to work out what's going on when it does that.
> I agree with Malcolm's conjecture: it sounds like a very long major GC pause.
Right, ok.
>> I don't understand how it can be generating so much garbage when it is doing the serialisation stuff on a structure that has already been fully deepSeq'ed.
> Yes, binary output *should* do zero allocation, and binary input should only allocate the structure being created. The Binary library is quite heavily tuned so that this is the case (if you compile with profiling and -auto-all, it will almost certainly break this property, though).
Yes, it's much better with optimisations. I'll try -funbox-strict-fields and report back.
Duncan
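As a rough illustration of what -funbox-strict-fields acts on, here is a small sketch. The Header record, its fields and encodeHeader are made up for this example and do not come from the code under discussion. The flag applies to strict (banged) fields whose type has a single constructor, such as Word8 and Word32, and stores them unboxed inside the enclosing constructor, so reading them for output needs no extra pointer chase or thunk check.

```haskell
{-# OPTIONS_GHC -O -funbox-strict-fields #-}
module Main where

import Data.Bits (shiftR)
import Data.Word (Word8, Word32)

-- Hypothetical record. Both fields are strict, so with the flag above
-- they are stored unboxed inside the Header constructor.
data Header = Header
  { hdrTag  :: !Word8
  , hdrSize :: !Word32
  }

-- Split a Word32 into four bytes, most significant first.
word32Bytes :: Word32 -> [Word8]
word32Bytes w = [ fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0] ]

-- Toy encoding: the tag byte followed by the size in big-endian order.
encodeHeader :: Header -> [Word8]
encodeHeader (Header t n) = t : word32Bytes n

main :: IO ()
main = print (encodeHeader (Header 1 258))   -- prints [1,0,0,1,2]
```

The same flag can be given on the command line instead of in the pragma; without the bangs on the fields it has no effect on this type.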