
It seems I have two options:
A) compile without profiling support and run the compiled program with +RTS -sstderr
B) compile with profiling support (-prof -auto-all) and run the compiled program with +RTS -p -sstderr (command lines sketched below)
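For concreteness, the two workflows look roughly like this (program name taken from the example run further down; note that with modern GHC the flag -auto-all is spelled -fprof-auto):

~> ghc -O ratioGC.lhs
~> ./a.out +RTS -sstderr

~> ghc -O -prof -auto-all ratioGC.lhs
~> ./a.out +RTS -p -sstderr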
In case A, I get a good measure of GC vs. mutator time, but I don't know the amount of time used by single functions, so I can't separate the mutator time spent in the functions that really interest me from the time spent in the test frame.
In case B, I can separate the mutator time spent in single functions, but I have the impression that the GC time I then get includes the GC for the profiler, and hence is useless for me, because different GC times for different algorithms might just mean that one of the algorithms incurs more profiling overhead.
If you run a profiled program with +RTS -sstderr, the time breakdown includes an extra category, PROF, which counts the time spent in the heap profiler. The amount of GC time consumed by the profiled program will indeed be different from the unprofiled program, because of profiling overheads - there's no way around this, I'm afraid. But you may find that the ratio of mutator to GC time in the profiled program is similar to the unprofiled program (I'd be interested to know whether this is/is not the case).

To get the most reliable measure of GC time in a profiled program, do not turn on heap profiling, because this will cause extra GCs to be performed and will inflate the GC time. Turning on time profiling should be ok.

Cheers,
Simon
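In command form, Simon's suggestion amounts to something like the following (-hc, heap profiling by cost centre, is just one example; any of the -h<mode> heap-profiling flags would trigger the extra GCs):

~> ./a.out +RTS -p -sstderr

gives time profiling only, so the reported GC time stays meaningful, whereas

~> ./a.out +RTS -p -hc -sstderr

adds heap profiling and inflates the GC time.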

Simon Marlow wrote:
If you run a profiled program with +RTS -sstderr, the time breakdown includes an extra category, PROF, which counts the time spent in the heap profiler. The amount of GC time consumed by the profiled program will indeed be different from the unprofiled program, because of profiling overheads - there's no way around this, I'm afraid. But you may find that the ratio of mutator to GC time in the profiled program is similar to the unprofiled program (I'd be interested to know whether this is/is not the case).
It seems I stumbled on a situation where this is not the case. The following is a simplified version of my program:
module Main where

f xs = fst (go xs (id, id)) []
  where
    go (x:xs) ys = go xs ((\(rw, rl) -> (\zs -> x : rl zs, \zs -> rw (x:zs))) ys)
    go []     ys = ys

main = do print (let r = map f (replicate 3000 (replicate 3000 0)) in r == r)
Without profiling support, I get:

~> ghc-5.04.1 -O ratioGC.lhs
~> ./a.out +RTS -sstderr
...
  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    8.18s  (  8.20s elapsed)
  GC    time    2.84s  (  3.13s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   11.02s  ( 11.33s elapsed)
...

When compiled for profiling, I get:

~> ghc-5.04.1 -O ratioGC.lhs -prof -auto-all
~> ./a.out +RTS -sstderr
...
  INIT  time    0.01s  (  0.01s elapsed)
  MUT   time   19.94s  ( 21.38s elapsed)
  GC    time   17.96s  ( 19.70s elapsed)
  RP    time    0.00s  (  0.00s elapsed)
  PROF  time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   37.91s  ( 41.09s elapsed)
...

So the MUT vs. GC time ratio changes considerably when the program is compiled for profiling support.

My actual aim is to compare different implementations of f with respect to both the runtime of expression evaluation and the induced amount of GC. To measure just the runtime of f (as opposed to the test frame, with its generation of test data and consumption via the forced (==)-check), I wanted to use cost-centre profiling, but the big change in GC behaviour rules this out. Adding the GC time from the first run (without profiling) to the mutator time for cost centre f from a profiled run is also not an option, because the time attributed to f by the profiler is bigger than the overall runtime of the unprofiled program, so the scales differ completely.

I guess I will settle for more accurate total runtimes (including GC time) without cost-centre profiling, and hence without separation of the algorithm from the test infrastructure.

Regards, Janis.

--
Janis Voigtlaender
http://wwwtcs.inf.tu-dresden.de/~voigt/
mailto:voigt@tcs.inf.tu-dresden.de
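A possible workaround, sketched here and not part of the thread: in the unprofiled build, force the test data first and then wrap just the application of f with getCPUTime, so the test frame and the algorithm are timed separately without cost-centre profiling. The measured CPU time for each phase still includes any GC triggered during that phase, and in a lazy language such timings depend heavily on what has already been forced, so treat the numbers as rough.

module Main where

import System.CPUTime (getCPUTime)

-- f exactly as above.
f xs = fst (go xs (id, id)) []
  where
    go (x:xs) ys = go xs ((\(rw, rl) -> (\zs -> x : rl zs, \zs -> rw (x:zs))) ys)
    go []     ys = ys

-- Time an IO action; getCPUTime reports picoseconds.
timeIt :: IO a -> IO (a, Double)
timeIt act = do
  t0 <- getCPUTime
  x  <- act
  t1 <- getCPUTime
  return (x, fromIntegral (t1 - t0) / 1e12)

main :: IO ()
main = do
  let xss = replicate 3000 (replicate 3000 (0 :: Integer))
  -- Force the test data up front, so its construction is not charged to f.
  (_, tGen) <- timeIt (return $! length (concat xss))
  -- Now time only the application of f, spine-forced via length . concat.
  (_, tF) <- timeIt (return $! length (concat (map f xss)))
  putStrLn ("test data: " ++ show tGen ++ "s, f: " ++ show tF ++ "s")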