
Duncan,

That was my first thought too, but what I'm looking for is confirmation from those who know better that treating the GC as a 'statistical source' is a valid hypothesis. If the thing is 'random' in a statistically well-behaved way, that's fine; if its timing is non-deterministic beyond such treatment, that's not fine.

So, GC experts, are there any hints you can give me? Are there any papers that cover this timing aspect? And are there any corner cases that might make the statistical approach risky (or at worst invalid)? I don't want to have to build a stochastic model of the GC if I can help it!

Neil
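[Purely as illustration, a minimal sketch of what testing that hypothesis might look like: force a number of major collections and look at the spread of their pause times. performGC is from System.Mem; getMonotonicTimeNSec is from GHC.Clock, an API that postdates this thread. The sample count, and the idea of forcing collections rather than observing natural ones, are assumptions here.]

import Control.Monad (replicateM)
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)  -- base >= 4.11, later than this thread
import System.Mem (performGC)

-- Time one forced major collection, in nanoseconds.
timedGC :: IO Word64
timedGC = do
  t0 <- getMonotonicTimeNSec
  performGC
  t1 <- getMonotonicTimeNSec
  pure (t1 - t0)

main :: IO ()
main = do
  samples <- replicateM 100 timedGC
  let xs   = map fromIntegral samples :: [Double]
      n    = fromIntegral (length xs)
      mean = sum xs / n
      var  = sum [(x - mean) ^ (2 :: Int) | x <- xs] / n
  putStrLn ("mean pause (ns): " ++ show mean)
  putStrLn ("std dev (ns):    " ++ show (sqrt var))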
On 4 May 2009, at 12:51, Duncan Coutts wrote:

On Fri, 2009-05-01 at 09:14 +0100, Neil Davies wrote:
Hi
With the discussion on threads and priority, and given that the runtime system is already collecting lots of useful information in Stats.c, some of which is visible (like the total amount of memory mutated) and the rest easy to expose, a question has arisen in my mind:

Given that you have access to that information (the stuff that comes out at the end of a run if you use +RTS -S), is it possible to estimate the time a GC will take before asking for one?

Ignoring, at least for the moment, all the issues of paging, processor cache occupancy, etc., what are the complexity drivers for the time to GC?

I realise that it is going to depend on things like the volume of data mutated, the count of objects mutated, what fraction of them are live, and so on. Even if these turn out to be very program-specific, I have a follow-on question: what properties does your program need for you to be able to construct a viable estimate of GC time from a past history of such collections?
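[The counters Neil mentions are the ones Stats.c feeds into the +RTS -S report. For illustration, a sketch using the GHC.Stats API that later GHC releases added for programmatic access to the same counters; the program must be run with +RTS -T for the stats to be populated.]

import GHC.Stats

-- Print a few of the RTS's GC counters: candidate inputs for a
-- pause-time estimate.
reportGCState :: IO ()
reportGCState = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "stats not enabled; rerun with +RTS -T"
    else do
      s <- getRTSStats
      putStrLn ("collections so far:  " ++ show (gcs s))
      putStrLn ("bytes allocated:     " ++ show (allocated_bytes s))
      putStrLn ("live after last GC:  " ++ show (gcdetails_live_bytes (gc s)))
      putStrLn ("total GC elapsed ns: " ++ show (gc_elapsed_ns s))

main :: IO ()
main = reportGCState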
Would looking at statistics suffice? Treat it mostly as a black box: measure all the information you can before and after each GC, then use statistical methods to look for correlations and see whether any set of variables predicts GC time.
Duncan
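[Duncan's black-box approach, sketched: log a predictor for each collection together with its observed pause, fit an ordinary least-squares line, and use it to predict the next pause. The choice of live bytes as the predictor and the data below are made up for illustration; any of the counters above could be substituted.]

-- Fit pause ~ a + b * liveBytes by ordinary least squares over a
-- history of past collections, then predict the next pause.
type Sample = (Double, Double)   -- (live bytes, GC pause in ns)

fitLine :: [Sample] -> (Double, Double)  -- (intercept, slope)
fitLine ss =
  let n     = fromIntegral (length ss)
      sx    = sum (map fst ss)
      sy    = sum (map snd ss)
      sxx   = sum [x * x | (x, _) <- ss]
      sxy   = sum [x * y | (x, y) <- ss]
      slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
      icept = (sy - slope * sx) / n
  in (icept, slope)

predictPause :: (Double, Double) -> Double -> Double
predictPause (icept, slope) live = icept + slope * live

main :: IO ()
main = do
  -- hypothetical history of (live bytes, pause ns) observations
  let history = [(1e6, 2.1e6), (2e6, 3.9e6), (4e6, 8.2e6), (8e6, 15.8e6)]
      model   = fitLine history
  putStrLn ("predicted pause for 6 MB live: "
            ++ show (predictPause model 6e6) ++ " ns")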