
On 06/03/10 06:56, Simon Cranshaw wrote:
> For settings we are using -N7 -A8m -qg.

I'm surprised if turning off parallel GC improves things, unless you really aren't using all the cores (ThreadScope will tell you that). Do these flags give you an improvement in throughput, or just pause times?

> I don't know if they are really the optimal values but I haven't found a significant improvement on these yet. I tried -qb but that was slow.

Interesting, I often find that -qb improves things.

> I tried larger values of A but that didn't seem to make a big difference.

-A8m is close to the size of your L2 caches, right? That will certainly be better than the default of -A512k.

> Also -N6 didn't make much difference. Specifying H values didn't seem to make much difference.

-H is certainly a mixed bag when it comes to parallel programs.

> I have to admit I don't fully understand the implications of the values and was just experimenting to see what worked best.

So the heap size is trading off locality (cache hits) against GC time. The larger the heap, the fewer GCs you do, but the worse the locality. Usually I find keeping the nursery size (-A) close to the L2 cache size works best, although sometimes making it really big can be even better.

-qg disables parallel GC completely. This is usually terrible for locality, because every GC will move all the recently allocated data from each CPU's L2 cache into the cache of the CPU doing GC, where it will have to be fetched out again after GC.

-qb disables load-balancing in the parallel GC, which improves locality at the expense of parallelism. I usually find it is an improvement in parallel programs.

Cheers,
Simon
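
P.S. One way to compare these settings is to run the same program under each set of flags with -s added, which prints a summary that separates mutator time from GC time. This is only a sketch: ./myprog is a hypothetical binary name, and it assumes the program was linked with -threaded (and, on newer GHCs, -rtsopts) so that the flags are accepted.

```shell
#!/bin/sh
# Hypothetical binary name; assumed to be built with the threaded RTS.
PROG=./myprog

# The three configurations discussed above: no parallel GC, parallel GC
# without load balancing, and the default parallel GC.
for flags in "-N7 -A8m -qg" "-N7 -A8m -qb" "-N7 -A8m"; do
  echo "== +RTS $flags -s =="
  if [ -x "$PROG" ]; then
    # -s prints the GC statistics summary to stderr when the program exits.
    "$PROG" +RTS $flags -s -RTS
  else
    echo "(skipped: $PROG not found)"
  fi
done
```

Comparing the MUT and GC lines across the runs should show whether a setting helps overall throughput or only shortens the GC pauses.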