[Haskell-cafe] Re: Real-time garbage collection for Haskell

6 Mar 2010


On 06/03/10 06:56, Simon Cranshaw wrote:
...
> For settings we are using -N7 -A8m -qg.
I'd be surprised if turning off parallel GC improves things, unless you
really aren't using all the cores (ThreadScope will tell you that).

Do these flags give you an improvement in throughput, or just pause times?
...
> I don't know if they are really the optimal values but I haven't found a
> significant improvement on these yet.  I tried -qb but that was slow.
Interesting; I often find that -qb improves things.
...
> I tried larger values of -A, but that didn't seem to make a big difference.
-A8m is close to the size of your L2 caches, right?  That will certainly 
be better than the default of -A512k.
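As a rough illustration of why a bigger nursery means fewer collections, here is a back-of-envelope sketch. It assumes (a simplification, not GHC's exact triggering policy) that all capabilities allocate evenly and that one minor GC happens each time they fill their nurseries, so the count is about total allocation / (capabilities × -A size). The 7 GiB workload and the helper names are hypothetical.

```haskell
-- Back-of-envelope model (an assumption, not GHC's exact policy):
-- minor GCs ~ total allocation / (capabilities * nursery size per capability).

kib, mib, gib :: Integer
kib = 1024
mib = 1024 * kib
gib = 1024 * mib

-- | Approximate number of minor GCs for a total allocation in bytes,
-- a number of capabilities (-N) and a nursery size per capability (-A).
approxMinorGCs :: Integer -> Integer -> Integer -> Integer
approxMinorGCs allocated caps nursery = allocated `div` (caps * nursery)

main :: IO ()
main = do
  let allocated = 7 * gib -- hypothetical workload: 7 GiB allocated in total
  print (approxMinorGCs allocated 7 (512 * kib)) -- default -A512k: prints 2048
  print (approxMinorGCs allocated 7 (8 * mib))   -- -A8m: prints 128
```

Under this model -A8m cuts the number of minor GCs by 16x; whether that is a net win still depends on whether each capability's nursery stays resident in its cache.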
...
> Also -N6 didn't make much difference.  Specifying -H values didn't seem
> to make much difference either.
-H is certainly a mixed bag when it comes to parallel programs.
...
> I have to admit I don't fully understand the
> implications of the values and was just experimenting to see what worked
> best.
So the heap size is trading off locality (cache hits) against GC time. 
The larger the heap, the fewer GCs you do, but the worse the locality. 
Usually I find keeping the nursery size (-A) close to the L2 cache size 
works best, although sometimes making it really big can be even better.
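To see this trade-off on a real program, one option is to read the RTS's own counters and compare runs with different -A values. The sketch below uses GHC.Stats (getRTSStats and getRTSStatsEnabled, available since GHC 8.2, so much newer than this thread); the workload function is a hypothetical stand-in for the real program, and statistics are only collected when it is run with +RTS -T.

```haskell
import Data.Int (Int64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Share of wall-clock time spent in GC (0 if nothing has elapsed yet).
gcShare :: Int64 -> Int64 -> Double
gcShare gcNs elapsedNs
  | elapsedNs <= 0 = 0
  | otherwise = fromIntegral gcNs / fromIntegral elapsedNs

-- | Hypothetical allocation-heavy workload standing in for the real program.
workload :: Int -> Int
workload n = length (concatMap show [1 .. n])

main :: IO ()
main = do
  print (workload 5000000) -- force the work before reading the counters
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "Statistics are off; rerun with +RTS -T"
    else do
      s <- getRTSStats
      putStrLn ("GCs: " ++ show (gcs s) ++ ", major GCs: " ++ show (major_gcs s))
      putStrLn ("bytes allocated: " ++ show (allocated_bytes s))
      putStrLn ("GC share of elapsed time: " ++ show (gcShare (gc_elapsed_ns s) (elapsed_ns s)))
```

Build with `ghc -O2 -threaded -rtsopts` and compare, say, `+RTS -N7 -A512k -T` against `+RTS -N7 -A8m -T`: the larger nursery should show fewer GCs, and the GC share indicates whether that actually paid off. `+RTS -s` prints a similar summary, including average and maximum pause times, without any code changes.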

-qg disables parallel GC completely.  This is usually terrible for 
locality, because every GC will move all the recently allocated data 
from each CPU's L2 cache into the cache of the CPU doing GC, where it 
will have to be fetched out again after GC.

-qb disables load-balancing in the parallel GC, which improves locality
at the expense of parallelism; I usually find it is an improvement in
parallel programs.
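To double-check which of these settings a running program actually picked up, GHC.RTS.Flags exposes the parsed RTS options. This is a minimal sketch, assuming a recent base that exports getParFlags with the parGcEnabled and parGcLoadBalancingEnabled fields (far newer than this 2010 thread), and assuming the default 4 KiB block size when converting the nursery size, which the RTS stores in blocks, to bytes. The blocksToBytes helper is hypothetical. Compile with `-threaded -rtsopts`.

```haskell
import Data.Word (Word32)
import GHC.Conc (getNumCapabilities)
import GHC.RTS.Flags (GCFlags (..), ParFlags (..), getGCFlags, getParFlags)

-- | Convert a size in RTS blocks to bytes.
-- Assumption: the default 4 KiB block size of standard GHC builds.
blocksToBytes :: Word32 -> Integer
blocksToBytes blocks = fromIntegral blocks * 4096

main :: IO ()
main = do
  caps <- getNumCapabilities
  gcf <- getGCFlags
  pf <- getParFlags
  putStrLn ("capabilities (-N): " ++ show caps)
  putStrLn ("nursery per capability (-A), bytes: " ++ show (blocksToBytes (minAllocAreaSize gcf)))
  -- -qg with no argument turns this off
  putStrLn ("parallel GC enabled: " ++ show (parGcEnabled pf))
  -- -qb with no argument turns this off
  putStrLn ("parallel GC load balancing: " ++ show (parGcLoadBalancingEnabled pf))
```

Running it with `+RTS -N7 -A8m -qg` should report parallel GC as disabled, and with `+RTS -N7 -A8m -qb` it should report load balancing as disabled while parallel GC stays on.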

Cheers,
	Simon

Simon Marlow