Behavior of the -H RTS option, possible doc/impl mismatch

Hi,

I have some questions regarding the -H RTS option. I use GHC 7.0.1 on Linux x86-64.

The User's Guide says:

    -Hsize
    [Default: 0] This option provides a “suggested heap size” for the garbage collector. The garbage collector will use about this much memory until the program residency grows and the heap size needs to be expanded to retain reasonable performance.

However, the actual behavior seems to be quite different. As an example, for a particular program:

    ./a.out +RTS -N7 -A256M -H2G   uses around 7 GBytes of memory
    ./a.out +RTS -N7 -A256M -H6G   uses around 13 GBytes of memory

If the User's Guide is correct, changing -H2G to -H6G should not increase the heap usage beyond 6 GBytes.

In the RTS source, I see that the parameter value (RtsFlags.GcFlags.heapSizeSuggestion) is used only to adjust the size of the allocation areas, not the entire heap.

How is the -H option supposed to behave? How does it behave currently?

Regards,
Takano Akio

Hello Akio,

Wednesday, February 16, 2011, 11:24:31 AM, you wrote:

    ./a.out +RTS -N7 -A256M -H2G   uses around 7 GBytes of memory
    ./a.out +RTS -N7 -A256M -H6G   uses around 13 GBytes of memory

GHC uses a copying GC by default: when the heap overflows, it copies all the live data to a new heap and uses the space of the old heap for new allocations. This means that memory usage may grow in 2x jumps in the worst case, and that memory usage may differ by 2x from run to run due to minor changes in input data or RTS heap options.

If you need to decrease memory usage, consider the -F and -c options. -M will be especially useful if you know the memory usage in advance.

--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com
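To make the 2x figure concrete, here is a rough model of a single copying collection. It is only a sketch: the function name is made up for illustration, and it ignores the nursery, block rounding, and any other RTS overhead.

```python
def peak_during_copying_gc(heap_before, live_fraction):
    """Approximate peak memory while one copying collection is in progress.

    heap_before   -- bytes occupied by the space being collected
    live_fraction -- fraction of that space that is still live (0.0 to 1.0)
    """
    copied = int(heap_before * live_fraction)
    # The old space is still held while live data is copied into the new one,
    # so both are resident at the same time.
    return heap_before + copied


# Worst case: everything is live, so the peak is twice the old heap.
print(peak_during_copying_gc(1000, 1.0))
# A mostly dead heap costs far less extra space.
print(peak_during_copying_gc(1000, 0.25))
```

The gap between the two calls is why small changes in residency can move the observed memory usage so much.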

On 16/02/2011 08:24, Akio Takano wrote:
How is the -H option supposed to behave? How does it behave currently?
It works by estimating how much memory will be required by the next GC, subtracting that from the -H value, and dividing up the remainder between the allocation areas (that's why it only affects the allocation area sizes). You could easily exceed the -H size by allocating huge arrays, for example.

There is room for error in the "estimating" part. In the worst case the next GC could need to copy the entire heap, but that never happens in practice, so we estimate how much of the heap will be copied. If we get it wrong, we end up exceeding the -H size. If we were too conservative, we would end up using less than the -H size in most cases. I did actually try this: it gave strange results, e.g. when specifying -H64m on the command line the RTS would use only 40m or so, and run slower than with the current -H algorithm.

Anyway, with -N2 and above I don't recommend using -H; generally I've found it results in lower performance. -A1m might be good if your CPUs have larger L2 caches.

I have some local patches that implement an option like -H but which applies to the old generation sizing rather than the nursery, which tends to work better with -N2 and above.

Cheers,
Simon
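The sizing rule described above can be sketched roughly as follows. This is a simplified illustration, not the RTS code: the function names, the way the estimate is passed in, and the use of the -A value as a floor are assumptions made for the example, and the numbers in the usage lines are invented.

```python
MB = 1024 * 1024
GB = 1024 * MB


def nursery_size_per_capability(heap_suggestion, estimated_next_gc,
                                n_caps, min_alloc_area):
    """Split what is left of the -H budget between the allocation areas.

    heap_suggestion   -- the -H value, in bytes
    estimated_next_gc -- guess at the memory the next GC will need, in bytes
    n_caps            -- number of capabilities (the -N value)
    min_alloc_area    -- the -A value, in bytes (assumed to act as a floor)
    """
    remainder = heap_suggestion - estimated_next_gc
    per_cap = remainder // n_caps
    # Assumption: an allocation area never shrinks below the -A setting.
    return max(per_cap, min_alloc_area)


def total_memory(per_cap, n_caps, actual_next_gc):
    """Memory in use if the next GC really needs actual_next_gc bytes."""
    return per_cap * n_caps + actual_next_gc


# The estimate is right: usage matches the -H2G budget.
per_cap = nursery_size_per_capability(2 * GB, 512 * MB, 4, 1 * MB)
print(total_memory(per_cap, 4, 512 * MB) // MB)

# The estimate was too low: the same nursery now overshoots the budget.
print(total_memory(per_cap, 4, 1024 * MB) // MB)

# A large -A with many capabilities: the floor alone nearly fills -H2G.
per_cap = nursery_size_per_capability(2 * GB, 1 * GB, 7, 256 * MB)
print(total_memory(per_cap, 7, 1 * GB) // MB)
```

In this model -H only bounds memory to the extent that the estimate holds and the -A floor leaves room, which is consistent with the overshoot reported in the thread.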

Hi Simon,
Thank you for the explanation. I think I now understand why -H behaves that way.
2011/2/17 Simon Marlow
Anyway, with -N2 and above I don't recommend using -H, generally I've found it results in lower performance. -A1m might be good if your CPUs have larger L2 caches. I have some local patches that implement an option like -H but which applies to the old generation sizing rather than the nursery, which tends to work better with -N2 and above.
An experiment shows that my program benefits from a larger -H value, at least with a fixed -A. Also, -A256M is much better than -A1M in my case, perhaps because decreasing the number of minor GCs is very important to the performance.

--
Takano Akio
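As a back-of-the-envelope check on the minor GC point, the count can be approximated as total allocation divided by the combined size of the allocation areas. This is a sketch under stated assumptions: the function name is invented, all capabilities are assumed to allocate at the same rate, and the 700 GB allocation figure is purely hypothetical.

```python
MB = 1024 * 1024
GB = 1024 * MB


def approx_minor_gcs(total_allocated, alloc_area, n_caps):
    """Rough number of minor GCs over a whole run.

    Assumes a minor GC happens each time the allocation areas fill up and
    that every capability fills its area at the same rate.
    """
    return total_allocated // (alloc_area * n_caps)


# Hypothetical program that allocates 700 GB in total, run with -N7.
print(approx_minor_gcs(700 * GB, 1 * MB, 7))    # with -A1M
print(approx_minor_gcs(700 * GB, 256 * MB, 7))  # with -A256M
```

Under these assumptions the larger allocation area cuts the number of minor collections by a factor of 256.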