
Hello Ravi, Wednesday, April 19, 2006, 12:05:28 AM, you wrote:
1. What "goes wrong" with mutable data and generational garbage collection to make this sort of workaround necessary?
generational GC (at least one used in GHC) is better suited for "functional" (immutable) data. this conception by itself based on that new allocated data can contain references to old ones but not vice versa. minor GC algorithms scans only new data, if they don't contain any references to some data cell (in new data), then all the heap don't contain references to this cell and it can be freed. most of the data in "normal" Haskell program are immutable so this works fine. but when there are mutable data, old part of heap can contain references to new data so on each minor gc we must scan not only new data but also all these mutable arrays/references. and this makes the problem when number of these references are too big. raising -A/-H just dramatically reduces number of minor GCs
The issue surprised me because I thought there was plenty of satisfied experience with generational gc for languages with many more destructive updates (e.g. Java) than Haskell. Is there some optimization that the ghc runtime is not implementing?
it seems that this issue was much less important for Haskell because many programs are written in fucntional way and not use mutable vars very much. one program that encounter this problem is ghc itself :) without -A it can spend 50% of worktime in GCs :)
2. If there is a known fix for this issue, what would it involve (and, if there are any guesses, how much work might it be)?
fixed partially in 6.6 (afaik, large IOArrays will still have the problem). try to find GHC error ticket about this, it have the complete story in short, ghc 6.6 keeps the list of all arrays/references that was updated after last GC, so it can scan only updated ones instead of all mutable objects. on the other side, this list contain entire array even if only one of it's element was updated. this entire array scanned on next minor GC. this can create problems for really large arrays (with millions of elements) also you can use unboxed arrays to avoid this scanning, this works in any GHC version
3. What is the best workaround for this issue if you *don't* know the maximum amount of memory available for your program? Would be it be best to fall back copying collection if you want your program to consume and release memory "as needed" or is there a cleverer trick?
use "-A10m", for example (i do this). also you can make some additional gain by setting "-H" to the amount of memory that will be anyway used or will be anyway free when program running. so you can use something like this: "-H300m -A10m" in this case program will alloc at startup 300 mb and run minor GC only when all this memory used. if for example after this GC 30 mb will remain used, then next GC will run only after that 270 free mb will be alloced. when memory usage will grow after 300 mb, minor GCs will be performed after each 10 mb ("-A") alloced -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com