
On 27 May 2005 21:02, Marcin 'Qrczak' Kowalczyk wrote:
[moved from libraries to glasgow-haskell-users]
Ross Paterson
writes: GCs that happen during this process will be more expensive, as they have to scan the stack. I suspect that GC costs are swamping everything else for large n.
I just tweaked the implementation GC in my compiler of my language, so that minor collection doesn't scan the whole stack, but only the part up to the deepest point where the stack pointer has been since the previous collection. Deeper regions contain only old pointers so they don't need to be scanned (I have only two generations).
A program which builds a list of length 270,000 non-tail-recursively, which in a strict language leads to proportional usage of the stack, and performs on average 4kB of temporary allocations for each element, so there are 9,000 GCs in total with the young heap of 128 kB, runs 10 times faster after the change. GC takes 5% of the time instead of 88%.
Nice trick. Unfortunately the same assumptions don't hold for GHC's garbage collector - objects are aged in the youngest generation, so it is usually at least two GCs before an object is promoted. We could still do the same trick, but instead we'd have to find maximum stack depth at which all objects below are in an older generation. Incedentally, how do you mark the stack depth? Overwrite a return address with a special one, and keep the real one around in a known place? Cheers, Simon

"Simon Marlow"
Nice trick. Unfortunately the same assumptions don't hold for GHC's garbage collector - objects are aged in the youngest generation, so it is usually at least two GCs before an object is promoted. We could still do the same trick, but instead we'd have to find maximum stack depth at which all objects below are in an older generation.
I don't know how to do it under these assumptions, i.e. how to find that depth.
Incedentally, how do you mark the stack depth? Overwrite a return address with a special one, and keep the real one around in a known place?
This is the first thing I tried :-) and even implemented it before I realized that it doesn't work under my assumptions. The problem is that I pass the return address in a virtual register, and it's saved on the stack by the callee only if it performs some non-tail calls. The return address is saved at the end of the stack frame. This means that a tail call followed by a non-tail call might read the special return address from the stack, but without jumping through it immediately. This return address is put again on the stack by a different function, with a possibly different stack frame size, so it lands in a different position on the stack, and thus it can't be found when GC wants to restore it. While changing the stack frame layout could perhaps make this workable, I found a much simpler solution. Since forever each stack frame contained a pointer used only by the GC and by stack trace printing. It points to a static structure containing the frame size and return address -> source location mapping. This pointer is now marked in the lowest bit by the GC. Pushing a fresh stack frame always puts an even pointer. BTW, a few days earlier I fixed a similar behavior for extensible arrays. Most mutable objects use a write barrier which stores changed locations, but arrays store references to changed objects because their payload is malloced and may be reallocated without a GC. This meant that code which repeatedly appends to a given array has O(N^2) complexity for huge arrays, as each GC scans the whole array. So I now store the size of the initial part of the array which is known to be unchanged since last GC. This is updated manually by particular operations. -- __("< Marcin Kowalczyk \__/ qrczak@knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/
participants (2)
-
Marcin 'Qrczak' Kowalczyk
-
Simon Marlow