
[moved from libraries to glasgow-haskell-users]
Ross Paterson
GCs that happen during this process will be more expensive, as they have to scan the stack. I suspect that GC costs are swamping everything else for large n.
I just tweaked the implementation GC in my compiler of my language, so that minor collection doesn't scan the whole stack, but only the part up to the deepest point where the stack pointer has been since the previous collection. Deeper regions contain only old pointers so they don't need to be scanned (I have only two generations). A program which builds a list of length 270,000 non-tail-recursively, which in a strict language leads to proportional usage of the stack, and performs on average 4kB of temporary allocations for each element, so there are 9,000 GCs in total with the young heap of 128 kB, runs 10 times faster after the change. GC takes 5% of the time instead of 88%. The implementation relies on the fact that only the topmost stack frame can be mutated, so it's enough to look only one frame deeper than the stack pointer reached. Each frame contained a pointer which is used only for GC and for stack trace printing, and thus it can be marked in the lowest bit without impacting normal operations. I prefer to make non-tail recursion efficient than to have to rewrite algorithms to use a large heap instead of the stack. -- __("< Marcin Kowalczyk \__/ qrczak@knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/