
On 12/07/2010 22:12, John Meacham wrote:
Yeah, I didn't realize how important the allocator was until I started benchmarking: spending time cutting the cost of marking garbage in half didn't help nearly as much as shaving a few cycles off the allocator. The fast path of the allocator is actually very fast. Each object type has its own allocator and free list, so allocation is pretty much just pulling an object off the free list; it is already the appropriate size and its constant fields are pre-initialized, since they aren't cleared during garbage collection. (There is a heuristic that sometimes claims full pages back to the general pool.)
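
[As a rough illustration, here is a minimal C sketch of what a per-type free-list fast path like the one described above might look like. The names (type_allocator, alloc_fast, alloc_slow_path) are hypothetical, not JHC's actual code.]

    #include <stddef.h>

    typedef struct free_cell {
        struct free_cell *next;   /* links free objects of this type */
    } free_cell;

    typedef struct type_allocator {
        free_cell *free_list;     /* per-type list of reusable objects */
        size_t     obj_size;      /* fixed size for this object type */
    } type_allocator;

    void *alloc_slow_path(type_allocator *a);  /* refills from a fresh page */

    /* Fast path: just pop the head of the free list. */
    static inline void *alloc_fast(type_allocator *a)
    {
        free_cell *obj = a->free_list;
        if (obj == NULL)
            return alloc_slow_path(a);  /* nothing free: take the slow path */
        a->free_list = obj->next;
        return obj;   /* already the right size, constant fields left intact */
    }
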
The slow path has to grab a full page and initialize it, but that isn't really much slower, as I can prefetch the cache lines needed, so the cost is on the order of another cache-line fill. (Thinking about computational complexity in terms of O(cache line fills) rather than O(operations) is much more appropriate on today's architectures.)
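
[Continuing the hypothetical types from the sketch above, the slow path described here might look roughly like this: grab a fresh page, carve it into fixed-size cells, and issue prefetch hints for the cache lines about to be written. get_fresh_page() and PAGE_SIZE are assumptions for illustration, not JHC's actual interface.]

    #define PAGE_SIZE 4096
    void *get_fresh_page(void);   /* assumed page-level allocator */

    void *alloc_slow_path(type_allocator *a)
    {
        char *page = get_fresh_page();
        size_t n = PAGE_SIZE / a->obj_size;

        /* Thread all cells after the first onto the free list, prefetching
         * the next cell's cache line ahead of the write that touches it. */
        free_cell *head = NULL;
        for (size_t i = n - 1; i >= 1; i--) {
            free_cell *cell = (free_cell *)(page + i * a->obj_size);
            __builtin_prefetch(page + (i - 1) * a->obj_size, 1);
            cell->next = head;
            head = cell;
        }
        a->free_list = head;
        return page;   /* hand out the first cell directly */
    }
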
Right, you can see how important locality is by looking at these graphs that Don produced recently: http://haskell.org/haskellwiki/Ghc-gc-tune

Generational GC these days is important more for locality than for the benefit of avoiding repeated tracing.

Speaking of prefetching, we get a lot of benefit in GHC from the automatic prefetching done by modern CPUs; I'm not sure how this would be affected by having multiple allocation regions. Manual prefetching is almost impossible to get right in my experience; see also Nethercote and Mycroft, who did some prefetching experiments with GHC: http://portal.acm.org/citation.cfm?id=773044

Cheers,
	Simon