
On 12/07/2010 22:12, John Meacham wrote:
Yeah, I didn't realize how important the allocator was until I started benchmarking: spending time cutting the cost of marking garbage in half didn't help nearly as much as shaving a few cycles off the allocator. The fast path of the allocator is actually very fast. Each object type has its own allocator and free list, so allocation is pretty much just pulling an object off the free list; it is already the appropriate size and its constant fields are pre-initialized, since they aren't cleared during garbage collection. (There is a heuristic that sometimes claims full pages back to the general pool.)
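
[As a rough illustration, here is a minimal C sketch of what a per-type free-list fast path like the one described above might look like. The names (type_allocator, alloc_fast, alloc_slow_path) are hypothetical, not JHC's actual code.]

    #include <stddef.h>

    typedef struct free_cell {
        struct free_cell *next;   /* links free objects of this type */
    } free_cell;

    typedef struct type_allocator {
        free_cell *free_list;     /* per-type list of reusable objects */
        size_t     obj_size;      /* fixed size for this object type */
    } type_allocator;

    void *alloc_slow_path(type_allocator *a);  /* refills from a fresh page */

    /* Fast path: just pop the head of the free list. */
    static inline void *alloc_fast(type_allocator *a)
    {
        free_cell *obj = a->free_list;
        if (obj == NULL)
            return alloc_slow_path(a);  /* nothing free: take the slow path */
        a->free_list = obj->next;
        return obj;   /* already the right size, constant fields left intact */
    }
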
The slow path has to grab a full page and initialize it, but that isn't really much slower, as I can prefetch the cache lines needed, so the cost is on the order of another cache-line fill. (Thinking about computational complexity in terms of O(cache line fills) rather than O(operations) is much more appropriate on today's architectures.)
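
[Continuing the hypothetical types from the sketch above, the slow path described here might look roughly like this: grab a fresh page, carve it into fixed-size cells, and issue prefetch hints for the cache lines about to be written. get_fresh_page() and PAGE_SIZE are assumptions for illustration, not JHC's actual interface.]

    #define PAGE_SIZE 4096
    void *get_fresh_page(void);   /* assumed page-level allocator */

    void *alloc_slow_path(type_allocator *a)
    {
        char *page = get_fresh_page();
        size_t n = PAGE_SIZE / a->obj_size;

        /* Thread all cells after the first onto the free list, prefetching
         * the next cell's cache line ahead of the write that touches it. */
        free_cell *head = NULL;
        for (size_t i = n - 1; i >= 1; i--) {
            free_cell *cell = (free_cell *)(page + i * a->obj_size);
            __builtin_prefetch(page + (i - 1) * a->obj_size, 1);
            cell->next = head;
            head = cell;
        }
        a->free_list = head;
        return page;   /* hand out the first cell directly */
    }
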
Right, you can see how important locality is by looking at these graphs that Don produced recently: http://haskell.org/haskellwiki/Ghc-gc-tune

Generational GC these days is important more for locality than for the benefit of avoiding repeated tracing.

Speaking of prefetching, we get a lot of benefit in GHC from the automatic prefetching done by modern CPUs; I'm not sure how this would be affected by having multiple allocation regions. Manual prefetching is almost impossible to get right in my experience; see also Nethercote and Mycroft, who did some prefetching experiments with GHC: http://portal.acm.org/citation.cfm?id=773044

Cheers,
	Simon