Re: [Haskell-cafe] Re: [Haskell] ANNOUNCE: Haskell 2010 Report (final)

14 Jul 2010


      On Wed, Jul 14, 2010 at 10:35:50AM +0100, Simon Marlow wrote:
...
...
Yeah, I didn't realize how important the allocator was until I started
benchmarking, spending time cutting the cost of marking garbage in
half didn't help nearly as much as shaving a few cycles off the
allocator. The fast pass of the allocator is actually very fast, each
object type has its own allocator and free list so allocation is pretty
much just pulling an object off of the free list, it is already of the
appropriate size and its constant fields are pre-initialized as they
arn't cleared during garbage collection. (there is a heuristic that
claims full pages back to the general pool sometimes).
The slow path has to grab a full page and initialize it, but that isn't
really much slower as I can prefetch the cache lines needed so the cost
is on the order of another cache line fill. (thinking about
computational complexity in terms of O(cache line fills) rather than
O(operations) is much more appropriate on todays architectures.).
Right, you can see how important locality is by looking at these graphs  
that Don produced recently:
http://haskell.org/haskellwiki/Ghc-gc-tune
generational GC these days is important more for locality than for the  
benefits of avoiding repeated tracing.
Yeah, jhc's GC is currently not generational, it is certainly something
I want to do, something I have considered as a quick hack was splitting
up allocations based on whether they were updatable or not. allocating
things that can statically be determined to be in HNF into an "older"
generation and allocating updatable thunks in a younger one. The theory
being updatable thunks are more likely to be overwritten by a
indirection and become garbage soon, I was thinking this could give me
some of the benefits of a generational collector without having to
rewrite my GC around being able to promote objects to an older
generation. I am not sure if it will work, but it is an idea I have been
toying with. Implementing better heap profiling support for jhc generated
code is a prerequisite in any case.
...
Speaking of prefetching, we get a lot of benefit in GHC from the  
automatic prefetching done by modern CPUs; I'm not sure how this would  
be affected by having multiple allocation regions.  Manual prefetching  
is almost impossible to get right in my experience, see also  
Nethercote/Mycroft where they did some prefetching experiments with GHC:
http://portal.acm.org/citation.cfm?id=773044
Ah, this paper looks very interesting, I was wondering if you had
experimented with prefetching just ahead of the allocation pointer.
Looks like it helped :)

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈ - http://notanumber.net/