
On 2010-03-06 12:42 +0000 (Sat), Simon Marlow wrote:
> Usually I find keeping the nursery size (-A) close to the L2 cache size works best, although sometimes making it really big can be even better.
Interesting to know. I got the impression that I was being encouraged to keep -A closer to the L1 cache size, myself.
> -qg disables parallel GC completely. This is usually terrible for locality, because every GC will move all the recently allocated data from each CPU's L2 cache into the cache of the CPU doing GC, where it will have to be fetched out again after GC.
I've since explained to Cranshaw (we are getting to have *way* too many 'Simon's around here) about the issues with our different machines; some of this depends on the host on which we're doing the testing.

* Our Core i7 hosts share 8 MB of L3 cache amongst four cores with two threads each. Thus, no locality penalties here.

* Our Xeon E5420 host has two 4-core CPUs, and each pair of cores shares a 6 MB L2 cache. Thus there's a pretty good chance that something you need is in someone else's cache. I don't know if there's any difference between moving stuff between two caches on the same CPU and two caches on different CPUs.

* Our Xeon E5520 host has two 4-core CPUs, each core of which has two threads. Each CPU (4 cores) shares an 8 MB L3 cache. Thus, presumably, less locality penalty than the E5420 but more than an i7. As a side note, I also see slightly less memory bandwidth on this system (for both caches and main memory) than I do on an i7.

This gets complex pretty fast. And don't ask me about Intel's new style of having L1 and L3 or L2 and L3 caches rather than L1 and L2 caches.
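A rough way to turn the cache figures above into a starting -A value is simply to divide each shared cache evenly among the hardware threads that share it. This assumes one GHC capability per hardware thread (+RTS -N set to the thread count) and assumes each capability's nursery should fit in its even share of the nearest shared cache; it ignores everything else competing for that cache. The HOSTS table, per_thread_share and as_rts_flag are hypothetical names for illustration, not anything in GHC, so treat this as a sketch to start measuring from rather than a rule.

```python
# Hypothetical helper: estimate how much of the nearest shared cache each
# GHC capability gets if there is one capability per hardware thread.
# Cache sizes and sharing come from the host descriptions in this thread;
# using the per-thread share as an -A target is an assumption, not GHC policy.

MB = 1024 * 1024

# name -> (shared cache size in bytes, hardware threads sharing that cache)
HOSTS = {
    "Core i7 (8 MB L3, 4 cores x 2 threads)": (8 * MB, 8),
    "Xeon E5420 (6 MB L2 per core pair)": (6 * MB, 2),
    "Xeon E5520 (8 MB L3 per CPU, 4 cores x 2 threads)": (8 * MB, 8),
}


def per_thread_share(cache_bytes: int, sharers: int) -> int:
    """Bytes of shared cache available to each thread if split evenly."""
    return cache_bytes // sharers


def as_rts_flag(nbytes: int) -> str:
    """Format a size as an RTS -A flag in kilobytes, e.g. '-A1024k'."""
    return f"-A{nbytes // 1024}k"


if __name__ == "__main__":
    for name, (size, sharers) in HOSTS.items():
        share = per_thread_share(size, sharers)
        print(f"{name}: {share // 1024} KB per thread -> {as_rts_flag(share)}")
```

With these figures it works out to 1 MB per thread (-A1024k) on the i7 and E5520 hosts and 3 MB per core (-A3072k) on the E5420. Keep in mind that -A is per capability, so with -N8 the combined nurseries are eight times that size.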
> -qb disables load-balancing in the parallel GC, which improves locality at the expense of parallelism; usually I find it is an improvement in parallel programs.
I'd think so too. Figuring out what went on here is going to have to
wait until I get more detailed GC information in the eventlog.
Followups to glasgow-haskell-users@haskell.org.
cjs
--
Curt Sampson