
On Sun, Feb 17, 2008 at 03:07:15AM +0300, Ruslan Evdokimov wrote:
2008/2/17, Jonathan Cast:
Wild guess? If you leave o as a thunk, to be evaluated once the program has e, then that thunk still refers to numbers, so you keep the entire 10-million entry list in memory. Evaluating e and o in parallel allows the system to start garbage collecting cons cells from numbers much earlier, which reduces residency (I'd've been unsurprised at more than two orders of magnitude). Managing the smaller heap (and especially not having to copy numbers on each GC) then makes the garbage collector go much faster, so you get a smaller run time.
But I also tested it on a P-IV 3.0 with HT and 1GB (single core) running Windows-XP (ghc 6.8.2), and it works fine (fast & low GC) in all three cases without significant difference. Sure, it didn't run faster with -N2, 'cause it's not dual-core.
This makes perfect sense: -N2 tells GHC to use two threads, and if you run two threads on a single-processor system, it's implemented by running the threads alternately (switching around 100 times per second on modern Linux, probably similar for other systems). Thus, the two evaluations never get more than a hundredth of a second out of step, and memory usage is still low.

Stefan
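For anyone following along, here is a minimal sketch of the two evaluation orders discussed above. It is not the original program from this thread: the names sumEvens, sumOdds, sequentialSums and parallelSums are made up for illustration, and par/pseq are imported from GHC.Conc in base (Control.Parallel in the parallel package exports the same functions). The first version leaves o as a thunk until e is done, so the shared list stays reachable; the second sparks o so that both sums can consume the list at roughly the same time.

```haskell
import Data.List (foldl')
import GHC.Conc (par, pseq)

-- Strict sums over the even and the odd elements of a list.
sumEvens, sumOdds :: [Int] -> Int
sumEvens = foldl' (+) 0 . filter even
sumOdds  = foldl' (+) 0 . filter odd

-- e is forced first; o remains a thunk that still refers to
-- numbers, so the whole list stays live while e is computed.
sequentialSums :: Int -> (Int, Int)
sequentialSums n = e `pseq` (o `pseq` (e, o))
  where
    numbers = [1 .. n]
    e = sumEvens numbers
    o = sumOdds numbers

-- o is sparked before e is forced. If the runtime picks up the
-- spark (e.g. with +RTS -N2), both sums walk numbers together
-- and the cells already consumed by both can be collected.
parallelSums :: Int -> (Int, Int)
parallelSums n = o `par` (e `pseq` (o `pseq` (e, o)))
  where
    numbers = [1 .. n]
    e = sumEvens numbers
    o = sumOdds numbers

main :: IO ()
main = print (parallelSums 10000000)
```

To compare, build with something like "ghc -O2 -threaded" (recent GHC versions also need -rtsopts) and run with "+RTS -s" and then "+RTS -N2 -s", looking at the maximum residency and GC time reported. Whether the spark actually runs in parallel is up to the runtime, so the numbers may vary from machine to machine.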