
On 16 Feb 2008, at 3:06 PM, Ruslan Evdokimov wrote:
Hi, all!
I have strange GHC behavior. Consider the code:
import Control.Parallel
main = print (o `par` (fromInteger e) / (fromInteger o))
  where
    [e,o] = map sum $ map (`filter` numbers) [even, odd]
    numbers = [1..10000000]
When it is compiled without -threaded, it takes 19068 ms to run, with 396 Mb total memory in use and %GC time of 88.2%; the same happens with -threaded and +RTS -N1. But with +RTS -N2 it takes only 3806 ms to run, with 3 Mb total memory in use and %GC time of 8.1%. Why is that so? Is it a bug, or have I missed something?
Wild guess? If you leave o as a thunk, to be evaluated once the program has e, then that thunk still refers to numbers, so you keep the entire 10-million-entry list in memory while e is being computed. Evaluating e and o in parallel allows the system to start garbage collecting cons cells from numbers much earlier, which reduces residency (I'd've been unsurprised at more than two orders of magnitude). Managing the smaller heap (and especially not having to copy numbers on each GC) then makes the garbage collector go much faster, so you get a shorter run time.
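If that guess is right, the same effect should be available without par at all, by making sure nothing holds on to numbers after it has been consumed. A minimal sketch, not taken from the original program (sumsByParity is a hypothetical helper name, and the Double annotation is an assumption about the intended result type): it computes both sums in a single strict pass, so each cons cell can be collected as soon as it has been visited.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Hypothetical helper (not in the original post): sum the even and the
-- odd elements in one traversal. The bang patterns force both running
-- totals at every step, so no long chain of thunks builds up.
sumsByParity :: [Integer] -> (Integer, Integer)
sumsByParity = foldl' step (0, 0)
  where
    step (!e, !o) n
      | even n    = (e + n, o)
      | otherwise = (e, o + n)

main :: IO ()
main = do
  -- The list is produced lazily and consumed once; nothing else refers
  -- to it, so it never has to be resident in full.
  let (e, o) = sumsByParity [1 .. 10000000]
  print (fromInteger e / fromInteger o :: Double)
```

Comparing the +RTS -s output of this version with the original (both without -threaded) would be one way to check whether residency really is the cause.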
I tested it on a dual-core Athlon X2 4200+ with 2 Gb of RAM, running a 64-bit Gentoo system, with gcc 4.2.2 and ghc 6.8.2.
jcc