New subject: Scarry parallel garbage collection behavior (6.12.1, Mac OSX)

19 Feb 2010

      Hi, list. 

This is a long post, sorry. The bottom line: specifying +RTS -Nn where n > number of cores slows the program  compiled with 6.12.1 by orders of magnitude under Mac OSX. Turning off parallel garbage collections with -qg 
resolves the problem. Independent verification is appreciated.

I have a very simple program: 

import Control.Parallel

main=do
    let 
        s = s1  `par` ( s2 `pseq` s1 + s2) 
        s1 = sum [1..10000000]
        s2 = sum [1..10000005]
    print s

compiled with
ghc -O -threaded

running it with without any options 
time ./clocktime 

produces the following times:

real	0m3.779s
user	0m3.644s
sys	0m0.053s

Giving -N option to RTS (or -N2, I have two core macbook) shows decent speedup:
100000060000015

real	0m2.678s
user	0m4.762s
sys	0m0.043s

though increase in processor time by 30% seems suspicious. 
Things become weird if I increase the number of threads. I couldn't expect any further speedup for this task but what I get is absolutely unexpected:

time ./clocktime +RTS -N3
100000060000015

real	0m57.071s
user	1m40.910s
sys	0m0.921s

Almost a minute!!  Adding more threads does not help either :) 

time ./clocktime +RTS -N4
100000060000015

real	1m48.281s
user	3m10.832s
sys	0m1.590s

A bit of playing with Instruments/DTrace shows that runaway garbage collection puts the program on its knees. 
I the case of -N1 the top time consumers look like this: 
Self %	Total %	Self Run %	Running %	# Self	# Samples	ms Self Run	ms Running	Parent %	Library	Source Path	Symbol Name
12.8	12.8	408	clocktime	__gmpz_add
10.7	10.7	341	clocktime	integer_cmm_plusIntegerzh
10	10	321	clocktime	allocateLocal

etc. This looks absolutely reasonable. Going to -N2 changes the picture to: 
Self %	Total %	Self Run %	Running %	# Self	# Samples	ms Self Run	ms Running	Parent %	Library	Source Path	Symbol Name
15.6	15.6	989	clocktime	gcWorkerThread
11.6	11.6	736	clocktime	waitForGcThreads
9.5	9.5	603	clocktime	scavenge_until_all_done
7.4	7.4	471	clocktime	setContextSwitches
6.2	6.2	396	clocktime	__gmpz_add

-N3 Is awful: 
Self %	Total %	Self Run %	Running %	# Self	# Samples	ms Self Run	ms Running	Parent %	Library	Source Path	Symbol Name
51.7	51.7	29514	clocktime	gcWorkerThread
25.5	25.5	14591	clocktime	scavenge_until_all_done
7.2	7.2	4114	clocktime	waitForGcThreads
5.5	5.5	3180	clocktime	GarbageCollect
5.2	5.2	2998	clocktime	setContextSwitches
0.4	0.4	240	clocktime	integer_cmm_plusIntegerzh

Pavel

Scarry parallel garbage collection behavior (6.12.1, Mac OSX)

Pavel Perikov

Simon Marlow

Pavel Perikov

tags

participants (2)