
On 10/29/07, Josef Svenningsson
But using those flags yielded a very interesting result:
avgP: 4.3s
Superlinear speedup!? As you say, I would have expected something slightly larger than 9s. I think what happens here is that for avg4 the entire list has to be kept in memory between the two traversals whereas for avgP the beginning of the list can be garbage collected incrementally as the two threads traverse it. This could mean that the list never moves to the second generation in the memory manager and that can maybe account for the additional time savings. I'm not sure how to verify that this is the case though.
Bulat kindly suggested I use +RTS -s to monitor the garbage collectors behavior. It seems my hypothesis was right. avg4: 387 Mb total memory in use MUT time 2.43s ( 2.47s elapsed) GC time 15.32s ( 16.05s elapsed) avgP (+RTS -N2): 3 Mb total memory in use MUT time 4.61s ( 2.51s elapsed) GC time 0.04s ( 0.06s elapsed) So it seems that the garbage collector takes an awful lot of time when we allocate a big list like this. Hmmm. Strikes me as somewhat suboptimal. Cheers, /Josef