
On 10/28/07, Isaac Dupree
Josef Svenningsson wrote:
Less bogus timing: avg4: 18.0s avgS: 2.2s avgP: 17.4s
OK, so these figures make an even stronger case for my conclusion :-) Single traversal can be much faster than multiple traversals *when done right*.
Did you use +RTS -N2 on your program (or whatever it is that makes GHC actually use multiple threads? or is that not necessary?) Anyway I assume you wouldn't get better than 9.0s, which is still much worse than 2.2s.
Oh, this is getting embarrassing.. Indeed, I forgot to use +RTS -N2. But using those flags yielded a very interesting result: avgP: 4.3s Superlinear speedup!? As you say, I would have expected something slightly larger than 9s. I think what happens here is that for avg4 the entire list has to be kept in memory between the two traversals whereas for avgP the beginning of the list can be garbage collected incrementally as the two threads traverse it. This could mean that the list never moves to the second generation in the memory manager and that can maybe account for the additional time savings. I'm not sure how to verify that this is the case though. Cheers, /Josef