
On Thursday 20 May 2004 11:32 am, Ross Paterson wrote:
Unfortunately, I think a large part of the time here is probably accounted for by the time taken to generate the test data itself, if the results on my machine are anything to go by. Using your test data increases the execution time of the insertion test (for 100000 pairs) from 0.39 seconds to 1.88 seconds for AVL trees.
Building random data costs more, but random insertions are also more costly than ordered ones. (With ordered insertions, the insertion path is much shorter than the average tree depth.) Also, the difference varies between implementations.
I don't know if that's true for the trees used in FiniteMap, but for AVL trees newly inserted elements always become leaves, so I would guess that the insertion path is always about the same as the average tree depth, regardless of order. In fact I find it hard to see how else things could work, but without knowing the details of other implementations I guess I can't say for sure. One possible alternative explanation that occurs to me is that ordered insertions give better cache behaviour. Typically the insertion path will change little from one insertion to the next, so most of the nodes concerned will likely be resident in cache already, as they were just created by the previous insertion. So maybe the apparent difference between AVL trees and FiniteMap in this case is really due almost entirely to differing heap burn rate, rather than differences in the notional efficiency of the rebalancing algorithm. It would be good to try to establish a set of representative benchmarks for the various tree types though. Trouble is with a garbage collected language modern cached CPU's it's hard to be sure that benchmarks are representative of the true costs in real progams. It's also hard to be sure that the measured time for random data generation in isolation is representative of the cost where it runs in the presence of other cache disrupting code, for example. Every time I've tried to normalise my results this way I get answers that don't seem to make much sense :-( Regards -- Adrian Hey
participants (1)
-
Adrian Hey