
On 24 May 2004 12:03, Adrian Hey wrote:
On Thursday 20 May 2004 5:04 pm, Simon Marlow wrote:
And after eliminating some bogosity in the benchmark, I now get:
             Insert  Lookup  Delete
AVL           0.87    0.22    0.87
FiniteMap     1.07    0.17    1.11
DData.Map     1.08    0.16    1.02
DData.IntMap  0.65    0.16    0.61
So I declare that on random data, DData.Map has essentially the same performance as FiniteMap. IntMap is faster for insertion/deletion as expected. AVL is better than FiniteMap for insertion/deletion but slower for lookups.
Could you explain the bogosity you eliminated?
I did a full GC before running each part of the test to keep random GC effects to a minimum, and eliminated a couple of cases of badly optimised code in the test program.
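A hedged sketch of that timing discipline, for concreteness (the harness below is illustrative, not Simon's actual benchmark; timePhase is a made-up helper):

    import System.Mem (performGC)
    import System.CPUTime (getCPUTime)

    -- Run one benchmark phase after a full GC, so collection work left
    -- over from the previous phase doesn't pollute this measurement.
    -- Note: the action must force its result (e.g. with seq), or the
    -- real work will be deferred and timed in a later phase instead.
    timePhase :: String -> IO a -> IO a
    timePhase name act = do
      performGC
      t0 <- getCPUTime                -- CPU time in picoseconds
      x  <- act
      t1 <- getCPUTime
      putStrLn (name ++ ": " ++ show (fromIntegral (t1 - t0) * 1e-12 :: Double) ++ "s")
      return x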
The reason I ask is that the second set of lookup results for AVL trees seems pretty reasonable to me, considering the AVL library is general purpose and not specialised for any kind of map. Currently the AVL implementation has to read 3 heap records per tree node visited, vs. 2 for FiniteMap and DData.Map.
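For concreteness, a hedged sketch of the node layouts behind that record counting (constructor and field names are illustrative stand-ins, not the actual library definitions):

    -- General-purpose AVL tree: each node holds one polymorphic element,
    -- so using it as a map means boxing every (key, value) association
    -- separately.  Lookup touches the node, the association, and the
    -- key: 3 heap records per node visited.
    data AVL e = E                      -- empty tree
               | N (AVL e) e (AVL e)    -- left, element, right
                                        -- (balance information omitted)
    data Assoc k v = Assoc k v          -- the separately boxed association

    -- FiniteMap / DData.Map style: key and value are fields of the node
    -- itself, so lookup touches only the node and the key: 2 records.
    data Map k v = Tip
                 | Bin !Int k v (Map k v) (Map k v)  -- size, key, value, subtrees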
For the balanced tree implementations, I wouldn't expect to see any significant difference in lookup times (given the same degree of specialisation) and what differences there may be would most likely be due to differences in how well they managed to balance trees. (But I guess if random data is to be our benchmark, even simple unbalanced trees would probably look quite reasonable :-)
Of course, this isn't testing how well the balancing works. It is comparing raw performance of several balanced tree implementations, so measuring the overhead of balancing, if you like.
DData.IntMap is rather different from the other three, so I'm not sure exactly how well I would expect it to perform, but if you assume lookup time is proportional to the number of heap records examined (which seems reasonable, comparing AVL with FiniteMap/DData.Map), then you'd expect a lookup time of about 0.08 for each if they were specialised for Int maps (one record per node instead of two, i.e. half the measured 0.16).
Yes, IntMap should get faster lookup. I looked at the code and managed to make a couple of improvements: I added a 'seq k $' at the beginning of lookup and fixed a small performance problem with the Word library, but interestingly these don't improve the results. I still get 0.16s for the lookup test, so perhaps it is dominated by something else.

Cheers, Simon
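A hedged sketch of that 'seq k $' tweak on a simplified big-endian patricia lookup. The real DData.IntMap goes through a Nat newtype via natFromInt/intFromNat (which this sketch elides by staying on Int), so treat the helpers below as simplified stand-ins:

    import Data.Bits ((.&.), xor, complement)
    import Prelude hiding (lookup)

    type Key    = Int
    type Prefix = Int
    type Mask   = Int

    data IntMap a = Nil
                  | Tip !Key a
                  | Bin !Prefix !Mask (IntMap a) (IntMap a)

    -- Simplified Int-only versions of the library's bit-twiddling helpers.
    zero :: Key -> Mask -> Bool
    zero k m = k .&. m == 0

    mask :: Key -> Mask -> Prefix
    mask k m = k .&. (complement (m - 1) `xor` m)

    nomatch :: Key -> Prefix -> Mask -> Bool
    nomatch k p m = mask k m /= p

    lookup :: Key -> IntMap a -> Maybe a
    lookup k t
      = seq k $ case t of        -- force the key once, up front, instead
          Bin p m l r            -- of lazily at the first comparison
            | nomatch k p m -> Nothing
            | zero k m      -> lookup k l
            | otherwise     -> lookup k r
          Tip kx x
            | k == kx       -> Just x
            | otherwise     -> Nothing
          Nil               -> Nothing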

On Tuesday 25 May 2004 10:25 am, Simon Marlow wrote:
Yes, IntMap should get faster lookup. I looked at the code and managed to make a couple of improvements: I added a 'seq k $' at the beginning of lookup and fixed a small performance problem with the Word library, but interestingly these don't improve the results. I still get 0.16s for the lookup test, so perhaps it is dominated by something else.
I think yesterday's changes announced by JP Bernardy probably improve matters a lot, if there's any truth in my number-of-heap-records theory. Before this change it would have had the same 2 records/tree node overhead as FiniteMap/DData.Map.

Regards -- Adrian Hey

On Tue, 25 May 2004 12:02:22 +0100, Adrian Hey wrote:
On Tuesday 25 May 2004 10:25 am, Simon Marlow wrote:
Yes, IntMap should get faster lookup. I looked at the code and managed to make a couple of improvements: I added a 'seq k $' at the beginning of lookup and fixed a small performance problem with the Word library, but interestingly these don't improve the results. I still get 0.16s for the lookup test, so perhaps it is dominated by something else.
I think yesterday's changes announced by JP Bernardy probably improve matters a lot, if there's any truth in my number-of-heap-records theory. Before this change it would have had the same 2 records/tree node overhead as FiniteMap/DData.Map.
I can see another speed improvement to lookup too: it now repeatedly does "fromIntegral" on the keys (via "zero" and "nomatch"); maybe by let-binding and inlining this conversion we can make it faster. (I haven't done any of this until now, as it makes the code less robust to change, but I guess that doesn't matter that much any more.) I'll tweak JP's latest sources and post them here again, so that Simon can run his benchmark on them.

All the best, Daan.
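A hedged sketch of what that let-binding might look like, assuming Nat is a machine Word and natFromInt is fromIntegral (the real module's definitions may differ): the key is converted once and the word-level helpers are reused at every node, instead of converting inside "zero" and "nomatch" on each comparison.

    import Data.Bits ((.&.), xor, complement)
    import Data.Word (Word)
    import Prelude hiding (lookup)

    type Nat = Word                      -- assumption: Nat is a machine word

    natFromInt :: Int -> Nat
    natFromInt = fromIntegral

    data IntMap a = Nil
                  | Tip !Int a
                  | Bin !Int !Int (IntMap a) (IntMap a)  -- prefix, mask, subtrees

    -- Word-level helpers that take already-converted arguments.
    zeroW :: Nat -> Nat -> Bool
    zeroW k m = k .&. m == 0

    maskW :: Nat -> Nat -> Nat
    maskW k m = k .&. (complement (m - 1) `xor` m)

    nomatchW :: Nat -> Nat -> Nat -> Bool
    nomatchW k p m = maskW k m /= p

    lookup :: Int -> IntMap a -> Maybe a
    lookup k t = let nk = natFromInt k   -- convert the key once...
                 in nk `seq` go nk t
      where                              -- ...and reuse it at every node
        go nk (Bin p m l r)
          | nomatchW nk (natFromInt p) (natFromInt m) = Nothing
          | zeroW nk (natFromInt m)                   = go nk l
          | otherwise                                 = go nk r
        go nk (Tip kx x)
          | nk == natFromInt kx                       = Just x
        go _  _                                       = Nothing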

Ok, here is a new "IntMap" module that might improve the performance of "lookup". Please, let's first test it before adding the changes to the real distribution. I have:

1) removed the calls to "natFromInt" and "intFromNat". Supposedly, they should expand to nothing, but who knows...

2) commented out the guard that tests whether the integer is in the tree at all. When a key is present, lookup will be faster; if the key is not present, lookup might be slower.

Simon, I would be interested to see how it performs with and without the commented-out guard from (2) (line 269). If performance is better with the guard commented out, it means that GHC is spending time doing intFromNat/natFromInt, which means we might improve matters by storing Nats instead of Ints in the IntMap. (I don't know if such tweaking is worth the trouble, though.)

All the best, Daan.
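A hedged sketch of what change (2) amounts to, reusing the simplified Key/IntMap/zero definitions from the sketch after Simon's first message: with the guard gone, the walk follows only the branching bit, and absence is detected at the leaf rather than on the way down.

    -- Without the "nomatch" guard, a Bin node is traversed using only its
    -- mask; the prefix is never compared against the key.  A present key
    -- is reached with less work per node; an absent key is only detected
    -- at whatever Tip (or Nil) the descent ends up at.
    lookup :: Key -> IntMap a -> Maybe a
    lookup k t
      = seq k $ case t of
          Bin _p m l r               -- prefix guard commented out, as in (2)
            | zero k m  -> lookup k l
            | otherwise -> lookup k r
          Tip kx x
            | k == kx   -> Just x
            | otherwise -> Nothing
          Nil           -> Nothing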
