
On 14 May 2004 16:47, Christian Maeder wrote:
My experiments (with large maps) wrt. performance (and memory consumption with and without optimization) of both Map and FiniteMap implementations revealed differences, but no crucial ones.
"FiniteMap Int a" performs better than "Map Int a", but not as good as "IntMap a", but the differences are unimportant for us.
How much better? I'm not keen to import something that's going to degrade performance of existing code noticeably. If DData.Map performs worse than Data.FiniteMap, I'm sure it can be improved (they use the same algorithm, after all).
(We would need a pure HashMap with O(1) lookup while extending the Map)
Could you elaborate? Cheers, Simon

Simon Marlow wrote:
"FiniteMap Int a" performs better than "Map Int a", but not as good as "IntMap a", but the differences are unimportant for us.
How much better? I'm not keen to import something that's going to degrade performance of existing code noticeably.
Ok, I'll send you numbers (later). Are you going to reimplement FiniteMap via DData.Map (or how could existing code suffer)?
(We would need a pure HashMap with O(1) lookup while extending the Map)
Could you elaborate?
Actually, an IntMap or Array where we never look up an index that has not been set before, and where, once an index is set, we never change the corresponding entry; all this while avoiding the monad of mutable arrays or the HashTable implementation. Christian
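A minimal sketch of the write-once pattern Christian is describing, written against today's Data.IntMap (the names WriteOnce, define and demand are made up here for illustration, not anything from the thread):

    import qualified Data.IntMap as IntMap

    -- Write-once discipline: an index is defined at most once and is never
    -- looked up before it has been defined, so entries never change.
    type WriteOnce a = IntMap.IntMap a

    define :: Int -> a -> WriteOnce a -> WriteOnce a
    define k v m
      | IntMap.member k m = error ("index defined twice: " ++ show k)
      | otherwise         = IntMap.insert k v m

    demand :: Int -> WriteOnce a -> a
    demand k m =
      IntMap.findWithDefault (error ("index not yet defined: " ++ show k)) k m

Because an index, once set, is never changed, every successful demand returns exactly the value that was defined for it; the error calls merely document the discipline.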

Simon Marlow wrote:
"FiniteMap Int a" performs better than "Map Int a", but not as good as "IntMap a", but the differences are unimportant for us.
How much better? I'm not keen to import something that's going to degrade performance of existing code noticeably.
For a map with several hundred thousand entries, with optimized code only (I'll send you the sources separately):

                          user seconds   page faults
    FiniteMap-only             5.61          30613
    DData.Map-only             6.50          36911
    DData.Map+IntMap           4.14          21296
    FiniteMap+IntMap           3.85          19973

Christian

On Friday 14 May 2004 5:05 pm, Simon Marlow wrote:
On 14 May 2004 16:47, Christian Maeder wrote:
My experiments (with large maps) wrt. performance (and memory consumption with and without optimization) of both Map and FiniteMap implementations revealed differences, but no crucial ones.
"FiniteMap Int a" performs better than "Map Int a", but not as good as "IntMap a", but the differences are unimportant for us.
How much better? I'm not keen to import something that's going to degrade performance of existing code noticeably.
If DData.Map performs worse than DData.FiniteMap, I'm sure it can be improved (they use the same algorithm, after all).
Would you be keen to use something that enhanced performance of existing code :-) I've just been trying some simple benchmarks of my AVL implementation vs. the current FiniteMap implementation, and using AVL trees seems to be about twice as fast for insertion. On my machine, building an AVL tree of 1000000 (key,value) pairs takes about 5 seconds, whereas the same takes about 10 seconds with FiniteMap (the keys are Ints for this test). Just thought I'd mention it as another data point for consideration if changes are occurring (and bragging a little too, I confess :-) Regards -- Adrian Hey

G'day all.
Quoting Adrian Hey
I've just been trying some simple benchmarks of my AVL implementation vs. the current FiniteMap implementation, and using AVL trees seems to be about twice as fast for insertion. On my machine, building an AVL tree of 1000000 (key,value) pairs takes about 5 seconds, whereas the same takes about 10 seconds with FiniteMap (the keys are Ints for this test).
Just curious: Have you timed deletions? Cheers, Andrew Bromage

On Monday 17 May 2004 6:14 am, ajb@spamcop.net wrote:
G'day all.
Quoting Adrian Hey
I've just been trying some simple benchmarks of my AVL implementation vs. the current FiniteMap implementation, and using AVL trees seems to be about twice as fast for insertion. On my machine, building an AVL tree of 1000000 (key,value) pairs takes about 5 seconds, whereas the same takes about 10 seconds with FiniteMap (the keys are Ints for this test).
Just curious: Have you timed deletions?
I tried that just yesterday. Here are the results for 2 tests:

Test1: Build a tree/finitemap of 1000000 pairs.
Test2: As Test1, followed by deletion of all 1000000 pairs.

Pairs are inserted/deleted in numerical order [1,2..1000000]. Times are in seconds. There seems to be some variability in the timing measurements (about +/- 5%), so the following figures are averages over a few runs.

                     AVL   FiniteMap
    Test1            5.2       9.5
    Test2            8.3      13.4
    Test2-Test1      3.1       3.9

Not so much difference for deletion. Strangely, for both AVL and FiniteMap, deletion seems to be faster than insertion. I'm not sure what's going on here. Anyway, I'll try to post my AVL library (and the code for the above tests) as it currently stands somewhere today (it's not finished though). Regards -- Adrian Hey

On Mon, May 17, 2004 at 09:39:25AM +0100, Adrian Hey wrote:
Here are the results for 2 tests..
Test1: Build a tree/finitemap of 1000000 pairs. Test2: As Test1, followed by deletion of all 1000000 pairs.
Pairs are inserted/deleted in numerical order [1,2..1000000]
Inserting in random order gives a less dramatic difference. I changed your testData to

    testData = take 100000 $ randomRs (0, 2^30-1) (mkStdGen 7)

(I have a slower machine), changed isEmpty to size to guarantee strictness, and got the following times (averaged over 50 runs each):

                       ins    ins+del
    Data.FiniteMap    4.783    8.304
    Data.Tree.AVL     4.561    6.895
    DData.Map         4.765    7.369
    DData.IntMap      4.952    7.742

(I tried to be fair by only using ! and UNPACK on Ints in each case, and compiling them all the same way: -O.) I assume that Christian's results imply that IntMap is better for lookup, but it doesn't look very attractive here.
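For reference, a self-contained sketch of this test shape (random keys, then ins and ins+del), written against today's Data.Map in place of the 2004 libraries; it needs the random package, and it forces the work through size as in Ross's change from isEmpty:

    import qualified Data.Map as Map
    import Data.List (foldl')
    import System.Random (mkStdGen, randomRs)

    testData :: [Int]
    testData = take 100000 $ randomRs (0, 2 ^ 30 - 1) (mkStdGen 7)

    -- ins: build a map from the random keys; del: remove every key again.
    ins :: [Int] -> Map.Map Int Int
    ins = foldl' (\m k -> Map.insert k k m) Map.empty

    del :: Map.Map Int Int -> [Int] -> Map.Map Int Int
    del = foldl' (flip Map.delete)

    main :: IO ()
    main = do
      let m = ins testData
      print (Map.size m)                 -- the "ins" test
      print (Map.size (del m testData))  -- the "ins+del" test; should print 0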

On Wednesday 19 May 2004 11:28 am, Ross Paterson wrote:
On Mon, May 17, 2004 at 09:39:25AM +0100, Adrian Hey wrote:
Here are the results for 2 tests..
Test1: Build a tree/finitemap of 1000000 pairs. Test2: As Test1, followed by deletion of all 1000000 pairs.
Pairs are inserted/deleted in numerical order [1,2..1000000]
Inserting in random order gives a less dramatic difference.
Yes, I considered this, but decided to do it with sorted data in order to give whatever balancing mechanism is employed a hammering.
I changed your testData to
testData = take 100000 $ randomRs (0, 2^30-1) (mkStdGen 7)
(I have a slower machine), changed isEmpty to size to guarantee strictness,
Actually, it shouldn't be necessary to do this (for AVL trees at least). They should behave as if they were strict in their recursive fields, and indeed the fields were declared strict at one time. But I discovered this actually slowed things down compared with using explicit `seq` in the code to force evaluation (the slowdown was not huge, about 5% or so).
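The two options being compared are, roughly, strictness annotations on the recursive fields versus an explicit seq at the construction site; a toy illustration (not Adrian's actual AVL code):

    -- Strict in the recursive fields: subtrees are forced on construction.
    data TreeS a = LeafS
                 | NodeS !(TreeS a) a !(TreeS a)

    -- Lazy fields, forced by an explicit seq in a smart constructor instead.
    data TreeL a = LeafL
                 | NodeL (TreeL a) a (TreeL a)

    nodeL :: TreeL a -> a -> TreeL a -> TreeL a
    nodeL l x r = l `seq` r `seq` NodeL l x r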
and got the following times (averaged over 50 runs each):
                       ins    ins+del
    Data.FiniteMap    4.783    8.304
    Data.Tree.AVL     4.561    6.895
    DData.Map         4.765    7.369
    DData.IntMap      4.952    7.742
Unfortunately, I think a large part of the time here is probably accounted for by the time taken to generate the test data itself, if the results on my machine are anything to go by. Using your test data increases the execution time of the insertion test (for 100000 pairs) from 0.39 seconds to 1.88 seconds for AVL trees.

Anyway, FWIW, here are the results on my machine (using isEmpty, not size).

For 100000 sorted pairs:
                       ins   ins+del
    Data.FiniteMap    0.83     1.18
    Data.Tree.AVL     0.39     0.71

For 100000 random pairs:
                       ins   ins+del
    Data.FiniteMap    2.04     3.80
    Data.Tree.AVL     1.88     3.16

For 1000000 random pairs:
                       ins   ins+del
    Data.FiniteMap    32.1     69.4
    Data.Tree.AVL     27.0     54.0

Sorry, I don't have DData installed at the moment :-( Hmm, I don't know what to make of these huge times, other than that generating the test data seems to cost more than using it :-). There still seems to be about 5 seconds difference between the two for insertion, though. Regards -- Adrian Hey

On 19 May 2004 11:28, Ross Paterson wrote:
testData = take 100000 $ randomRs (0, 2^30-1) (mkStdGen 7)
(I have a slower machine), changed isEmpty to size to guarantee strictness, and got the following times (averaged over 50 runs each):
                       ins    ins+del
    Data.FiniteMap    4.783    8.304
    Data.Tree.AVL     4.561    6.895
    DData.Map         4.765    7.369
    DData.IntMap      4.952    7.742
(I tried to be fair by only using ! and UNPACK on Ints in each case, and compiling them all the same way: -O.) I assume that Christian's results imply that IntMap is better for lookup, but it doesn't look very attractive here.
Unable to resist a good benchmark, I did my own measurements. Same random input, but I did the timing in Haskell so I could factor out the test data construction. Results averaged over 10 runs:

                     Insert   Lookup   Delete
    AVL               0.88     0.35     0.89
    FiniteMap         1.08     0.19     1.17
    DData.Map         1.09     0.23     1.13
    DData.IntMap      0.72     0.18     0.61

I UNPACKed the appropriate fields in Map and IntMap. AVL loses out on lookup, probably due to the extra pair that needs to be deconstructed at each node. I don't know why DData.Map comes out worse than FiniteMap for lookup: the implementations are the same, so perhaps the tree ends up differently balanced?

Cheers, Simon
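One way to factor the test data construction out of the timings, sketched with today's Data.Map and getCPUTime rather than Simon's actual harness: force the input list before starting the clock, then time only the map operations:

    import qualified Data.Map as Map
    import Data.List (foldl')
    import Control.Exception (evaluate)
    import System.CPUTime (getCPUTime)

    timed :: String -> IO a -> IO a
    timed label act = do
      t0 <- getCPUTime
      x  <- act
      t1 <- getCPUTime
      putStrLn (label ++ ": " ++ show (fromIntegral (t1 - t0) / 1e12 :: Double) ++ " s")
      return x

    main :: IO ()
    main = do
      let keys = [1 .. 1000000] :: [Int]
      _ <- evaluate (sum keys)   -- build the input list outside the timed regions
      m <- timed "insert" $ evaluate (foldl' (\acc k -> Map.insert k k acc) Map.empty keys)
      _ <- timed "lookup" $ evaluate (length (filter (\k -> Map.member k m) keys))
      _ <- timed "delete" $ evaluate (Map.size (foldl' (flip Map.delete) m keys))
      return ()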

On 20 May 2004 16:34, Simon Marlow wrote:
On 19 May 2004 11:28, Ross Paterson wrote:
testData = take 100000 $ randomRs (0, 2^30-1) (mkStdGen 7)
(I have a slower machine), changed isEmpty to size to guarantee strictness, and got the following times (averaged over 50 runs each):
                       ins    ins+del
    Data.FiniteMap    4.783    8.304
    Data.Tree.AVL     4.561    6.895
    DData.Map         4.765    7.369
    DData.IntMap      4.952    7.742
(I tried to be fair by only using ! and UNPACK on Ints in each case, and compiling them all the same way: -O.) I assume that Christian's results imply that IntMap is better for lookup, but it doesn't look very attractive here.
Unable to resist a good benchmark, I did my own measurements. Same random input, but I did the timing in Haskell so I could factor out the test data construction. Results averaged over 10 runs:
                     Insert   Lookup   Delete
    AVL               0.88     0.35     0.89
    FiniteMap         1.08     0.19     1.17
    DData.Map         1.09     0.23     1.13
    DData.IntMap      0.72     0.18     0.61
And after eliminating some bogosity in the benchmark, I now get:

                     Insert   Lookup   Delete
    AVL               0.87     0.22     0.87
    FiniteMap         1.07     0.17     1.11
    DData.Map         1.08     0.16     1.02
    DData.IntMap      0.65     0.16     0.61

So I declare that on random data, DData.Map has essentially the same performance as FiniteMap. IntMap is faster for insertion/deletion, as expected. AVL is better than FiniteMap for insertion/deletion but slower for lookups.

Cheers, Simon

On Thursday 20 May 2004 5:04 pm, Simon Marlow wrote:
And after eliminating some bogosity in the benchmark, I now get:
                     Insert   Lookup   Delete
    AVL               0.87     0.22     0.87
    FiniteMap         1.07     0.17     1.11
    DData.Map         1.08     0.16     1.02
    DData.IntMap      0.65     0.16     0.61
So I declare that on random data, DData.Map has essentially the same performance as FiniteMap. IntMap is faster for insertion/deletion as expected. AVL is better than FiniteMap for insertion/deletion but slower for lookups.
Could you explain the bogosity you eliminated? The reason I ask is that those second results for lookup for AVL trees seem pretty reasonable to me, considering the AVL library is general purpose and not specialised for any kind of map. Currently the AVL implementation will have to read 3 heap records per tree node visited, vs. 2 for FiniteMap and DData.Map.

For the balanced tree implementations, I wouldn't expect to see any significant difference in lookup times (given the same degree of specialisation), and what differences there may be would most likely be due to differences in how well they manage to balance trees. (But I guess if random data is to be our benchmark, even simple unbalanced trees would probably look quite reasonable :-)

DData.IntMap is rather different from the other three, so I'm not sure exactly how well I would expect it to perform, but if you assume lookup time is proportional to the number of heap records examined (which seems reasonable comparing AVL with FiniteMap/DData.Map), then you'd expect a lookup time of about 0.08 for each if they were specialised for Int maps.

Regards -- Adrian Hey
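Adrian's counting argument can be made concrete with three toy node layouts (illustrations only, not the real Data.Tree.AVL or FiniteMap representations):

    -- Element stored as a boxed pair: lookup must follow node -> pair -> key,
    -- i.e. three heap records per node visited.
    data PairTree k v  = PE | PN (PairTree k v) (k, v) (PairTree k v)

    -- Key and value as separate fields: node -> key, two heap records per node.
    data FieldTree k v = FE | FN (FieldTree k v) k v (FieldTree k v)

    -- Specialised to Int keys, with the key unpacked into the node: one record.
    data IntKeyTree v  = IE | IN (IntKeyTree v) {-# UNPACK #-} !Int v (IntKeyTree v)

Following the key during a lookup touches three heap records per node in the first layout (node, pair, boxed key), two in the second (node, boxed key), and only the node itself in the third, because the Int key is unpacked into it.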
participants (5):
- Adrian Hey
- ajb@spamcop.net
- Christian Maeder
- Ross Paterson
- Simon Marlow