
However, I have never tested this with proper benchmarks. (now that I think about it, I might have tested it with improper benchmarks... I'll
Simon Marlow wrote: try
to find on my computer when I get home).
Great, I'd be really interested to see some numbers.
From this data, I got the suspicion that every set implementation
I have found the benchmark code again, but unfortunately I have never tested against Data.FiniteMap... but I can adapt the sources and do some testing next week. A word of warning though, the tests are not necessarily honest, precise, and scientifically responsible. However, they may give a good intuition. I ran some common operations like "union", "member", and "fromList" on ordered and on random data and compared their wall-clock performance. All data was explicitly forced with performGC's in between to make an as honest as possible comparison. For a *real* test however, one should probably use specialized tools and specialized data sets. Here are the unscientific results for some DData structures. (ST) DData.Set: size balanced tree (PT) DData.IntSet: big-endian patricia tree ST(msec) PT(msec) relative ..timing: fromlist ordered0: 546 - 125 - 4.4 ..timing: fromlist ordered1: 531 - 125 - 4.2 (descending order) ..timing: fromlist xordered0: 78 - 125 - 0.6 (known to be ordered) ..timing: fromlist xordered1: 109 - 140 - 0.8 (known to be in reverse order) ..timing: fromlist random0: 281 - 156 - 1.8 ..timing: fromlist random1: 281 - 156 - 1.8 ..timing: fromlist random2: 609 - 359 - 1.7 ..timing: union random0 : 578 - 250 - 2.3 ..timing: union random1 : 656 - 296 - 2.2 ..timing: union random2 : 671 - 296 - 2.3 ..timing: union tangled : 281 - 125 - 2.2 ..timing: union disjoint1: 218 - 93 - 2.3 ..timing: union disjoint2: 296 - 156 - 1.9 ..timing: member0: 546 - 468 - 1.2 ..timing: member1: 484 - 484 - 1.0 ..timing: member2: 484 - 484 - 1.0 Here we can see that an IntSet is always faster than a "Set Int" except for known ordered data (= xordered) Here is an interesting bench mark for an unreleased version of DData that included "Treaps": these are trees where each node is associated with a random priority that determines the position of the node. As these are random, the tree will be balanced without doing explicit rebalancing. The functional Treap uses clever hash functions to get semi random priorities. Here is a Treap set compared to "Set Int". ST(msec) Treap(msec) relative ..timing: fromlist ordered0: 531 - 234 - 2.3 ..timing: fromlist ordered1: 531 - 203 - 2.6 ..timing: fromlist xordered0: 93 - 234 - 0.4 ..timing: fromlist xordered1: 109 - 203 - 0.5 ..timing: fromlist random0: 296 - 312 - 0.9 ..timing: fromlist random1: 281 - 312 - 0.9 ..timing: fromlist random2: 609 - 671 - 0.9 ..timing: union random0 : 578 - 593 - 1.0 ..timing: union random1 : 687 - 687 - 1.0 ..timing: union random2 : 687 - 687 - 1.0 ..timing: union tangled : 265 - 296 - 0.9 ..timing: union disjoint1: 203 - 203 - 1.0 ..timing: union disjoint2: 296 - 296 - 1.0 ..timing: member0: 546 - 468 - 1.2 ..timing: member1: 484 - 484 - 1.0 ..timing: member2: 468 - 484 - 1.0 ..timing: unions: 140 - 78 - 1.8 based on balanced trees will do about as good as any other. I hope to do some fresh testing against Data.FiniteMap soon, as it will be quite important to know about there absolute efficiency. All the best, Daan.
participants (1)
-
Daan Leijen