
Hi,
On Tue, Jul 10, 2012 at 1:59 AM, Johan Tibell
wrote: On Thu, Jul 5, 2012 at 8:01 AM, Milan Straka
wrote: Any balanced tree can support indexing API, as long as every subtree stores its size. That is obviously trivially true for size-balanced trees, as the subtree sizes are used for balancing. If for example AVL trees were used, sizes would have to be stored explicitly in every node for indexing API to work fast.
There might be some other benefits to adding the size as well. IIRC union is faster if the first set is smaller than the second set. If we can look up the sizes in O(1) time we can rearrange the arguments to make sure this is the case.
I've always wondered why Data.Map doesn't do this by default. E.g. I have:
-- | Map.union says it's more efficient with @big `union` small@, so this one -- flips the args to be more efficient. It's still left-biased. union2 :: (Ord k) => Map.Map k v -> Map.Map k v -> Map.Map k v union2 m1 m2 | Map.size m1 < Map.size m2 = m2 `right` m1 | otherwise = m1 `Map.union` m2 where right = Map.unionWith (\_ a -> a)
there are several problems with this. Most notably, it is not true that "set1 < set2" is always better than "set1 > set2" -- if you run the containers/benchmarks/SetOperations/bench-SetOperations-Map benchmark, you can see that for different input distributions, "set1 < set2" is sometimes faster and sometimes slower than "set1 > set2". For various input distributions, difference between "set1 < set2" and "set1 > set2" are -7.5%, -25%, -35%, -17%, +20%, +8%, -13%. Also union is more efficient than (unionWith const), because is preserves sharing of the nodes present in both sets. That could be solved by creating another implementation of right-biased unionR which also preserves sharing. Probably the best idea is just to remove the documentation comment about "bigset `union` smallset" being faster, as it is not true :) Cheers, Milan