
On Fri, 19 Mar 2004 09:00:22 +0100, Robert Will
both of the data structures are based on Adams' balancing algorithm which contains a bug -- at least in its proof. (Perhaps it is correct, but I don't know anyone that knows if.)
I have been redoing proofs of Adam's algorithms too some years ago when writing DData -- there were some errors but I think that calling it "bugged" is rather harsh :-) Anyway, I think you are referring to two separate cases (as far as I can remember though): 1) Given a constant "delta" (the maximal relative size differences between any two nodes in the tree) and the "ratio" (the ratio between the outer and inner sibling of the heavier subtree in a node that needs rebalancing). "delta" is "w" in Adam's paper, and "ratio" the inverse of alpha. Now, we can prove that delta must be larger than 4.65 with a ratio of 2, and we can prove that we need at least a ratio of 2 to maintain the balance invariants. In his paper though Adams uses an incorrect ratio of 1. However, both Data.FiniteMap and DData.Map use the correct ratio. 2) For "join", and "merge", the proof brakes down too, but I remember that it suffices to replace Adam's "<" comparisons with "<=" comparisons. This is why the balancing assertion checks in "Data.FiniteMap" are turned off (as they fail), but this relatively easy to fix. (DData.Map uses the correct comparison and works with assertion checking on). Unfortunately, I couldn't find my notes on this issue anymore, but I think that when you re-run the proofs with "<=", it will work out correctly. In general, I would like to stress that one tends to overrate the usefullness of balancing -- Balancing is automatic for random data and balancing is only needed to avoid some pathological worst-cases. Almost any choice of balancing scheme will do. This is why the Data.FiniteMap structure has proven its use in practice even when sometimes producing slightly unbalanced trees. All the best, Daan.
Instead of fixing the bug I have re-derived the algorithm from the invariant together with a (now hopefully correct) proof. The new algorithm is contained in Dessy: http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/ more specifically in the file: http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/WeightedTrees.hs
The derivation, proof (and hint to Adam's bug) are in: http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/democratic/sequence.hs
(The so-called hedge-union, -intersection, and -difference is also implemented: function 'x_merge_width' does all three: http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/Dessy.hs )
A paper which presents the new (much simpler) algorithm and its proof putting it in scientific context is in preparation.
Users that prefer proven algorithms over tested ones are free to use my implementation instead of the existing ones. (At moment, there is no proven and tested implementation. Of course this will change soon.)
As for the maintenance of Data.FiniteMap and DData.Map, I don't know whether it is worth "fixing" the implementations or one would better replace them by a completely new design. Anyway, algorithms have no copyright and everyone is free to use the new "weight-balancer according to Will" to implement his own data structures. Oh, by the way, some of the other balancers in Dessy (or DData) might be more efficient than weight-based balancing anyway, I'll make some measurements later this year.
good time, Bob _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries