Bug in the proof of Data.FiniteMap and DData.Map

hi,

both of the data structures are based on Adams' balancing algorithm, which contains a bug -- at least in its proof. (Perhaps the algorithm is correct, but I don't know anyone who knows whether it is.) Instead of fixing the bug I have re-derived the algorithm from the invariant, together with a (now hopefully correct) proof. The new algorithm is contained in Dessy: http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/ more specifically in the file http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/WeightedTrees.hs

The derivation, the proof (and a hint at Adams' bug) are in http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/democratic/sequence.hs

(The so-called hedge-union, -intersection, and -difference are also implemented: the function 'x_merge_width' does all three: http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/Dessy.hs )

A paper presenting the new (much simpler) algorithm and its proof, and putting it in scientific context, is in preparation. Users who prefer proven algorithms over tested ones are free to use my implementation instead of the existing ones. (At the moment, there is no implementation that is both proven and tested. Of course this will change soon.)

As for the maintenance of Data.FiniteMap and DData.Map, I don't know whether it is worth "fixing" the implementations or whether one would better replace them with a completely new design. Anyway, algorithms have no copyright, and everyone is free to use the new "weight-balancer according to Will" to implement his own data structures. Oh, by the way, some of the other balancers in Dessy (or DData) might be more efficient than weight-based balancing anyway; I'll make some measurements later this year.

good time,
Bob
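A minimal sketch of the setting, assuming the usual size-annotated representation of weight-balanced trees; the invariant below is the kind of thing "re-derived from the invariant" refers to. The names and the concrete factor 'delta' are illustrative, not taken from Dessy.

  data Tree a = Tip | Bin Int a (Tree a) (Tree a)

  size :: Tree a -> Int
  size Tip           = 0
  size (Bin n _ _ _) = n

  delta :: Int
  delta = 5  -- illustrative; bounds on it are discussed below

  -- The weight-balance invariant: at every node, neither subtree
  -- may be more than 'delta' times heavier than its sibling.
  balanced :: Tree a -> Bool
  balanced Tip = True
  balanced (Bin _ _ l r) =
       (size l + size r <= 1
         || (size l <= delta * size r && size r <= delta * size l))
    && balanced l && balanced r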

--- Robert Will wrote:
both of the data structures are based on Adams' balancing algorithm, which contains a bug -- at least in its proof. (Perhaps the algorithm is correct, but I don't know anyone who knows whether it is.)
Can you derive a counter-example from the bug in the proof? The current algorithm looks good enough to me, and I am reluctant to change the implementation. I'd let Daan choose the course of action in this respect.

As a side note, I've re-implemented the "SpellCheck" test with DData, and it behaves OK. Besides, using "Set" instead of "Map" improves performance by 10% or so.

Cheers,
JP.

On Mon, Mar 22, 2004 at 05:41:52AM -0800, JP Bernardy wrote:
Can you derive a counter-example from the bug in the proof?
Counter-example of what? The claim is that there's a bug in the proof that concatenating trees (the concat3 function in Adams' paper) is efficient; do you want a pair of trees that take a long time to concatenate with that algorithm, or do you want a counter-example to the claim?

To tell the truth, after looking at the tech report ( http://www.swiss.ai.mit.edu/~adams/BB/92-10.ps.gz ), I don't see a proof that concat3 is efficient, so I don't see how there can be a bug in that (missing) proof. Robert Will, can you clarify which claim in the paper is faulty?

Peace,
Dylan Thurston

hi,

On Mon, 22 Mar 2004, JP Bernardy wrote:
--- Robert Will wrote:
both of the data structures are based on Adams' balancing algorithm, which contains a bug -- at least in its proof. (Perhaps the algorithm is correct, but I don't know anyone who knows whether it is.)
Can you derive a counter-example from the bug in the proof?
No, since obviously otherwise I *would* know that the algorithm is wrong; but as I said, I don't, and... I don't know anyone who knows. If you want to know, you can add a smart constructor that checks the balancing invariant. (E.g. in 'bin', which should then be called everywhere in place of 'Bin'.) If this assertion never fails, there's probably no bug.

To prove the correctness of the algorithm, by the way, it suffices to give a precondition for 'balance' (T' in Adams' paper) and prove that 'merge' (Adams' 'concat3') observes that precondition. That's the bug in Adams' proof: Adams proved 'balance' correct for the precondition that the balancing invariant is violated by at most one (size left <= size right + 1, and vice versa). This precondition is observed by every other call to 'balance', but not (?) by the call from 'merge'.
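A minimal sketch of such a checking smart constructor, assuming a size-annotated tree; the invariant shown (weight difference bounded by a factor 'delta', here 5) is illustrative, not necessarily the exact invariant of either library.

  data Tree a = Tip | Bin Int a (Tree a) (Tree a)

  size :: Tree a -> Int
  size Tip           = 0
  size (Bin n _ _ _) = n

  delta :: Int
  delta = 5

  -- Call 'bin' everywhere in place of the raw constructor 'Bin'; if
  -- this assertion never fails under heavy testing, the balancing
  -- code probably maintains the invariant.
  bin :: a -> Tree a -> Tree a -> Tree a
  bin x l r
    | size l + size r <= 1
        || (size l <= delta * size r && size r <= delta * size l)
                = Bin (size l + size r + 1) x l r
    | otherwise = error "bin: balance invariant violated"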
The current algorithm looks good enough to me,
Yeah, in DData it looks better than in FiniteMap, where they just disabled the failing assertions. Oh, perhaps the algorithm is wrong with respect to the simple balancing invariant, but correct with respect to some more complicated (amortised?) invariant. That would explain Adams' failure (and my reluctance) to prove it _and_ its good behaviour in practice.
As a side note, I've re-implemented the "SpellCheck" test with DData, and it behaves ok.
As it does with many other applications...

Robert

On Fri, 19 Mar 2004 09:00:22 +0100, Robert Will wrote:
both of the data structures are based on Adams' balancing algorithm, which contains a bug -- at least in its proof. (Perhaps the algorithm is correct, but I don't know anyone who knows whether it is.)
I was redoing the proofs of Adams' algorithms too, some years ago when writing DData -- there were some errors, but I think that calling it "bugged" is rather harsh :-)

Anyway, I think you are referring to two separate cases (as far as I can remember):

1) We are given a constant "delta" (the maximal relative size difference between any two nodes in the tree) and a "ratio" (the ratio between the outer and inner sibling of the heavier subtree in a node that needs rebalancing). "delta" is "w" in Adams' paper, and "ratio" is the inverse of his alpha. Now, we can prove that delta must be larger than 4.65 with a ratio of 2, and we can prove that we need at least a ratio of 2 to maintain the balance invariants. In his paper, though, Adams uses an incorrect ratio of 1. However, both Data.FiniteMap and DData.Map use the correct ratio.

2) For "join" and "merge", the proof breaks down too, but I remember that it suffices to replace Adams' "<" comparisons with "<=" comparisons. This is why the balancing assertion checks in Data.FiniteMap are turned off (as they fail), but this is relatively easy to fix. (DData.Map uses the correct comparison and works with assertion checking on.) Unfortunately, I couldn't find my notes on this issue anymore, but I think that when you re-run the proofs with "<=", it will work out correctly.

In general, I would like to stress that one tends to overrate the usefulness of balancing -- balancing is automatic for random data, and it is only needed to avoid some pathological worst cases. Almost any choice of balancing scheme will do. This is why the Data.FiniteMap structure has proven its use in practice even when sometimes producing slightly unbalanced trees.

All the best,
Daan.
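To make points 1) and 2) concrete, here is a hedged sketch (not DData's or Data.FiniteMap's actual code) of the scheme under discussion, with delta = 5 (per the bound delta > 4.65) and ratio = 2, and with the non-strict "<=" comparisons in 'join' that point 2) calls for; 'insertMin' and 'insertMax' are the usual helpers.

  data Tree a = Tip | Bin Int a (Tree a) (Tree a)

  size :: Tree a -> Int
  size Tip           = 0
  size (Bin n _ _ _) = n

  bin :: a -> Tree a -> Tree a -> Tree a
  bin x l r = Bin (size l + size r + 1) x l r

  delta, ratio :: Int
  delta = 5   -- maximal relative size difference ("w" in Adams' paper)
  ratio = 2   -- outer/inner sibling ratio (inverse of Adams' alpha)

  -- Rebuild a node whose subtrees may have gone out of balance,
  -- rotating when one side has become too heavy.
  balance :: a -> Tree a -> Tree a -> Tree a
  balance x l r
    | size l + size r <= 1    = bin x l r
    | size r > delta * size l = rotateL x l r
    | size l > delta * size r = rotateR x l r
    | otherwise               = bin x l r

  rotateL, rotateR :: a -> Tree a -> Tree a -> Tree a
  rotateL x l r@(Bin _ _ rl rr)
    | size rl < ratio * size rr = singleL x l r
    | otherwise                 = doubleL x l r
  rotateL x l Tip = bin x l Tip
  rotateR x l@(Bin _ _ ll lr) r
    | size lr < ratio * size ll = singleR x l r
    | otherwise                 = doubleR x l r
  rotateR x Tip r = bin x Tip r

  singleL, singleR, doubleL, doubleR :: a -> Tree a -> Tree a -> Tree a
  singleL x1 t1 (Bin _ x2 t2 t3) = bin x2 (bin x1 t1 t2) t3
  singleR x1 (Bin _ x2 t1 t2) t3 = bin x2 t1 (bin x1 t2 t3)
  doubleL x1 t1 (Bin _ x2 (Bin _ x3 t2 t3) t4) =
    bin x3 (bin x1 t1 t2) (bin x2 t3 t4)
  doubleR x1 (Bin _ x2 t1 (Bin _ x3 t2 t3)) t4 =
    bin x3 (bin x2 t1 t2) (bin x1 t3 t4)

  -- 'join' (Adams' concat3, 'merge' above): glue two balanced trees
  -- of arbitrary relative size around a middle element, descending
  -- the heavier side and rebalancing on the way back up.  This is
  -- the call whose precondition on 'balance' the thread disputes.
  join :: a -> Tree a -> Tree a -> Tree a
  join x Tip r = insertMin x r
  join x l Tip = insertMax x l
  join x l@(Bin nl y ly ry) r@(Bin nr z lz rz)
    | delta * nl <= nr = balance z (join x l lz) rz
    | delta * nr <= nl = balance y ly (join x ry r)
    | otherwise        = bin x l r

  insertMin, insertMax :: a -> Tree a -> Tree a
  insertMin x Tip           = bin x Tip Tip
  insertMin x (Bin _ y l r) = balance y (insertMin x l) r
  insertMax x Tip           = bin x Tip Tip
  insertMax x (Bin _ y l r) = balance y l (insertMax x r)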

On Tue, 23 Mar 2004, Daan Leijen wrote:
1) However, both Data.FiniteMap and DData.Map use the correct ratio.
I have seen this stated in your code, but that's not the bug that I have described. (Message attached again.)
2) replace Adams' "<" comparisons with "<=" comparisons.
That's not "my" bug, either.
I was redoing the proofs of Adams' algorithms too, some years ago when writing DData -- there were some errors, but I think that calling it "bugged" is rather harsh :-)
The precondition problem is not easy to fix. However, I acknowledge that your balancing checks are really good evidence for the correctness of the algorithm. (I didn't notice them at first, because you checked via an external function, not in a smart constructor, and I looked only there.)

Well, having mathematically unproven code is a common thing and should indeed be no problem. It's just that I personally prefer algorithms that I understand, i.e. where I know why they are correct. For Adams' algorithm this is not the case, and I think other people would have serious problems, too, when explaining "under what (pre)condition does 'balance' produce a correctly balanced tree?" and "why does 'join' always observe that precondition?".
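Not part of the original message, but Robert's question suggests a mechanical check: a QuickCheck property asserting that an operation built on the join/merge machinery preserves the invariant, using an external validity check of the kind mentioned above (Data.Map exports one as 'valid'; the shape of the property here is only a sketch).

  import Test.QuickCheck
  import qualified Data.Map as Map

  -- Does union (built on join/merge) always return a tree that
  -- satisfies the balance invariant?
  prop_unionBalanced :: [(Int, Int)] -> [(Int, Int)] -> Bool
  prop_unionBalanced xs ys =
    Map.valid (Map.union (Map.fromList xs) (Map.fromList ys))

  main :: IO ()
  main = quickCheck prop_unionBalanced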
In general, I would like to stress that one tends to overrate the usefulness of balancing -- balancing is automatic for random data, and it is only needed to avoid some pathological worst cases. Almost any choice of balancing scheme will do.
Right, and I also see worst-case bounds more as a kind of assurance than as a direct efficiency booster. However, worst-case bounds also allow the use of simple operations in unforeseen situations; they make them more generally usable. For example (although ordering is not balancing), one usually provides a separate "fromOrderedList" operation to create ordered structures, as sketched below. Algorithms with good worst-case performance can do that job, too, without special functions (which is also easier for the user), and what is more, they work on "almost ordered" lists, too. In general, when using unbalanced collections (and structures without good worst-case bounds in general), one has to be _more careful_.

Robert
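For illustration (not from the original message): a minimal sketch of such a 'fromOrderedList', building a balanced tree from an already ordered list in linear time; the names are illustrative.

  data Tree a = Tip | Node (Tree a) a (Tree a)

  fromOrderedList :: [a] -> Tree a
  fromOrderedList xs = fst (go (length xs) xs)
    where
      -- go n ys builds a balanced tree from the first n elements of
      -- ys and returns it together with the rest of the list.
      go 0 ys = (Tip, ys)
      go n ys = (Node l y r, rest)
        where
          half       = n `div` 2
          (l, y:ys') = go half ys
          (r, rest)  = go (n - half - 1) ys'

A structure with good worst-case bounds makes this special entry point unnecessary: plain repeated insertion of an ordered (or almost ordered) list already runs in O(n log n) instead of degenerating to O(n^2).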

On Fri, 26 Mar 2004 09:46:53 +0100, Robert Will wrote:
The precondition problem is not easy to fix. However, I acknowledge that your balancing checks are really good evidence for the correctness of the algorithm. (I didn't notice them at first, because you checked via an external function, not in a smart constructor, and I looked only there.)
Well, having mathematically unproven code is a common thing and should indeed be no problem. It's just that I personally prefer algorithms that I understand, i.e. where I know why they are correct. For Adams' algorithm this is not the case, and I think other people would have serious problems, too, when explaining "under what (pre)condition does 'balance' produce a correctly balanced tree?" and "why does 'join' always observe that precondition?".
I agree with this, and I will take a close look at your proofs. It would be good to flesh out the true preconditions of Adams' "join" operation and show that they are fulfilled in DData/Data.FiniteMap.

All the best,
Daan.
participants (4)
- Daan Leijen
- dpt@lotus.bostoncoop.net
- JP Bernardy
- Robert Will