Re: Bug in the proof of Data.FiniteMap and DData.Map

23 Mar 2004

      On Fri, 19 Mar 2004 09:00:22 +0100, Robert Will  wrote:
...
both of the data structures are based on Adams' balancing algorithm
which contains a bug -- at least in its proof.  (Perhaps it is
correct, but I don't know anyone that knows if.)
I have been redoing proofs of Adam's algorithms too some years ago
when writing DData -- there were some errors but I think that calling
it "bugged" is rather harsh :-) Anyway, I think you are referring to
two separate cases (as far as I can remember though):

1) Given a constant "delta" (the maximal relative size differences between any
two nodes in the tree) and the "ratio" (the ratio between the outer and inner
sibling of the heavier subtree in a node that needs rebalancing). "delta" is
"w" in Adam's paper, and "ratio" the inverse of alpha. Now, we can prove that
delta must be larger than 4.65 with a ratio of 2, and we can prove that we
need at least a ratio of 2 to maintain the balance invariants. In his paper
though Adams uses an incorrect ratio of 1. However, both Data.FiniteMap and DData.Map
use the correct ratio.

2) For "join", and "merge", the proof brakes down too, but I remember that it suffices
to replace Adam's "<" comparisons with "<=" comparisons. This is why the balancing
assertion checks in "Data.FiniteMap" are turned off (as they fail), but this
relatively easy to fix. (DData.Map uses the correct comparison and works with
assertion checking on). Unfortunately, I couldn't find my notes on this issue
anymore, but I think that when you re-run the proofs with "<=", it will work out
correctly.

In general, I would like to stress that one tends to overrate the usefullness
of balancing -- Balancing is automatic for random data and balancing is only
needed to avoid some pathological worst-cases. Almost any choice of balancing
scheme will do. This is why the Data.FiniteMap structure has proven its use in
practice even when sometimes producing slightly unbalanced trees.

All the best,
  Daan.
...
Instead of fixing the bug I have re-derived the algorithm from the
invariant together with a (now hopefully correct) proof.  The new
algorithm is contained in Dessy:
    http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/
more specifically in the file:
    http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/WeightedTrees.hs
The derivation, proof (and hint to Adam's bug) are in:
    http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/democratic/sequence.hs
(The so-called hedge-union, -intersection, and -difference is also
implemented: function 'x_merge_width' does all three:
    http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/Dessy.hs )
A paper which presents the new (much simpler) algorithm and its proof
putting it in scientific context is in preparation.
Users that prefer proven algorithms over tested ones are free to use
my implementation instead of the existing ones.  (At moment, there is
no proven and tested implementation.  Of course this will change
soon.)
As for the maintenance of Data.FiniteMap and DData.Map, I don't know
whether it is worth "fixing" the implementations or one would better
replace them by a completely new design.  Anyway, algorithms have no
copyright and everyone is free to use the new "weight-balancer
according to Will" to implement his own data structures.  Oh, by the
way, some of the other balancers in Dessy (or DData) might be more
efficient than weight-based balancing anyway, I'll make some
measurements later this year.
good time,
Bob
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://www.haskell.org/mailman/listinfo/libraries

Re: Bug in the proof of Data.FiniteMap and DData.Map

Daan Leijen