
Hi all! Taylor Campbell found a bug in Data.Map but wasn't able to post to this mailing list so I'm submitting this on his behalf. ---------------------------- Deletion in Data.Map does not always preserve the balance of a tree. Deletion in Data.Set appears to preserve the balance in some cases where Data.Map does not; I don't know whether Data.Set always leaves the tree balanced. I have attached two files to test the balance of trees after a random sequence of insertions and a pathological sequence of deletions (deleteMin -- although this is not pathological for some applications such as priority queues). TestMap.hs reliably demonstrates for me that deleteMin readily yields an unbalanced tree, if I supply a size of 10000 to test_map. TestSet.hs does not. The salient difference between Data.Set and Data.Map, I believe, is the choice of delta (w in Adams' paper), which is 4 in Data.Set and 5 in Data.Map. Since Adams' paper claims that when alpha (i.e. 1/ratio) is 1/2, w must exceed about 4.646, this seems suspect. But I haven't analyzed the reasoning in Adams' paper to find whether his conclusions are wrong. What I do know is that I can break trees when w is 5, but I have not been able to break them when w is 4. I also noticed a comment in Set.hs saying that a value of 4 for w improved performance on many operations. I have also not been able to break trees if I change deletion to use join rather than balance. But that slows everything down a good deal more, I believe, than simply changing w from 5 to 4.

On Tue, Aug 03, 2010 at 10:59:45PM +0200, Daniel Peebles wrote:
Deletion in Data.Map does not always preserve the balance of a tree.
I've filed a ticket here: http://hackage.haskell.org/trac/ghc/ticket/4242 It would be useful if people familiar with the library could comment as to whether setting delta to 4 would be the right fix. Thanks Ian

Even the original paper (1973) describing trees of bounded balance had
a bug! I think it is fair to describe maintaining balance in BBSTs as
"subtle".
Some ideas that might help:
1. http://www.brics.dk/BRICS/ALCOM-FT/TR/ALCOMFT-TR-01-167.html
2. This seems like a place where calling in a theorem prover might be
appropriate:
(a) The operations are tricky
(b) The data structure is very important
(c) Coq has done similar work before:
http://www.lri.fr/~filliatr/fsets/ (The performance suffered, but I
think all of that can be alleviated with Coq's updated extraction
mechanism. Pragmas necessary for performance would still need to be
inserted after extraction.)
On Tue, Aug 3, 2010 at 4:59 PM, Daniel Peebles
Deletion in Data.Map does not always preserve the balance of a tree.

Hi, As some of you know, I am also working on patches for the containers. I actually know about two serious mistakes in the Adams' paper, so the first think I did was set the delta in Map to four. I am able to proof that for delta >= 5 the construction fails. I also think I am able to proof that for delta = 4 the construction works. This does not immediately disqualify Data.Map, because, as in the original paper, the rebalancing should be done after the balance is broken. But see what the Data.Map (and also Data.Set) do: balance :: k -> a -> Map k a -> Map k a -> Map k a balance k x l r | sizeL + sizeR <= 1 = Bin sizeX k x l r | sizeR >= delta*sizeL = rotateL k x l r (4) | sizeL >= delta*sizeR = rotateR k x l r (5) | otherwise = Bin sizeX k x l r where sizeL = size l sizeR = size r sizeX = sizeL + sizeR + 1 On the lines (4) and (5), you can see the rebalancing is done even in the case of sizeR >= delta*sizeL. If the rebalancing is done in the case of broken balance, ie. in sizeR > delta*sizeL, there is a simple counterexample (you will need fixed-width font): 2 (A) 1 5 4 6 3 7 and delete 1. If the rebalancing is done as it is in Data.Map, the subtree in (A) _should_ never be formed, because it _should_ be balanced away when created (one subtree is delta*size of the other). But consider this example (found using the TestMap.hs, I did not know about it): 2 1 8 0 5 9 4 6 10 11 3 7 This is balanced. When deleting 0, we get 8 2 9 1 5 10 11 4 6 3 7 which contains the subtree (A), so deleting 1 once again breaks the balance. If there is interest, I can formulate the proof for ratio = 2 and delta = 4, which is used in Data.Set (and soon in Data.Map). I thought it is not enough to publish (just to correct an error in someone else's paper), so I did not bother doing it yet. Cheers, Milan Straka
Hi all! Taylor Campbell found a bug in Data.Map but wasn't able to post to this mailing list so I'm submitting this on his behalf.
----------------------------
Deletion in Data.Map does not always preserve the balance of a tree. Deletion in Data.Set appears to preserve the balance in some cases where Data.Map does not; I don't know whether Data.Set always leaves the tree balanced. I have attached two files to test the balance of trees after a random sequence of insertions and a pathological sequence of deletions (deleteMin -- although this is not pathological for some applications such as priority queues). TestMap.hs reliably demonstrates for me that deleteMin readily yields an unbalanced tree, if I supply a size of 10000 to test_map. TestSet.hs does not. The salient difference between Data.Set and Data.Map, I believe, is the choice of delta (w in Adams' paper), which is 4 in Data.Set and 5 in Data.Map.
Since Adams' paper claims that when alpha (i.e. 1/ratio) is 1/2, w must exceed about 4.646, this seems suspect. But I haven't analyzed the reasoning in Adams' paper to find whether his conclusions are wrong. What I do know is that I can break trees when w is 5, but I have not been able to break them when w is 4. I also noticed a comment in Set.hs saying that a value of 4 for w improved performance on many operations.
I have also not been able to break trees if I change deletion to use join rather than balance. But that slows everything down a good deal more, I believe, than simply changing w from 5 to 4.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On August 4, 2010 05:33:15 Milan Straka wrote:
I am able to proof that for delta >= 5 the construction fails. I also think I am able to proof that for delta = 4 the construction works.
I guess it depends on what value you are using for the ratio (1/alpha in Adams paper), but if you are using the standard ratio = 2 (alpha = 1/2), then delta = 4 (w = 4 in Adams paper) violates the alpha >= (2w+1)/(w^2-1) #3 single rotation constraint (page 27). It is actually a simplification of the full constraint, however, which was alpha >= (wn+w+n)/(n(w^2-1)). The key is then to realize that this is only violated when n = 1 for alpha = 1/2 and w = 4, and that this cannot actually occur in his Fig 6. diagram. It would require a right-right tree with 1/2 items. For n = 2 and above you are okay, so it would seem at least single rotations are guaranteed to work. Actually, I was surprised to hear about the w = 5 case as when I last implemented this (for a different language) I went over all the math (due to the comment in the Haskell code). Despite not agreeing with all his double rotation simplifications, I also got that alpha = 1/2 and w = 5 were okay. I'll have to take a closer look when I get more time. Cheers! -Tyson

On August 4, 2010 05:33:15 Milan Straka wrote:
If the rebalancing is done in the case of broken balance, ie. in sizeR > delta*sizeL, there is a simple counterexample (you will need fixed-width font): 2 (A) 1 5 4 6 3 7
and delete 1.
Urk. I see where the problem is. The first equation on page 27 gives the state of things once they are out of balance in conjunction with Fig 6. n + alpha n + 1 = wx + 1 In other words, an element has been added to the right side (containing n + alpha n + 1) items so it now contains one more item than w times the x items on the left side. The equalities and such that follow ensure a re-balance will always fix this situation. What they don't show, however, is what happens if things get out of balance in by deleting an item from the left-hand side. In this case, the right side can windup exceed the left by anywhere up to w times (i.e., as in your example). The required fix is to redo all the inequalities using n + alpha n + 1 = wx + e, for e = 1, 2, ..., w (in most cases only e = w will also need checking). Cheers! -Tyson PS: In short, unless I'm missing something, Adam only gave a proof/conditions for insertion. The deletion case still needs to be done.

Hi,
On August 4, 2010 05:33:15 Milan Straka wrote:
If the rebalancing is done in the case of broken balance, ie. in sizeR > delta*sizeL, there is a simple counterexample (you will need fixed-width font): 2 (A) 1 5 4 6 3 7
and delete 1.
Urk. I see where the problem is. The first equation on page 27 gives the state of things once they are out of balance in conjunction with Fig 6.
n + alpha n + 1 = wx + 1
In other words, an element has been added to the right side (containing n + alpha n + 1) items so it now contains one more item than w times the x items on the left side. The equalities and such that follow ensure a re-balance will always fix this situation.
What they don't show, however, is what happens if things get out of balance in by deleting an item from the left-hand side. In this case, the right side can windup exceed the left by anywhere up to w times (i.e., as in your example). The required fix is to redo all the inequalities using
n + alpha n + 1 = wx + e,
for e = 1, 2, ..., w (in most cases only e = w will also need checking).
Cheers! -Tyson
PS: In short, unless I'm missing something, Adam only gave a proof/conditions for insertion. The deletion case still needs to be done.
That is exactly the case. Actually, it is enough to consider deletions only, when you write down the inequalities. Another thing is that you have to solve several cases manually for small trees (a bit more than with insertion), because for n=1 (and maybe for n=2, I do not recall) some equations are not satisfied, but only for non-integral numbers of vertices, which cannot happen. I will write it down and post it somewhere. Cheers, Milan
participants (5)
-
Daniel Peebles
-
Ian Lynagh
-
Jim Apple
-
Milan Straka
-
Tyson Whitehead