ONeillPrimes.hs - priority queue broken?

Hi, I've recently tried to use the priority queue from the ONeillPrimes.hs, which is famous for being a very fast prime generator: actually, I translated the code to Scheme and dropped the values, to end up with a key-only heap implementation. However, the code didn't work quite well, and I decided to check the haskell code itself. Turns out that it is broken. module PQ where import Test.QuickCheck data PriorityQ k v = Lf | Br {-# UNPACK #-} !k v !(PriorityQ k v) !(PriorityQ k v) deriving (Eq, Ord, Read, Show) emptyPQ :: PriorityQ k v emptyPQ = Lf isEmptyPQ :: PriorityQ k v -> Bool isEmptyPQ Lf = True isEmptyPQ _ = False minKeyValuePQ :: PriorityQ k v -> (k, v) minKeyValuePQ (Br k v _ _) = (k,v) minKeyValuePQ _ = error "Empty heap!" minKeyPQ :: PriorityQ k v -> k minKeyPQ (Br k v _ _) = k minKeyPQ _ = error "Empty heap!" minValuePQ :: PriorityQ k v -> v minValuePQ (Br k v _ _) = v minValuePQ _ = error "Empty heap!" insertPQ :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v insertPQ wk wv (Br vk vv t1 t2) | wk <= vk = Br wk wv (insertPQ vk vv t2) t1 | otherwise = Br vk vv (insertPQ wk wv t2) t1 insertPQ wk wv Lf = Br wk wv Lf Lf siftdown :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v -> PriorityQ k v siftdown wk wv Lf _ = Br wk wv Lf Lf siftdown wk wv (t @ (Br vk vv _ _)) Lf | wk <= vk = Br wk wv t Lf | otherwise = Br vk vv (Br wk wv Lf Lf) Lf siftdown wk wv (t1 @ (Br vk1 vv1 p1 q1)) (t2 @ (Br vk2 vv2 p2 q2)) | wk <= vk1 && wk <= vk2 = Br wk wv t1 t2 | vk1 <= vk2 = Br vk1 vv1 (siftdown wk wv p1 q1) t2 | otherwise = Br vk2 vv2 t1 (siftdown wk wv p2 q2) deleteMinAndInsertPQ :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v deleteMinAndInsertPQ wk wv Lf = error "Empty PriorityQ" deleteMinAndInsertPQ wk wv (Br _ _ t1 t2) = siftdown wk wv t1 t2 leftrem :: PriorityQ k v -> (k, v, PriorityQ k v) leftrem (Br vk vv Lf Lf) = (vk, vv, Lf) leftrem (Br vk vv t1 t2) = (wk, wv, Br vk vv t t2) where (wk, wv, t) = leftrem t1 leftrem _ = error "Empty heap!" deleteMinPQ :: Ord k => PriorityQ k v -> PriorityQ k v deleteMinPQ (Br vk vv Lf _) = Lf deleteMinPQ (Br vk vv t1 t2) = siftdown wk wv t2 t where (wk,wv,t) = leftrem t1 deleteMinPQ _ = error "Empty heap!" toDescList :: Ord k => PriorityQ k v -> [(k,v)] toDescList q | isEmptyPQ q = [] | otherwise = (minKeyValuePQ q) : toDescList (deleteMinPQ q) fromList :: Ord k => [(k,v)] -> PriorityQ k v fromList = foldr (uncurry insertPQ) emptyPQ Here goes a test: *PQ> let s = map fst . toDescList . fromList . (`zip` (repeat ())) :: [Int]->[Int] *PQ> s [4,3,1,2] [1,2,3,4] Looks fine. *PQ> s [3,1,4,1,5,9,2,6,5,3,5,8] [1,1,2*** Exception: Empty heap! OK, probably it doesn't like duplicates. *PQ> s [3,1,4,5,9,2,6,8,10] [1,2,3,4,5,9,10] Whoops, 6 and 8 are lost. So, the morale is: don't use the priority queue from ONeillPrimes in your projects. It works for primes by a lucky chance. I haven't yet figured out, however, what exactly the bug is. -- Eugene Kirpichov Web IR developer, market.yandex.ru

Am Dienstag, 24. Februar 2009 19:16 schrieb Eugene Kirpichov:
Hi, I've recently tried to use the priority queue from the ONeillPrimes.hs, which is famous for being a very fast prime generator: actually, I translated the code to Scheme and dropped the values, to end up with a key-only heap implementation. However, the code didn't work quite well, and I decided to check the haskell code itself.
Turns out that it is broken.
Indeed.
module PQ where
import Test.QuickCheck
data PriorityQ k v = Lf
| Br {-# UNPACK #-} !k v !(PriorityQ k v) !(PriorityQ k | v)
deriving (Eq, Ord, Read, Show)
emptyPQ :: PriorityQ k v emptyPQ = Lf
isEmptyPQ :: PriorityQ k v -> Bool isEmptyPQ Lf = True isEmptyPQ _ = False
minKeyValuePQ :: PriorityQ k v -> (k, v) minKeyValuePQ (Br k v _ _) = (k,v) minKeyValuePQ _ = error "Empty heap!"
minKeyPQ :: PriorityQ k v -> k minKeyPQ (Br k v _ _) = k minKeyPQ _ = error "Empty heap!"
minValuePQ :: PriorityQ k v -> v minValuePQ (Br k v _ _) = v minValuePQ _ = error "Empty heap!"
insertPQ :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v insertPQ wk wv (Br vk vv t1 t2)
| wk <= vk = Br wk wv (insertPQ vk vv t2) t1 | otherwise = Br vk vv (insertPQ wk wv t2) t1
insertPQ wk wv Lf = Br wk wv Lf Lf
siftdown :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v -> PriorityQ k v siftdown wk wv Lf _ = Br wk wv Lf Lf siftdown wk wv (t @ (Br vk vv _ _)) Lf
| wk <= vk = Br wk wv t Lf | otherwise = Br vk vv (Br wk wv Lf Lf) Lf
siftdown wk wv (t1 @ (Br vk1 vv1 p1 q1)) (t2 @ (Br vk2 vv2 p2 q2))
| wk <= vk1 && wk <= vk2 = Br wk wv t1 t2 | vk1 <= vk2 = Br vk1 vv1 (siftdown wk wv p1 q1) t2 | otherwise = Br vk2 vv2 t1 (siftdown wk wv p2 q2)
deleteMinAndInsertPQ :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v deleteMinAndInsertPQ wk wv Lf = error "Empty PriorityQ" deleteMinAndInsertPQ wk wv (Br _ _ t1 t2) = siftdown wk wv t1 t2
leftrem :: PriorityQ k v -> (k, v, PriorityQ k v) leftrem (Br vk vv Lf Lf) = (vk, vv, Lf) leftrem (Br vk vv t1 t2) = (wk, wv, Br vk vv t t2) where (wk, wv, t) = leftrem t1 leftrem _ = error "Empty heap!"
deleteMinPQ :: Ord k => PriorityQ k v -> PriorityQ k v deleteMinPQ (Br vk vv Lf _) = Lf deleteMinPQ (Br vk vv t1 t2) = siftdown wk wv t2 t where (wk,wv,t) = leftrem t1 deleteMinPQ _ = error "Empty heap!"
toDescList :: Ord k => PriorityQ k v -> [(k,v)] toDescList q | isEmptyPQ q = []
| otherwise = (minKeyValuePQ q) : toDescList (deleteMinPQ q)
fromList :: Ord k => [(k,v)] -> PriorityQ k v fromList = foldr (uncurry insertPQ) emptyPQ
Here goes a test:
*PQ> let s = map fst . toDescList . fromList . (`zip` (repeat ())) :: [Int]->[Int] *PQ> s [4,3,1,2] [1,2,3,4]
Looks fine.
*PQ> s [3,1,4,1,5,9,2,6,5,3,5,8] [1,1,2*** Exception: Empty heap!
OK, probably it doesn't like duplicates.
That is not the problem.
*PQ> s [3,1,4,5,9,2,6,8,10] [1,2,3,4,5,9,10]
Whoops, 6 and 8 are lost.
So, the morale is: don't use the priority queue from ONeillPrimes in your projects. It works for primes by a lucky chance.
I haven't yet figured out, however, what exactly the bug is.
The problem is that deleteMinPQ and siftdown assume that if the left subqueue is empty then so is the right, but that assumption is sometimes wrong: *PQ> fromList [(k,k) | k <- [1 .. 7]] Br 1 1 (Br 2 2 (Br 4 4 Lf Lf) (Br 6 6 Lf Lf)) (Br 3 3 (Br 5 5 Lf Lf) (Br 7 7 Lf Lf)) *PQ> deleteMinPQ it Br 2 2 (Br 3 3 (Br 5 5 Lf Lf) (Br 7 7 Lf Lf)) (Br 4 4 Lf Lf) *PQ> deleteMinPQ it Br 3 3 (Br 4 4 Lf Lf) (Br 5 5 Lf Lf) *PQ> deleteMinPQ it Br 4 4 (Br 5 5 Lf Lf) Lf *PQ> deleteMinPQ it Br 5 5 Lf Lf *PQ> deleteMinPQ it Lf

Eugene Kirpichov wrote:
Hi, I've recently tried to use the priority queue from the ONeillPrimes.hs, which is famous for being a very fast prime generator: actually, I translated the code to Scheme and dropped the values, to end up with a key-only heap implementation. However, the code didn't work quite well, and I decided to check the haskell code itself.
Turns out that it is broken.
module PQ where
import Test.QuickCheck
data PriorityQ k v = Lf | Br {-# UNPACK #-} !k v !(PriorityQ k v) !(PriorityQ k v) deriving (Eq, Ord, Read, Show)
Let size Lf = 0 size (Br _ _ l r) = 1 + sizePQ l + sizePQ r be the size of the priority queue. To work, the code maintains heap order and the invariant that the left subtree is at least as large as the right one, and at most one element larger. validSize Lf = True validSize (Br _ _ l r) = validSize l && validSize r && 0 <= d && d <= 1 where d = size l - size r This invariant justifies the assumption that Daniel Fischer pointed out. The code is careful to maintain this invariant, but it is broken in one place:
leftrem :: PriorityQ k v -> (k, v, PriorityQ k v) leftrem (Br vk vv Lf Lf) = (vk, vv, Lf)
(Why not this?) leftrem (Br vk vv Lf _) = (vk, vv, Lf)
leftrem (Br vk vv t1 t2) = (wk, wv, Br vk vv t t2) where (wk, wv, t) = leftrem t1
Here, the left subtree is replaced by one that is one element smaller. This breaks the invariant if the two original subtrees had equal size. The bug is easy to fix; just swap the two subtrees on the right side: leftrem (Br vk vv t1 t2) = (wk, wv, Br vk vv t2 t) where (wk, wv, t) = leftrem t1
leftrem _ = error "Empty heap!" *PQ> s [3,1,4,1,5,9,2,6,5,3,5,8] [1,1,2*** Exception: Empty heap!
*PQ> s [3,1,4,1,5,9,2,6,5,3,5,8] [1,1,2,3,3,4,5,5,5,6,8,9] HTH, Bertram

Eugene Kirpichov wrote:
module PQ where
import Test.QuickCheck
data PriorityQ k v = Lf | Br {-# UNPACK #-} !k v !(PriorityQ k v) !(PriorityQ k v) deriving (Eq, Ord, Read, Show)
For the record, we can exploit the invariant that the sizes of the left and right subtrees have difference 0 or 1 to implement 'size' in better than O(n) time, where n is the size of the heap: -- Return number of elements in the priority queue. -- /O(log(n)^2)/ size :: PriorityQ k v -> Int size Lf = 0 size (Br _ _ t1 t2) = 2*n + rest n t1 t2 where n = size t2 -- rest n p q, where n = size q, and size p - size q = 0 or 1 -- returns 1 + size p - size q. rest :: Int -> PriorityQ k v -> PriorityQ k v -> Int rest 0 Lf _ = 1 rest 0 _ _ = 2 rest n (Br _ _ p1 p2) (Br _ _ q1 q2) = case r of 0 -> rest d p1 q1 -- subtree sizes: (d or d+1), d; d, d 1 -> rest d p2 q2 -- subtree sizes: d+1, (d or d+1); d+1, d where (d, r) = (n-1) `quotRem` 2 Of course we can reduce the cost to O(1) by annotating the heap with its size, but that is less interesting, and incurs a little overhead in the other heap operations. Bertram
participants (3)
-
Bertram Felgenhauer
-
Daniel Fischer
-
Eugene Kirpichov