
Leon Smith wrote:
With the urging and assistance of Omar AntolĂn Camarena, I will be adding two functions to data-ordlist: mergeAll and unionAll, which merge (or union) a potentially infinite list of potentially infinite ordered lists, under the assumption that the heads of the non-empty lists appear in a non-decreasing sequence.
However, as Omar pointed out to me, the following implementation of unionAll has a flaw:
unionAll :: Ord a => [[a]] -> [a] unionAll = foldr (\(x:xs) ys -> x : union xs ys) []
Namely unionAll [[1,2],[1,2]] should return [1,2], whereas it actually returns [1,1,2]. After some work, I believe I have generalized H. Apfelmus's algorithm to handle this; however it seems a bit complicated. I would love feedback, especially with regard to simplifications, bugs, testing strategies, and optimizations:
unionAll' :: Ord a => [[a]] -> [a] unionAll' = unionAllBy compare
data People a = VIP a (People a) | Crowd [a]
unionAllBy :: (a -> a -> Ordering) -> [[a]] -> [a] unionAllBy cmp xss = loop [ (VIP x (Crowd xs)) | (x:xs) <- xss ] where loop [] = [] loop ( VIP x xs : VIP y ys : xss ) = case cmp x y of LT -> x : loop ( xs : VIP y ys : xss ) EQ -> loop ( VIP x (union' xs ys) : unionPairs xss ) GT -> error "Data.List.Ordered.unionAll: assumption violated!" loop ( VIP x xs : xss ) = x : loop (xs:xss) loop [Crowd xs] = xs loop (xs:xss) = loop (unionPairs (xs:xss))
unionPairs [] = [] unionPairs [x] = [x] unionPairs (x:y:zs) = union' x y : unionPairs zs
union' (VIP x xs) (VIP y ys) = case cmp x y of LT -> VIP x (union' xs (VIP y ys)) EQ -> VIP x (union' xs ys) GT -> error "Data.List.Ordered.unionAll: assumption violated!" union' (VIP x xs) (Crowd ys) = VIP x (union' xs (Crowd ys)) union' (Crowd []) ys = ys union' (Crowd xs) (Crowd ys) = Crowd (unionBy cmp xs ys) union' xs@(Crowd (x:xt)) ys@(VIP y yt) = case cmp x y of LT -> VIP x (union' (Crowd xt) ys) EQ -> VIP x (union' (Crowd xt) yt) GT -> VIP y (union' xs yt)
I see no obvious deficiencies. :) Personally, I'd probably structure it like http://www.haskell.org/haskellwiki/Prime_numbers#Implicit_Heap so that your code becomes unionAll = serve . foldTree union' . map vip Your loop function is a strange melange of many different concerns (building a tree, union', adding and removing the VIP constructors). Note that it's currently unclear to me whether the lazy pattern match in pairs ~(x: ~(y:ys)) = f x y : pairs ys is beneficial or not; you used a strict one unionPairs (x:y:zs) = union' x y : unionPairs zs Daniel Fischer's experiments suggest that the strict one is better http://www.mail-archive.com/haskell-cafe@haskell.org/msg69807.html If you're really concerned about time & space usage, it might even be worth to abandon the lazy tree altogether and use a heap to achieve the same effect, similar to Melissa O'Neils prime number code. It's not as "neat", but much more predictable. :) Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com