
With the urging and assistance of Omar AntolĂn Camarena, I will be adding two functions to data-ordlist: mergeAll and unionAll, which merge (or union) a potentially infinite list of potentially infinite ordered lists, under the assumption that the heads of the non-empty lists appear in a non-decreasing sequence. Union takes two sorted lists and produces a new sorted list; an element occurs in the result as many times as the maximum number of occurrences in either list. The unionAll function generalizes this behavior to an infinite number of lists. A reasonable implementation of mergeAll is:
import Data.List.Ordered(merge, union)
mergeAll :: Ord a => [[a]] -> [a] mergeAll = foldr (\(x:xs) ys -> x : merge xs ys) []
However, for many inputs, we can do better; the library implementation of mergeAll is based on H. Apfelmus's article "Implicit Heaps", which presents a simplification of Dave Bayer's "venturi" algorithm. The difference is that the foldr version uses a line of comparisons, whereas "venturi" uses a tree of comparisons. http://apfelmus.nfshost.com/articles/implicit-heaps.html http://www.mail-archive.com/haskell-cafe@haskell.org/msg27612.html However, as Omar pointed out to me, the following implementation of unionAll has a flaw:
unionAll :: Ord a => [[a]] -> [a] unionAll = foldr (\(x:xs) ys -> x : union xs ys) []
Namely unionAll [[1,2],[1,2]] should return [1,2], whereas it actually returns [1,1,2]. After some work, I believe I have generalized H. Apfelmus's algorithm to handle this; however it seems a bit complicated. I would love feedback, especially with regard to simplifications, bugs, testing strategies, and optimizations:
unionAll' :: Ord a => [[a]] -> [a] unionAll' = unionAllBy compare
data People a = VIP a (People a) | Crowd [a]
unionAllBy :: (a -> a -> Ordering) -> [[a]] -> [a] unionAllBy cmp xss = loop [ (VIP x (Crowd xs)) | (x:xs) <- xss ] where loop [] = [] loop ( VIP x xs : VIP y ys : xss ) = case cmp x y of LT -> x : loop ( xs : VIP y ys : xss ) EQ -> loop ( VIP x (union' xs ys) : unionPairs xss ) GT -> error "Data.List.Ordered.unionAll: assumption violated!" loop ( VIP x xs : xss ) = x : loop (xs:xss) loop [Crowd xs] = xs loop (xs:xss) = loop (unionPairs (xs:xss))
unionPairs [] = [] unionPairs [x] = [x] unionPairs (x:y:zs) = union' x y : unionPairs zs
union' (VIP x xs) (VIP y ys) = case cmp x y of LT -> VIP x (union' xs (VIP y ys)) EQ -> VIP x (union' xs ys) GT -> error "Data.List.Ordered.unionAll: assumption violated!" union' (VIP x xs) (Crowd ys) = VIP x (union' xs (Crowd ys)) union' (Crowd []) ys = ys union' (Crowd xs) (Crowd ys) = Crowd (unionBy cmp xs ys) union' xs@(Crowd (x:xt)) ys@(VIP y yt) = case cmp x y of LT -> VIP x (union' (Crowd xt) ys) EQ -> VIP x (union' (Crowd xt) yt) GT -> VIP y (union' xs yt)
-- Leon