Computing a sorted list of products lazily

Hello, A colleague of mine recently asked if I knew of a lazy way to solve the following problem: Given two sets of sorted floating point numbers, can we lazily generate a sorted list of the products from their Cartesian product? The algorithm should return the same result as: sortProduct a b = sort [ x * y | x <- a, y <- b ] But, my friend is not satisfied with the strictness of sort. In particular, he doesn't want to have to hold the whole list in memory to sort it. He would like to compute the results one product at a time and in the correct order. Is there an existing algorithm for this problem? We searched but we don't seem to know the right search terms. Additionally, my friend is not writing this in Haskell, but a Haskell solution is fine because we should be able to transform it into a Java solution by hand. We think we know an algorithm that will produce the right result but it's a bit messy and we doubt we're the first to tackle this problem. Any help would be greatly appreciated. Thanks in advance! Jason

I did something similar a while ago to solve a problem posted on StackOverflow: http://porg.es/blog/sorted-sums-of-a-sorted-list Henry Laxen generalized my code a little bit so you can pass in any monotonic function (see the comments). I'm not sure of the laziness properties of this, but it might be lazy enough... you may need to swap for a boxed array.

Hello Jason, Saturday, April 18, 2009, 1:41:18 AM, you wrote:
The algorithm should return the same result as: sortProduct a b = sort [ x * y | x <- a, y <- b ]
i think it's well-known problem. you should write a function merging infinite list of sorted lists. in assumption that lists are ordered by their first element this should be doable -- Function that merge finite lists (kidnapped from Data.List): merge_lists :: (a -> a -> Ordering) -> [[a]] -> [a] merge_lists cmp [] = [] merge_lists cmp [xs] = xs merge_lists cmp xss = merge_lists cmp (merge_pairs cmp xss) merge_pairs :: (a -> a -> Ordering) -> [[a]] -> [[a]] merge_pairs cmp [] = [] merge_pairs cmp [xs] = [xs] merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp xs [] = xs merge cmp [] ys = ys merge cmp (x:xs) (y:ys) = case x `cmp` y of GT -> y : merge cmp (x:xs) ys _ -> x : merge cmp xs (y:ys) this function merges lists in pairs. rewriting it to make incremental merging, using "sorted by first element" property and omitting finite lists support, we get just: -- Merging infinite sorted lists merge_lists :: (Ord a) => [[a]] -> [a] merge_lists ((x:xs):xss) = x : merge xs (merge_lists xss) merge :: (Ord a) => [a] -> [a] -> [a] merge (x:xs) (y:ys) = case x > y of True -> y : merge (x:xs) ys _ -> x : merge xs (y:ys) works for me with test script: main = putStr (unlines$ map show$ merge_lists [[x*y | x<-[1..]] | y<-[1..]]) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Am Freitag 17 April 2009 23:41:18 schrieb Jason Dagit:
Hello,
A colleague of mine recently asked if I knew of a lazy way to solve the following problem: Given two sets of sorted floating point numbers, can we lazily generate a sorted list of the products from their Cartesian product?
If the numbers are nonnegative, you could use sortedProducts xs ys = foldr merge1 [] [map (*x) ys | x <- xs] merge1 (x:xs) ys = x:merge xs ys merge1 [] ys = ys merge xs@(x:xt) ys@(y:yt) | x < y = x:merge xt ys | otherwise = y:merge xs yt (or remove duplicates, if you wish) Since the lists are sorted and the numbers nonnegative, each (map (*x) ys) is sorted and the heads of these are sorted, too, therefore we can use merge1, which makes the fold lazy enough to work even on infinite lists (it is not very fast, though, if speed is crucial, you might be better off to sort in chunks).
The algorithm should return the same result as: sortProduct a b = sort [ x * y | x <- a, y <- b ]
But, my friend is not satisfied with the strictness of sort. In particular, he doesn't want to have to hold the whole list in memory to sort it. He would like to compute the results one product at a time and in the correct order.
Is there an existing algorithm for this problem? We searched but we don't seem to know the right search terms. Additionally, my friend is not writing this in Haskell, but a Haskell solution is fine because we should be able to transform it into a Java solution by hand.
We think we know an algorithm that will produce the right result but it's a bit messy and we doubt we're the first to tackle this problem. Any help would be greatly appreciated. Thanks in advance!
Jason

Hi Daniel, I love your solution. Very elegant. Daniel Fischer wrote:
sortedProducts xs ys = foldr merge1 [] [map (*x) ys | x <- xs]
merge1 (x:xs) ys = x:merge xs ys merge1 [] ys = ys
merge xs@(x:xt) ys@(y:yt) | x < y = x:merge xt ys | otherwise = y:merge xs yt (or remove duplicates, if you wish)
Small addition: if you want this to run on finite input as well, add this case:
merge xs ys = xs ++ ys
Martijn.

Am Samstag 18 April 2009 00:31:50 schrieb Martijn van Steenbergen:
Hi Daniel,
I love your solution. Very elegant.
Daniel Fischer wrote:
sortedProducts xs ys = foldr merge1 [] [map (*x) ys | x <- xs]
merge1 (x:xs) ys = x:merge xs ys merge1 [] ys = ys
merge xs@(x:xt) ys@(y:yt)
| x < y = x:merge xt ys | otherwise = y:merge xs yt
(or remove duplicates, if you wish)
Small addition: if you want this to run on finite input as well, add
this case:
merge xs ys = xs ++ ys
Argh! When typing the foldr, I thought I mustn't forget the case for an empty list in merge1 and merge. Well, at least I remembered half of it :(
Martijn.

Jason Dagit wrote:
A colleague of mine recently asked if I knew of a lazy way to solve the following problem: Given two sets of sorted floating point numbers, can we lazily generate a sorted list of the products from their Cartesian product?
The algorithm should return the same result as: sortProduct a b = sort [ x * y | x <- a, y <- b ]
First create a matrix of all the products, represented as a list of non-empty lists such that the inner lists are sorted, and the outer list is sorted by the first elements of the inner lists. Then repeatedly remove the first element of the first list, and reorder the list-of-lists.
module SortProduct where
import Data.List (insertBy) import Data.Function (on)
products a b = [[x * y | x <- a] | y <- b]
toList [] = [] toList ([x] : rest) = x : toList rest toList ((x : xs) : rest) = x : toList (insertBy (compare `on` head) xs rest)
sortProduct a b = toList (products a b)
Tillmann

Jason Dagit wrote:
Hello,
A colleague of mine recently asked if I knew of a lazy way to solve the following problem: Given two sets of sorted floating point numbers, can we lazily generate a sorted list of the products from their Cartesian product?
The algorithm should return the same result as: sortProduct a b = sort [ x * y | x <- a, y <- b ]
But, my friend is not satisfied with the strictness of sort. In particular, he doesn't want to have to hold the whole list in memory to sort it. He would like to compute the results one product at a time and in the correct order.
The algorithm you're looking for is called "(lazy) cube pruning" and is quite common in machine translation. If you're willing to slog through MT papers, this one[1] gives a decent introduction as I recall. The algorithm works for any monotone combination function. The basic idea is to do a frontier search on the cartesian grid. We know a priori that the first elements of each list will give the "best" combination value. We return that value as the first of our new list, and mark it off on the grid (not really), we then look at the next element for each list and compute the combination values for each of the new fringe cells, and store them in a heap. Pick the best value out of the heap and return it as the next element of the list, 'mark it off' on the grid and explore the new fringe. Because the combination function is monotone, this search strategy guarantees you'll encounter every combination values in the right order to return them with minimal delay. The only downside is that you need to maintain the heap of seen values that aren't ready to be returned yet (which can grow quite large in the worst case). For MT systems where this can really be a problem, the cube search strategy is often integrated with beam search (i.e. if the heap grows larger than some size limit then the "worst" values start dropping out of the bottom)--- hence the name cube *pruning*. If you don't feel encumbered by the LGPL, we have a Java implementation of the full algorithm already. Your use case is much simpler than the variant necessary for MT, but the reference implementation can be quite helpful. Feel free to contact me off-list if you're interested, I can point you to our repository and where the algorithm is in the morass of code. [1] http://www.cis.upenn.edu/~lhuang3/acl-cube.pdf -- Live well, ~wren
participants (7)
-
Bulat Ziganshin
-
Daniel Fischer
-
Jason Dagit
-
Martijn van Steenbergen
-
porges@porg.es
-
Tillmann Rendel
-
wren ng thornton