
Jason Dagit wrote:
Hello,
A colleague of mine recently asked if I knew of a lazy way to solve the following problem: Given two sets of sorted floating point numbers, can we lazily generate a sorted list of the products from their Cartesian product?
The algorithm should return the same result as: sortProduct a b = sort [ x * y | x <- a, y <- b ]
But, my friend is not satisfied with the strictness of sort. In particular, he doesn't want to have to hold the whole list in memory to sort it. He would like to compute the results one product at a time and in the correct order.
The algorithm you're looking for is called "(lazy) cube pruning" and is quite common in machine translation. If you're willing to slog through MT papers, this one[1] gives a decent introduction as I recall. The algorithm works for any monotone combination function. The basic idea is to do a frontier search on the cartesian grid. We know a priori that the first elements of each list will give the "best" combination value. We return that value as the first of our new list, and mark it off on the grid (not really), we then look at the next element for each list and compute the combination values for each of the new fringe cells, and store them in a heap. Pick the best value out of the heap and return it as the next element of the list, 'mark it off' on the grid and explore the new fringe. Because the combination function is monotone, this search strategy guarantees you'll encounter every combination values in the right order to return them with minimal delay. The only downside is that you need to maintain the heap of seen values that aren't ready to be returned yet (which can grow quite large in the worst case). For MT systems where this can really be a problem, the cube search strategy is often integrated with beam search (i.e. if the heap grows larger than some size limit then the "worst" values start dropping out of the bottom)--- hence the name cube *pruning*. If you don't feel encumbered by the LGPL, we have a Java implementation of the full algorithm already. Your use case is much simpler than the variant necessary for MT, but the reference implementation can be quite helpful. Feel free to contact me off-list if you're interested, I can point you to our repository and where the algorithm is in the morass of code. [1] http://www.cis.upenn.edu/~lhuang3/acl-cube.pdf -- Live well, ~wren