On Wed, Nov 11, 2009 at 9:36 AM, Tillmann Vogt <Tillmann.Vogt@rwth-aachen.de> wrote:

Hong Yang schrieb:

The question is more about algorithm than Haskell. But I am going to code in
Haskell which I am still learning.

Suppose I have a large table, with hundreds of columns and thousands of
rows. But not every cell has a value (of String, or Int, or Double type).

I want to shuffle the rows to maximize the number of columns whose first 100
rows have at least one number, given a list of preferred column names since
there is no guarantee that every number column will have at least one number
in its first 100 rows after shuffling.

Can someone provide a good algorithm for this problem? (I do not have any
background in algorithms.) You can assume I already know which columns are
of Int or Double type.

I would say it depends on the distribution of values in the table. If there are rows with a lot of values and rows with few values, then I would first sort the rows after the number of cells with values. If you look at all the columns and the number of values for each row is unique then it would be perfectly solved. With a list of preferred columns and also a uniform distribution the problem might be hard (NP-complete?), but these hard problems can often be approximated, i.e with simulated annealing, which in short is switching two rows repeatedly as long as the result improves.

This is not a homework.
Thanks,

Hong

------------------------------------------------------------------------

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe