looking for a good algorithm

The question is more about algorithm than Haskell. But I am going to code in Haskell which I am still learning. Suppose I have a large table, with hundreds of columns and thousands of rows. But not every cell has a value (of String, or Int, or Double type). I want to shuffle the rows to maximize the number of columns whose first 100 rows have at least one number, given a list of preferred column names since there is no guarantee that every number column will have at least one number in its first 100 rows after shuffling. Can someone provide a good algorithm for this problem? (I do not have any background in algorithms.) You can assume I already know which columns are of Int or Double type. This is not a homework. Thanks, Hong

Hong Yang
I want to shuffle the rows to maximize the number of columns whose first 100 rows have at least one number
Sounds like the maximum coverage problem, which is said to be NP-hard. [citation needed] http://en.wikipedia.org/wiki/Maximum_coverage_problem -- Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig The answer to the ultimate question of life the universe and everything = 42 but usually it's not.

Sorry, I forgot to ask an important question. Is the table stored in a dense format as in complete rows and complete columns or in a sparse table format?
The question is more about algorithm than Haskell. But I am going to code in Haskell which I am still learning.
Suppose I have a large table, with hundreds of columns and thousands of rows. But not every cell has a value (of String, or Int, or Double type).
I want to shuffle the rows to maximize the number of columns whose first 100 rows have at least one number, given a list of preferred column names since there is no guarantee that every number column will have at least one number in its first 100 rows after shuffling.
Can someone provide a good algorithm for this problem? (I do not have any background in algorithms.) You can assume I already know which columns are of Int or Double type.
-- Regards, Casey

Hi Casey,
Thanks very much for your zeal.
The table is a csv file. Actually the number of rows can be sixty thousand.
Hong
On Fri, Nov 13, 2009 at 5:08 PM, Casey Hawthorne
Sorry, I forgot to ask an important question.
Is the table stored in a dense format as in complete rows and complete columns or in a sparse table format?
The question is more about algorithm than Haskell. But I am going to code in Haskell which I am still learning.
Suppose I have a large table, with hundreds of columns and thousands of rows. But not every cell has a value (of String, or Int, or Double type).
I want to shuffle the rows to maximize the number of columns whose first 100 rows have at least one number, given a list of preferred column names since there is no guarantee that every number column will have at least one number in its first 100 rows after shuffling.
Can someone provide a good algorithm for this problem? (I do not have any background in algorithms.) You can assume I already know which columns are of Int or Double type.
-- Regards, Casey
participants (3)
-
Casey Hawthorne
-
Chung-chieh Shan
-
Hong Yang