Re: [Haskell-cafe] looking for a good algorithm

13 Nov 2009

      To: Casey Hawthorne 
Subject: Re: [Haskell-cafe] looking for a good algorithm
From: Casey Hawthorne 
Date: Thu, 12 Nov 2009 11:14:02 -0800

On third thought, convert the table to a 2D array of bits (or a 1D
array of bits mapped to a 2D coordinate system).
The bit is true for either an Int or Double.

If space is a concern and one can tolerate a low rate of false
positives, than a 2D Bloom Filter might be appropriate, or a 1D Bloom
Filter with mapping to 2D coordinates.

http://en.wikipedia.org/wiki/Bloom_filter

Then on to simmulated annealing, possibly.

Note: Donald Knuth's new fascicle is out:
Volume 4 Fascicle 1, Bitwise Tricks & Techniques; Binary Decision
Diagrams (2009), xiii+261pp. ISBN 0-321-58050-8

On Wed, 11 Nov 2009 15:32:35 -0800, you wrote:
...
So, as I understand it, you have a very large sparse table, thousands
of rows and hundreds of columns, of which each cell within a column of
type String, Int, or Double can contain one of those types or nothing.
Then you to want to shuffle the rows to maximize the number of columns
whose first 100 rows have at least one number (Int or Double), given a
list of preferred column names since there is no guarantee that every
number column will have at least one number in its first 100 rows
after shuffling.
I'm wondering about hashing on the rows and hashing on the columns,
then the column hash has the number of Int's or Double's (don't need
the String's) in that column and the rows they are in.
The row hash would have the number of Int's and Double's in that row
and what column's they are in.
Then;
Then scan the row hash and sort into descending order, and by tagging
those rows, not by actually moving them.
Then I think your ready for simmulated annealing.
--
Regards,
Casey