New subject: Greetings

30 Sep 2006

I've done some stuff with maybe 50k rows at a time.  A few bits and pieces:

1: I've used HSQL 
(http://sourceforge.net/project/showfiles.php?group_id=65248) to talk to 
ODBC databases.  It works fine, but possibly a bit slowly.  I'm not sure 
where the delay is: it might just be the network I was running it over.  
One gotcha: the field function takes a field name, but it's not random 
access.  Access the fields in query order or it crashes.

2: For large data sets laziness is your friend.  When reading files, 
"getContents" presents an entire file as a list, but it's really 
evaluated lazily.  This is implemented using unsafeInterleaveIO.  I've 
never used this, but in theory you should be able to set up a query that 
returns the entire database as a list and then step through it using 
lazy evaluation in the same way.
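Here is roughly what the lazy-list idea in point 2 could look like.  This is a minimal sketch, not HSQL code: it assumes only a hypothetical "fetch the next row" action that returns Just row for each row and Nothing when the results run out, and it uses a fake in-memory cursor (fakeCursor, also hypothetical) instead of a real database so it runs on its own.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Turn a "fetch the next row" action into a lazily evaluated list.
-- unsafeInterleaveIO defers each fetch until that part of the list
-- is actually demanded, the same trick getContents uses for files.
lazyRows :: IO (Maybe a) -> IO [a]
lazyRows fetchNext = unsafeInterleaveIO $ do
  mrow <- fetchNext
  case mrow of
    Nothing  -> return []
    Just row -> do
      rest <- lazyRows fetchNext  -- returns a deferred thunk immediately
      return (row : rest)

-- | Hypothetical stand-in for a database cursor (not HSQL's API):
-- hands out the numbers 1..n one at a time and counts how many
-- rows have actually been fetched so far.
fakeCursor :: Int -> IO (IO (Maybe Int), IORef Int)
fakeCursor n = do
  fetched <- newIORef 0
  let next = do
        k <- readIORef fetched
        if k >= n
          then return Nothing
          else do
            writeIORef fetched (k + 1)
            return (Just (k + 1))
  return (next, fetched)

main :: IO ()
main = do
  (next, fetched) <- fakeCursor 1000000
  rows <- lazyRows next
  print (take 3 rows)          -- prints [1,2,3]
  readIORef fetched >>= print  -- prints 3: the other rows were never fetched
```

The usual caveat with unsafeInterleaveIO applies: the fetches happen whenever the list happens to be forced, so the connection has to stay open until you've finished consuming the list.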

3: You don't say whether these algorithms are just row-by-row algorithms 
or whether there is something more sophisticated going on.  Either way, 
try to turn things into lists and then apply map, fold and filter 
operations.  It's much more declarative and high level when you do it 
that way.
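For point 3, a minimal sketch, assuming the rows are already available as a list (for example from a lazy reader like the one described in point 2).  The Sale record and its fields are made up for illustration; the first function is a plain row-by-row filter/map/fold pipeline, and the second does something slightly less row-by-row by folding all the rows into per-group totals.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- | Hypothetical row type standing in for one database record;
-- the field names are illustrative only.
data Sale = Sale { region :: String, amount :: Double }
  deriving Show

-- | Row-by-row processing as a pipeline: keep one region's rows,
-- project out the amount, then sum with a strict left fold.
regionTotal :: String -> [Sale] -> Double
regionTotal r = foldl' (+) 0 . map amount . filter ((== r) . region)

-- | A single strict fold that accumulates per-region totals in a Map.
totalsByRegion :: [Sale] -> Map.Map String Double
totalsByRegion = foldl' step Map.empty
  where
    step m s = Map.insertWith (+) (region s) (amount s) m

main :: IO ()
main = do
  let sales = [Sale "north" 10, Sale "south" 5, Sale "north" 2.5]
  print (regionTotal "north" sales)          -- prints 12.5
  print (Map.toList (totalsByRegion sales))  -- prints [("north",12.5),("south",5.0)]
```

Because foldl' keeps its accumulator evaluated as it goes, either fold can consume a lazily produced list one row at a time, as long as nothing else holds on to the list.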

Let us know how you get on.

Paul.

Re: Greetings

Paul Johnson

Seth Gordon

Krasimir Angelov
