
Hello,
So before I embark on day 1 of the project, I thought I should check and see if anyone on this list has used Haskell to munge a ten-million-row database table, and if there are any particular gotchas I should watch out for.
One immediate thing to be careful about is how you do IO. In my experience, Haskell's standard String-based IO functions are slow at reading large files. You'll probably want to skip them and use the lazy bytestring library (http://www.cse.unsw.edu.au/~dons/fps.html).

Another thing to be careful about is laziness. I suspect it will be very easy to write code that does what you want but exhausts your heap space, because the computation on each row is delayed until after the entire file has been read and the result of the complete computation is demanded. More information on this is available at http://haskell.org/haskellwiki/Performance.

Good luck,
Jeff
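
P.S. Here is a rough sketch of both points together, in case it helps. It assumes a made-up input format (one row per line, with an integer in the first comma-separated field) and a made-up function name, sumFirstColumn; neither comes from your project, so treat it as an illustration only. The input is read as a lazy ByteString, and the rows are reduced with the strict foldl' so the running total is evaluated row by row instead of piling up as unevaluated thunks.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.List (foldl')

-- Hypothetical example: sum the integer at the start of every row.
-- The row layout ("123,other,fields") is an assumption for illustration.
sumFirstColumn :: BL.ByteString -> Int
sumFirstColumn = foldl' step 0 . BL.lines
  where
    -- foldl' forces the accumulator at each step, so no chain of
    -- suspended additions builds up while the input is consumed.
    step acc row =
      case BL.readInt row of
        Just (n, _) -> acc + n   -- row starts with an integer
        Nothing     -> acc       -- skip rows that do not

main :: IO ()
main = do
  -- Lazy read: chunks are pulled from stdin as the fold consumes them.
  contents <- BL.getContents
  print (sumFirstColumn contents)
```

Swapping foldl' for the lazy foldl is the kind of change that still type-checks but can delay all the additions until the end of the file, which is the problem described above.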