
If you are doing row-by-row transformations, I would recommend giving my csv-conduit or csv-enumerator packages on Hackage a try. They were designed with constant-space operation in mind, which may help you here.
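For illustration (not from the original mail), a constant-space pipeline with csv-conduit might look roughly like this; it assumes a recent conduit with the (.|) operator, and the file names and the transform are placeholders:

    import Data.ByteString (ByteString)
    import Data.Conduit (runConduitRes, (.|))
    import qualified Data.Conduit.Combinators as C
    import Data.CSV.Conduit (Row, defCSVSettings, fromCSV, intoCSV)

    -- Placeholder per-row transformation; swap in real logic.
    transform :: Row ByteString -> Row ByteString
    transform = map id

    main :: IO ()
    main = runConduitRes $
           C.sourceFile "input.csv"
        .| intoCSV defCSVSettings   -- parse raw bytes into CSV rows
        .| C.map transform          -- only one row in memory at a time
        .| fromCSV defCSVSettings   -- render rows back to bytes
        .| C.sinkFile "output.csv"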
If you're keeping an accumulator around, however, you may still run into issues with too much laziness.
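For instance, forcing the accumulator at every step of a fold keeps memory flat; a minimal sketch, with made-up names:

    {-# LANGUAGE BangPatterns #-}
    import Data.List (foldl')

    -- foldl' forces the pair at each step, and the bang patterns
    -- force its components, so no chain of thunks builds up.
    sumAndCount :: [Double] -> (Double, Int)
    sumAndCount = foldl' step (0, 0)
      where
        step (!s, !n) x = (s + x, n + 1)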
Ozgun
On Feb 2, 2013, at 7:17 PM, Erik de Castro Lopo wrote:
Nick Rudnick wrote:
Thanks for the interesting info. I quite often process CSV file data of about 100M-1G.
What library are you using to process the CSV? I have had problems with excessive laziness: processing a 75 megabyte CSV file was consuming 500+ megabytes of memory. After I fixed it, memory usage dropped to under a megabyte, and processing time dropped from over 10 minutes to about 2 minutes.
I blogged my problem and solution here:
http://www.mega-nerd.com/erikd/Blog/CodeHacking/Haskell/my_space_is_leaking....
I probably need to revisit that, because the problem can likely be fixed without deepseq-generics, using just BangPatterns.
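Roughly the kind of change that means (the record and names here are hypothetical, not taken from the blog post):

    {-# LANGUAGE BangPatterns #-}

    -- Hypothetical accumulator record with lazy fields.
    data RowSummary = RowSummary Double Int

    -- The bang patterns force the old field values on every update,
    -- so the lazy fields never hold more than one layer of thunk and
    -- no deepseq over the record is required.
    bump :: RowSummary -> Double -> RowSummary
    bump (RowSummary !s !n) x = RowSummary (s + x) (n + 1)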
Erik
--
Erik de Castro Lopo
http://www.mega-nerd.com/