
On Sun, Mar 21, 2010 at 08:28:23PM -0400, Patrick LeBoutillier wrote:
> I'm no profiling expert, but I have a few questions though:
> - What is the size (in bytes) of your input file?

The input file contains 31,129,639 bytes, with 710,355 lines and 27,954 individual packages/records.

In fact it seems that even if the input stream repeats only the line " a" (that is, a space and the letter a) infinitely, the program eventually eats all memory. That's interesting. So I guess the lines are read in, but then held in memory and lazily not processed further until the entire file has been read, because it's not necessary? Or something.

I tried using $! in readRecordFields like

    readRecordFields lines = (mapMaybe (readField $!) rl, rest)
        where (rl,rest) = getOneRecordLines lines

but that didn't help either... I guess I don't fully understand $!. Does it force the entire computation below it to finish?

To achieve sane memory behavior, it would seem necessary to parse the lines before they're all read, and then mapMaybe in readRecordFields would throw out the Nothings. After that I believe it should all be constant amount of memory with all lines starting with a space.
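
On the $! question, as far as I can tell: f $! x only evaluates x to weak head normal form (the outermost constructor) before calling f, and it does so only at the moment the application itself is demanded. So it neither finishes "the entire computation below it" nor makes anything run earlier. A minimal sketch of that behaviour follows; the readField here is a hypothetical stand-in (the real one is not shown in this thread), and the cycled input is made up for illustration.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical stand-in for readField (an assumption, not the real code):
-- a line starting with a space yields Nothing, any other line is kept.
readField :: String -> Maybe String
readField (' ' : _) = Nothing
readField l         = Just l

main :: IO ()
main = do
  -- The tail of this string is undefined, but $! only forces the first
  -- (:) constructor, so the undefined part is never touched.
  let partial = 'a' : undefined :: String
  print (const "ok" $! partial)

  -- (readField $!) is the section \l -> readField $! l. Nothing is forced
  -- until an element of the result is demanded, so this still works on an
  -- infinite input and forces lines one at a time.
  let fields = mapMaybe (readField $!) (cycle ["Package: x", " a"])
  print (take 2 fields)
```

If the goal were to evaluate a value completely, Control.DeepSeq (from the deepseq package) has $!! and force for that, but even those only take effect once the enclosing expression is demanded.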

> - Also, does memory usage improve if you remove the "sort"?

Yes. Then it only takes a few megabytes, regardless of how large the file is.

Thanks,
Sami