Excess mem consumption in file IO task

6 Jan 2009

      Hi!

I have some resource problems when extracting data from a file. The
task is as follows: I have a huge (500 MB) binary file containing some
interesting parts and lots of rubbish. In addition, there is a
directory that gives the first and last byte index of each part of the
file that contains the substrings I need. My approach is to open the
file and pass the list of addresses, along with the handle, to a
function that processes the list step by step. For each entry it calls
a subfunction that uses the handle to seek to the start of the
interesting block, reads the block into a bytestring (lazy or strict,
it made no difference here) and calls the function that scans this
bytestring for the interesting part. With this approach, which
results in a data structure of approximately 10 MB, the program uses
hundreds of megabytes of RAM, which forces my computer to swap (with
the obvious results...).
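
To make the approach concrete, here is a minimal sketch of it, not the
actual code. All names (Range, readBlock, extract, extractAll) are
hypothetical, and the "scanner" is a placeholder that just keeps the
first four bytes of each block. It assumes strict bytestrings.

```haskell
import qualified Data.ByteString as B
import System.IO

-- Hypothetical directory entry: first and last byte index (inclusive).
type Range = (Integer, Integer)

-- Seek to the start of a block and read it as a strict ByteString.
readBlock :: Handle -> Range -> IO B.ByteString
readBlock h (from, to) = do
  hSeek h AbsoluteSeek from
  B.hGet h (fromIntegral (to - from + 1))

-- Placeholder scanner (an assumption): keep the first 4 bytes.
-- B.take only slices the block, so the slice would keep the whole
-- block's buffer alive; B.copy detaches the result from it.
extract :: B.ByteString -> B.ByteString
extract = B.copy . B.take 4

-- Process the directory entry by entry. `return $!` forces each
-- extracted part right away, so no unevaluated thunk is left behind
-- holding a reference to the full block.
extractAll :: Handle -> [Range] -> IO [B.ByteString]
extractAll h = mapM step
  where
    step r = do
      block <- readBlock h r
      return $! extract block
```

In this sketch there are two places where a large block could stay
reachable: an unevaluated call to the scanner, and a result that is
only a slice of the block. Whether either applies to the real program
is something a heap profile would have to confirm.
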
I have two main suspects right now. The first: the recursive function
is tail-recursive, but I don't know whether the usual way of writing
such functions (with an accumulator etc.) works in monadic code (the
stage is, of course, the IO monad, and I am using do-notation because
I don't like the only other way I know, writing lambdas upon lambdas
into the function body). The second is the passing-around of the file
handle and the subsequent reading of bytestrings: are those strings
somehow attached to the handle, and does the handle work differently
than I expect? That is, is the handle copied when it is passed as an
argument to another function, and is there something like a registry
of handles that keeps the connection alive and therefore excludes the
(handle, string) chunk from garbage collection?
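
The accumulator pattern in question, written in do-notation, might
look like the following sketch. The names sumLoop and step are
hypothetical; step stands in for any IO action run per list element.
The only addition to the usual pure version is the `seq` on the new
accumulator, so that it is evaluated at each step instead of growing
into a chain of thunks.

```haskell
-- Tail-recursive loop in IO with an explicitly forced accumulator.
sumLoop :: (Int -> IO Int) -> [Int] -> IO Int
sumLoop step = go 0
  where
    go acc []       = return acc
    go acc (x : xs) = do
      y <- step x                 -- hypothetical per-element action
      let acc' = acc + y
      acc' `seq` go acc' xs       -- force before recursing
```

For example, `sumLoop (return . (* 2)) [1 .. 1000]` yields 1001000.
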
I have, of course, been experimenting with the "seq" function, but,
honestly, I am not sure whether I got it right. Does a call to
"identity $! (function arguments ...)" force the full evaluation of
the function's result?
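
A small sketch of what `$!` does and does not force, using the
Prelude's `id` as the identity function. `seq` and `$!` evaluate
their argument only to weak head normal form, i.e. to the outermost
constructor. The names lazyPair, shallow, forcePair and deepHitsError
are made up for the illustration.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- A pair whose second component is a deliberately failing thunk.
lazyPair :: Int -> (Int, Int)
lazyPair n = (n + 1, error "second component was forced")

-- `id $! x` evaluates x only as far as the pair constructor; the
-- components stay untouched, so the hidden error is never triggered.
shallow :: Int
shallow = fst (id $! lazyPair 1)

-- Forcing both components by hand (to WHNF each).
forcePair :: (a, b) -> (a, b)
forcePair p@(a, b) = a `seq` b `seq` p

-- True if forcing both components runs into the hidden error.
deepHitsError :: IO Bool
deepHitsError = do
  r <- try (evaluate (forcePair (lazyPair 1)))
         :: IO (Either SomeException (Int, Int))
  return (either (const True) (const False) r)
```

Here `shallow` is 2, while `deepHitsError` returns True. For a strict
bytestring, weak head normal form already means the bytes are in
memory; for lists, tuples and lazy bytestrings it does not.
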
Greetings!

        Moritz

Moritz Tacke

Ertugrul Soeylemez
