
Hi!

I have some resource problems when extracting data from a file. The task is as follows: I have a huge (500 MB) binary file containing some interesting parts and lots of rubbish. In addition, there is a directory that tells me which parts of the file (first and last byte index) contain the substrings I need.

My approach is to open the file and pass the list of addresses, along with the handle, to a function that processes the list step by step. For each entry it calls a subfunction which uses the handle to seek to the start position of the interesting block, reads the block into a byte string (lazy or not, it made no difference here), and calls the function that scans this byte string for the interesting part.

With this approach, which produces a data structure of approximately 10 MB, the program uses hundreds of megabytes of RAM, which forces my computer to swap (with the obvious results...).

Right now I have two main suspects:

The recursive function is tail-recursive, but I don't know whether the usual way of writing such functions (with an accumulator etc.) works in monadic code. (The setting is, of course, the IO monad, and I am using do-notation because I don't like the only other way I know, writing lambdas upon lambdas into the function body.)

The other problem I can imagine is the passing around of the file handle and the subsequent reading of byte strings. Are those strings somehow attached to the handle? Does the handle work differently than I expect, i.e. is the handle copied when it is passed as an argument to another function, and is there something like a registry of handles that keeps the connection alive and therefore excludes the (handle, string) pair from garbage collection?

I have, of course, been experimenting with the "seq" function, but honestly I am not sure whether I got it right. Does a call to "identity $! (function arguments ...)" force the full evaluation of the function?

Greetings!
Moritz
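
For concreteness, here is a minimal sketch of the kind of loop described above. It is not the original program. The names Range, scanBlock, readBlock, extractAll and extractFromFile are hypothetical, and scanBlock is only a stand-in for the real scanner (here it keeps whatever follows the first ':' in a block). The sketch assumes strict byte strings and shows one way to write the accumulator loop in do-notation so that each extracted piece is forced, and copied out of its block, before the next block is read.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import System.IO (Handle, IOMode (ReadMode), SeekMode (AbsoluteSeek), hSeek, withBinaryFile)

-- | Hypothetical directory entry: first and last byte index, both inclusive.
type Range = (Integer, Integer)

-- | Hypothetical stand-in for the real scanner: keep the bytes that follow
-- the first ':' in the block. A slice of a strict ByteString shares the
-- buffer of the block it was cut from, so B.copy is used to detach the
-- small result and let the large block be garbage-collected.
scanBlock :: B.ByteString -> B.ByteString
scanBlock block = B.copy (B.drop 1 (BC.dropWhile (/= ':') block))

-- | Seek to the start of a range and read exactly that many bytes (strictly).
readBlock :: Handle -> Range -> IO B.ByteString
readBlock h (first, lastIx) = do
  hSeek h AbsoluteSeek first
  B.hGet h (fromIntegral (lastIx - first + 1))

-- | Tail-recursive accumulator loop in the IO monad. The bang on 'piece'
-- forces the scan result before recursing, so no thunk keeps a reference
-- to the block that was just read.
extractAll :: Handle -> [Range] -> IO [B.ByteString]
extractAll h = go []
  where
    go !acc []       = pure (reverse acc)
    go !acc (r : rs) = do
      block <- readBlock h r
      let !piece = scanBlock block
      go (piece : acc) rs

-- | Convenience wrapper: open the file, extract all ranges, close the file.
extractFromFile :: FilePath -> [Range] -> IO [B.ByteString]
extractFromFile path ranges =
  withBinaryFile path ReadMode (\h -> extractAll h ranges)
```

A call such as extractFromFile "data.bin" [(4, 8), (14, 18)] (file name and ranges made up) returns one piece per range. On the "$!" question: "f $! x" evaluates x only to weak head normal form, i.e. to its outermost constructor. For a strict ByteString that is the whole value, but for a list or a lazy ByteString it is not, which may be why "seq" appeared to have no effect.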