
pete-expires-20070513:
When using readFile to process a large number of files, I am exceeding the resource limit for the maximum number of open file descriptors. This is very annoying - I can't see any good reason why file descriptors should "run out" (before memory is exhausted). I guess the Linux kernel is intended for imperative use :-/
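(The failure mode here comes from readFile's lazy I/O: the handle is opened immediately, but it is only closed once the contents have been fully consumed or garbage collected. A minimal sketch of the problem, with hypothetical file names:)

    import Control.Monad (forM)

    -- readFile opens each handle at once but reads the contents lazily;
    -- a handle is closed only when the whole file has been consumed (or
    -- its string is garbage collected). Nothing inside the loop demands
    -- the contents, so every handle stays open at the same time and the
    -- process eventually hits the open-descriptor limit.
    main :: IO ()
    main = do
        let paths = [ "mail/" ++ show n | n <- [1 .. 10000 :: Int] ]
        contents <- forM paths readFile       -- thousands of handles opened here
        print (sum (map length contents))     -- ...and only closed here
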
Donald Bruce Stewart wrote:
Read in the data strictly, and there are two obvious ways to do that:
-- Via strings [..]
-- Via ByteStrings [..]
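
(The [..] markers elide the code from the original post. A rough sketch of the two approaches, not the original examples, taking the file names from the command line:)

    import qualified Data.ByteString as B
    import System.Environment (getArgs)

    -- Via strings: read lazily, then force the whole contents (here by
    -- taking the length) so the handle is closed before the next file
    -- is opened.
    readFileStrict :: FilePath -> IO String
    readFileStrict path = do
        s <- readFile path
        length s `seq` return s

    -- Via ByteStrings: Data.ByteString.readFile is strict; it reads the
    -- whole file into a single buffer and closes the handle right away.
    readFileBS :: FilePath -> IO B.ByteString
    readFileBS = B.readFile

    main :: IO ()
    main = do
        paths <- getArgs                     -- files given on the command line
        strs  <- mapM readFileStrict paths
        bytes <- mapM readFileBS paths
        print (map length strs, map B.length bytes)
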
Perhaps this is an esoteric way, but I think the nicest approach is to parse into a strict structure. If you fully evaluate each Email (or whatever structure you parse into), there will be no unevaluated thunks linking to the file, and it will be closed.

If the files are small (e.g. maildir or similar with one email in each?), you can use strict ByteStrings, but I generally use lazy ByteStrings for just about anything. Be aware that extracting a substring from a ByteString is performed by "slicing", so it keeps a pointer to the original string (along with offset and length). For strict ByteStrings, this would keep everything in memory; for lazy ByteStrings, you'd keep only the relevant chunks (so that would allow the body to be GC'ed, if you aren't interested in keeping it).

(I wonder if the garbage collector could somehow discover strings that have been sliced down a lot, and copy only the relevant parts?)

-k
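
(To make both points concrete, a hedged sketch: the Email type, its fields, and the input file name below are invented for illustration and are not from the thread. Strict fields force the parsed structure, and ByteString's copy detaches a small slice from the large original buffer so the rest can be collected:)

    {-# LANGUAGE BangPatterns #-}
    import qualified Data.ByteString.Char8 as B

    -- A hypothetical strict structure: strict fields mean that forcing
    -- an Email also forces its fields, so no thunk can keep the original
    -- file's contents (or handle) alive.
    data Email = Email
        { subject :: !B.ByteString
        , body    :: !B.ByteString
        } deriving Show

    -- Substrings of a strict ByteString are slices that share the
    -- original buffer. B.copy makes an independent copy, so the large
    -- original message can be garbage collected if only the subject is
    -- retained.
    parseEmail :: B.ByteString -> Email
    parseEmail raw =
        let (firstLine, rest) = B.break (== '\n') raw
        in Email { subject = B.copy firstLine   -- detached from the big buffer
                 , body    = B.drop 1 rest }    -- still a slice of raw

    main :: IO ()
    main = do
        raw <- B.readFile "message.txt"         -- hypothetical input file
        let !e = parseEmail raw
        print (subject e)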