
pete-expires-20070513:
dons@cse.unsw.edu.au (Donald Bruce Stewart) writes:
pete-expires-20070513:
When using readFile to process a large number of files, I am exceeding the resource limits for the maximum number of open file descriptors on my system. How can I enhance my program to deal with this situation without making significant changes?
Read in data strictly, and there are two obvious ways to do that:
-- Via strings (forcing the length reads the whole file, so the handle
-- is closed as soon as readFile hits EOF):
readFileStrict f = do
    s <- readFile f
    length s `seq` return s

-- Via ByteStrings:
readFileStrict       = Data.ByteString.readFile
readFileStrictString = liftM Data.ByteString.unpack . Data.ByteString.readFile
If you're reading more than, say, 100k of data, I'd use strict ByteStrings without hesitation. More than 10M, and I'd use lazy ByteStrings.
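For reference, the lazy variant mentioned here is just Data.ByteString.Lazy.readFile; a trivial sketch (not part of the original message) would be:

import qualified Data.ByteString.Lazy as L

-- Reads the file in chunks on demand. Note that the handle is only closed
-- once the whole file has been consumed, or when the GC finalizes it.
readFileLazy :: FilePath -> IO L.ByteString
readFileLazy = L.readFile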
Correct me if I'm wrong, but isn't this exactly what I wanted to avoid: reading the entire file into memory? In my previous email I tried to explain that I want to read the files lazily, because some of them are quite large and there is no reason to read beyond the small set of headers. If the entire file is read into memory, that design goal is no longer met.
Nevertheless, I benchmarked with ByteStrings (both lazy and strict), and in both cases the ByteString versions of readFile produced the same error about max open files. Incidentally, the lazy ByteString version of my program was by far the fastest and used the least memory, but it still crapped out on the open-file limit.
So I'm back to square one. Any other ideas?
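A note on why the lazy versions still hit the limit: both the String readFile and Data.ByteString.Lazy.readFile leave the handle open until the file's contents are fully forced, or until the garbage collector finalizes the abandoned tail. A hypothetical loop like the following (illustrative names and a made-up 512-byte header size) can therefore hold one descriptor per file until a GC happens to run:

import Control.Monad (forM)
import qualified Data.ByteString.Lazy as L

-- Problem pattern: each L.readFile opens a descriptor, but taking only the
-- header means EOF is never reached, so nothing closes it except a finalizer.
readAllHeaders :: [FilePath] -> IO [L.ByteString]
readAllHeaders files = forM files $ \f -> do
    contents <- L.readFile f
    return (L.take 512 contents)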
Hmm. Ok. So we need to have more hClose's happen somehow. Can you process files one at a time?

-- Don
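One way to make those hClose's happen deterministically, one file at a time, is to read just a fixed-size header strictly inside withFile, so each handle is closed before the next file is opened. A minimal sketch under that assumption (readHeader, processAll, and the 512-byte header size are made up for illustration):

import System.IO (IOMode (ReadMode), withFile)
import qualified Data.ByteString as B

-- Read at most n bytes strictly; withFile closes the handle as soon as the
-- action returns, so only one descriptor is open at any time.
readHeader :: Int -> FilePath -> IO B.ByteString
readHeader n f = withFile f ReadMode (\h -> B.hGet h n)

processAll :: [FilePath] -> IO [B.ByteString]
processAll = mapM (readHeader 512)

Because B.hGet returns a fully evaluated strict ByteString, nothing refers to the handle after the bracket exits, so lazy evaluation elsewhere in the program cannot keep descriptors alive.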