
pete-expires-20070513:
dons@cse.unsw.edu.au (Donald Bruce Stewart) writes:
pete-expires-20070513:
When using readFile to process a large number of files, I am exceeding the resource limits for the maximum number of open file descriptors on my system. How can I enhance my program to deal with this situation without making significant changes?
Read in data strictly, and there are two obvious ways to do that:
-- Via strings (forcing the length reads the whole file, so the handle
-- is closed as soon as readFile hits EOF):
readFileStrict f = do
    s <- readFile f
    length s `seq` return s

-- Via ByteStrings:
readFileStrict       = Data.ByteString.readFile
readFileStrictString = liftM Data.ByteString.unpack . Data.ByteString.readFile
If you're reading more than, say, 100k of data, I'd use strict ByteStrings without hesitation. More than 10M, and I'd use lazy ByteStrings.
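For reference, the lazy variant mentioned here is just Data.ByteString.Lazy.readFile; a trivial sketch (not part of the original message) would be:

import qualified Data.ByteString.Lazy as L

-- Reads the file in chunks on demand. Note that the handle is only closed
-- once the whole file has been consumed, or when the GC finalizes it.
readFileLazy :: FilePath -> IO L.ByteString
readFileLazy = L.readFile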
Correct me if I'm wrong, but isn't this exactly what I wanted to avoid: reading the entire file into memory? In my previous email I tried to explain that I want to read the files lazily, because some of them are quite large and there is no reason to read beyond the small set of headers. If the entire file is read into memory, that design goal is no longer met.
Nevertheless, I benchmarked with ByteStrings (both lazy and strict), and in both cases the ByteString versions of readFile produced the same error about max open files. Incidentally, the lazy ByteString version of my program was by far the fastest and used the least memory, but it still crapped out on the open-file limit.
So I'm back to square one. Any other ideas?
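A note on why the lazy versions still hit the limit: both the String readFile and Data.ByteString.Lazy.readFile leave the handle open until the file's contents are fully forced, or until the garbage collector finalizes the abandoned tail. A hypothetical loop like the following (illustrative names and a made-up 512-byte header size) can therefore hold one descriptor per file until a GC happens to run:

import Control.Monad (forM)
import qualified Data.ByteString.Lazy as L

-- Problem pattern: each L.readFile opens a descriptor, but taking only the
-- header means EOF is never reached, so nothing closes it except a finalizer.
readAllHeaders :: [FilePath] -> IO [L.ByteString]
readAllHeaders files = forM files $ \f -> do
    contents <- L.readFile f
    return (L.take 512 contents)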
Hmm. Ok. So we need to have more hClose's happen somehow. Can you process files one at a time?

-- Don
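One way to make those hClose's happen deterministically, one file at a time, is to read just a fixed-size header strictly inside withFile, so each handle is closed before the next file is opened. A minimal sketch under that assumption (readHeader, processAll, and the 512-byte header size are made up for illustration):

import System.IO (IOMode (ReadMode), withFile)
import qualified Data.ByteString as B

-- Read at most n bytes strictly; withFile closes the handle as soon as the
-- action returns, so only one descriptor is open at any time.
readHeader :: Int -> FilePath -> IO B.ByteString
readHeader n f = withFile f ReadMode (\h -> B.hGet h n)

processAll :: [FilePath] -> IO [B.ByteString]
processAll = mapM (readHeader 512)

Because B.hGet returns a fully evaluated strict ByteString, nothing refers to the handle after the bracket exits, so lazy evaluation elsewhere in the program cannot keep descriptors alive.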