Re: [Haskell-beginners] Reading Multiple Files and Iterate Function Application

12 Oct 2010

On Tuesday 12 October 2010 11:02:47, Lorenzo Isella wrote:
> ...
> Thanks Thomas. Yep, I do need some extra reading unfortunately.
> One question: if I were to apply a function to many files file1,
> file2, ... using e.g. Python, this would be my pipeline:
>   read file1
>   do stuff on file1
>   read file2
>   do stuff on file2
>   ...
> Now, due to the laziness of Haskell, can I resort to this approach here?
>   read file1, file2, ... into a single list
>   map (do-my-stuff) on list
> As far as I understand, this should not result in huge RAM
> consumption, since files are read and processed only when needed (hence
> one at a time).
> Am I on the right track?

Yes, but there are dangers along the way.
With readFile, the contents are read lazily on demand, but the file is
opened immediately for reading. So

contentsList <- mapM readFile fileList

or

allContents <- fmap concat $ mapM readFile fileList

can make you run out of file handles if fileList is long enough.

Also, the file handles aren't closed until the entire contents of the file
have been read (there are a few situations where a handle is closed
earlier), and they're not guaranteed to be closed immediately when the end
of the file has been reached; they could linger for a GC or two.
That means you can also run out of file handles when you process the files
sequentially (if you have a bad consumption pattern).
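
One way to sidestep the handle problem is to open, fully consume and close
each file before touching the next one. The following is only a sketch
under assumptions: the per-file job (counting lines) and the names
countLines and countAll are made up for illustration; withFile,
hGetContents and evaluate are the usual functions from System.IO and
Control.Exception.

```haskell
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Hypothetical per-file job: count the lines of one file.
-- The result is forced with 'evaluate' while the handle is still open,
-- so nothing tries to read from the handle after 'withFile' closes it.
countLines :: FilePath -> IO Int
countLines path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    evaluate (length (lines contents))

-- mapM in IO runs the actions one after another, so at most one
-- handle is open at any time, however long the list of files is.
countAll :: [FilePath] -> IO [Int]
countAll = mapM countLines
```

Note that evaluate only forces the result to weak head normal form. That
is enough for an Int, but a lazy result such as a list would have to be
forced more deeply before withFile returns.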

The memory usage depends on your consumption pattern, independent of
whether the String[s] you process come[s] from file reads or from a non-IO
generator.
If you keep references to the beginning of the list, you get a leak; if you
consume the list sequentially, it runs in small space.
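
To illustrate the two consumption patterns with a non-IO generator, here
is a small sketch. All names (chunks, totalLength, meanLengthTwoPass,
meanLengthOnePass) are invented for the example, and how much memory is
actually retained can depend on the compiler and optimisation level.

```haskell
import Data.List (foldl')

-- Hypothetical non-IO generator standing in for the file contents.
chunks :: Int -> [String]
chunks n = map show [1 .. n]

-- A single sequential pass with a strict accumulator: each element
-- can be garbage collected once it has been consumed.
totalLength :: [String] -> Int
totalLength = foldl' (\acc s -> acc + length s) 0

-- Two traversals of the same list: xs is still referenced by the
-- second traversal while the first one runs, so the whole list is
-- typically kept in memory.
meanLengthTwoPass :: [String] -> Double
meanLengthTwoPass xs =
  fromIntegral (totalLength xs) / fromIntegral (length xs)

-- The same result in one pass, accumulating sum and count together.
-- Both components are forced with seq so no thunks pile up.
meanLengthOnePass :: [String] -> Double
meanLengthOnePass xs = fromIntegral total / fromIntegral count
  where
    (total, count) = foldl' step (0, 0) xs
    step :: (Int, Int) -> String -> (Int, Int)
    step (t, c) s =
      let t' = t + length s
          c' = c + 1
      in t' `seq` c' `seq` (t', c')
```

Both mean functions compute the same value; they differ only in how long
the beginning of the list stays reachable.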
> ...
> Cheers
> Lorenzo

Daniel Fischer