
On Tuesday 12 October 2010 11:02:47, Lorenzo Isella wrote:
Thanks Thomas. Yep, I do need some extra reading unfortunately. One question: if I were to apply a function to many files file1, file2, ... using e.g. Python, this would be my pipeline:
read file1, do stuff on file1
read file2, do stuff on file2
......
Now, thanks to the laziness of Haskell, can I resort to this approach instead:
read file1, file2... into a single list
map (do-my-stuff) on list
As far as I understand, this should not result in huge RAM consumption, since files are read and processed only when needed (hence one at a time). Am I on the right track?
Yes, but there are dangers along that way. With readFile, the contents are read lazily on demand, but the file is opened for reading immediately. So

    contentsList <- mapM readFile fileList

or

    allContents <- fmap concat $ mapM readFile fileList

can make you run out of file handles if fileList is long enough.

Also, a file handle isn't closed until the entire contents of the file have been read (there are a few situations where the handle is closed earlier), and it's not guaranteed to be closed immediately when the end of the file has been reached; it could linger for a GC or two. That means you can also run out of file handles when you process the files sequentially (if you have a bad consumption pattern).

The memory usage depends on your consumption pattern, independent of whether the String[s] you process come[s] from reading files or from a non-IO generator. If you keep references to the beginning of the list, you get a leak; if you consume the list sequentially, it runs in small space.
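
One way to stay clear of both problems might look like the sketch below. It assumes the per-file result is small and fully evaluated once it is in weak head normal form (an Int here); the names countLinesInFile and countLinesInFiles are made up for illustration. Each file is opened with withFile, and the result is forced with evaluate while the handle is still open, so at most one handle is open at any time.

```haskell
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Count the lines of one file (hypothetical helper, for illustration).
-- hGetContents is still lazy, but 'evaluate' forces the Int before
-- withFile returns, so the whole file is consumed while the handle is
-- open, and withFile closes the handle on the way out.
countLinesInFile :: FilePath -> IO Int
countLinesInFile path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    evaluate (length (lines contents))

-- Process the files strictly one after another: mapM sequences the
-- actions, and each action has closed its handle before the next starts.
countLinesInFiles :: [FilePath] -> IO [Int]
countLinesInFiles = mapM countLinesInFile
```

In a do block this would be used as counts <- countLinesInFiles fileList. Note that evaluate only goes as far as weak head normal form; for a result that is itself a lazy structure (a list, say) you would have to force it more deeply before leaving withFile, otherwise the handle is closed before the data has been read.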
Cheers
Lorenzo