
On Sat, Mar 10, 2012 at 12:55 PM, Michael Schober wrote:
Hi everyone,
I'm currently trying to solve a problem in which I have to process a long list of files; more specifically, I want to compute MD5 checksums for all of them.
I have code which lists all the files and holds them in the following data structure:
data DirTree = FileNode FilePath | DirNode FilePath [DirTree]
I tried the following:
-- calculates MD5 sums for all files in a dirtree
addChecksums :: DirTree -> IO [(DirTree,MD5Digest)]
addChecksums dir = addChecksums' [dir]
  where
    addChecksums' :: [DirTree] -> IO [(DirTree,MD5Digest)]
    addChecksums' [] = return []
    addChecksums' (f@(FileNode fp):re) = do
      bytes <- BL.readFile fp
      rest <- addChecksums' re
      return ((f,md5 bytes):rest)
You're not computing the MD5 sum of a file until the recursive call has gone through all the other files in the directory... And since you're being lazy, you don't even compute it _at all_ until you ask for it later in your program. If readFile weren't lazy, you would need to keep the contents of all those files in memory until addChecksums had completely finished (which would be a big problem in itself), but since readFile is lazy, those files aren't read either until you need their content. They are still opened, though, so you end up with a lot of open handles that you never close, and open handles are a limited resource in any OS, so...

What you need to do is compute the MD5 sum as soon as you see the file, before you do anything else:
addChecksums' (f@(FileNode fp):re) = do
  bytes <- BL.readFile fp
  let !md5sum = md5 bytes
  rest <- addChecksums' re
  return ((f,md5sum):rest)
The ! before md5sum indicates that this let-binding should be evaluated immediately rather than deferred until needed, which is the norm for let-bindings. Don't forget to add {-# LANGUAGE BangPatterns #-} at the beginning of your file. Since md5 reads the file to its end, the handle is closed automatically, so you shouldn't run into the same problem.

Note that your solution isn't very "functional-like", but rather imperative. On the other hand, making it more functional in this particular case comes with its own brand of subtle difficulties.

-- Jedaï
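
For anyone who wants to try the strict-let fix in isolation, here is a minimal, self-contained sketch. It rests on several assumptions that are not in the original code: `checksum` is a toy byte sum standing in for `md5`, so that only libraries shipped with GHC are needed; the `DirNode` case, which the posted code does not show, is handled here by simply flattening the children into the work list; and the demo file names in `main` are made up.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)
import System.Directory (createDirectoryIfMissing, getTemporaryDirectory)
import System.FilePath ((</>))

data DirTree = FileNode FilePath | DirNode FilePath [DirTree]
  deriving (Show)

-- Hypothetical stand-in for md5 (assumption): sums all bytes of the input.
-- Like md5, it has to consume the whole lazy ByteString to produce a result.
checksum :: BL.ByteString -> Word32
checksum = BL.foldl' (\acc w -> acc + fromIntegral w) 0

-- Computes a checksum for every file in the tree, one file at a time.
addChecksums :: DirTree -> IO [(DirTree, Word32)]
addChecksums dir = go [dir]
  where
    go :: [DirTree] -> IO [(DirTree, Word32)]
    go [] = return []
    go (f@(FileNode fp) : rest) = do
      bytes <- BL.readFile fp
      -- The bang forces the checksum here, which reads the file to its
      -- end, so the lazy read closes the handle before we recurse.
      let !sumNow = checksum bytes
      more <- go rest
      return ((f, sumNow) : more)
    -- Assumed handling of directories: visit the children next.
    go (DirNode _ children : rest) = go (children ++ rest)

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  let dir = tmp </> "addchecksums-demo"   -- hypothetical demo directory
      a   = dir </> "a.txt"
      b   = dir </> "b.txt"
  createDirectoryIfMissing True dir
  writeFile a "abc"
  writeFile b ""
  sums <- addChecksums (DirNode dir [FileNode a, FileNode b])
  mapM_ (\(node, s) -> putStrLn (show node ++ " " ++ show s)) sums
```

Swapping `checksum` back to `md5` (and `Word32` to `MD5Digest`) should give the version from the reply above. The point of the sketch is only the ordering: each file is fully consumed before the next one is opened, so at most one handle is open at any time.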