Re: [Haskell-beginners] Processing a list of files the Haskell way

10 Mar 2012

Hi Michael,

Your code has a very C-like feel to it. I would first separate reading
the directory structure and the files from walking over the tree.
Something like this:

data DirTree = FileNode FilePath | DirNode FilePath [DirTree]

-- Apply f to every path in the tree (directories included), parent first.
walkDirTree :: (FilePath -> a) -> DirTree -> [a]
walkDirTree f (FileNode fp)   = [f fp]
walkDirTree f (DirNode fp fs) = f fp : (fs >>= walkDirTree f)

I know this isn't exactly what you need; I hadn't read your solution
properly when I wrote it. Still, it is a useful hint: separating the
pure part of your program from the IO part is important.
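
To make the separation concrete, here is a minimal sketch of the pure side only. The helper filePaths and the tree exampleTree are hypothetical names introduced for illustration and are not part of the original code. It assumes, as the quoted addChecksums does, that each node carries a path that can be passed directly to a file-reading function.

```haskell
data DirTree = FileNode FilePath | DirNode FilePath [DirTree]

-- Apply f to every path in the tree (directories included), parent first.
walkDirTree :: (FilePath -> a) -> DirTree -> [a]
walkDirTree f (FileNode fp)   = [f fp]
walkDirTree f (DirNode fp fs) = f fp : (fs >>= walkDirTree f)

-- Hypothetical helper: the paths of the FileNode entries only,
-- in the same traversal order. No IO is involved.
filePaths :: DirTree -> [FilePath]
filePaths (FileNode fp)  = [fp]
filePaths (DirNode _ fs) = concatMap filePaths fs

-- A made-up tree, used only to illustrate the two functions.
exampleTree :: DirTree
exampleTree =
  DirNode "src"
    [ FileNode "src/Main.hs"
    , DirNode "src/Util" [FileNode "src/Util/Hash.hs"]
    ]
```

Here walkDirTree id exampleTree also returns the directory paths "src" and "src/Util", whereas filePaths exampleTree returns only the two file paths. The IO part of the program can then work on a plain list of paths.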

The problem of the open files is another beast. You are using lazy
bytestrings. A lazy bytestring read with BL.readFile keeps its file
descriptor open until all of its bytes have been consumed, and nothing
in your code forces md5 bytes before the next file is opened. I suspect
you need to add some strictness to your program. You can try strict
bytestrings, or use seq to evaluate the md5 thunks earlier in the
program's execution.
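
A minimal sketch of the strict variant, under a few assumptions: digestFile and digestFiles are hypothetical names, and the hash function is passed in as a parameter so the sketch depends only on the bytestring package. With the md5 function from your code, which takes a lazy bytestring, the argument would presumably be something like md5 . BL.fromStrict. The input is a plain list of file paths, however it is obtained from the DirTree.

```haskell
import qualified Data.ByteString as B
import Control.Exception (evaluate)

-- Hypothetical helper: read one file strictly and force its digest.
digestFile :: (B.ByteString -> d) -> FilePath -> IO d
digestFile hash fp = do
  -- Strict readFile loads the whole file and closes the handle
  -- before it returns, so at most one file is open at a time.
  bytes <- B.readFile fp
  -- Force the digest (to weak head normal form) now, so that no
  -- unevaluated thunk keeps the file contents alive in memory.
  evaluate (hash bytes)

-- Process the files one after another, pairing each path with its digest.
digestFiles :: (B.ByteString -> d) -> [FilePath] -> IO [(FilePath, d)]
digestFiles hash = mapM (\fp -> do
  d <- digestFile hash fp
  return (fp, d))
```

The trade-off is that each file is held in memory in full while it is hashed, which may matter for very large files. Note also that evaluate only forces the outermost constructor of the digest; whether that is enough depends on how the digest type is defined.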

Greets,

Edgar

On 3/10/12, Michael Schober wrote:
...
Hi everyone,

I'm currently trying to solve a problem in which I have to process a
long list of files; more specifically, I want to compute MD5 checksums
for all of them.

I have code which lists all the files and holds them in the following
data structure:

data DirTree = FileNode FilePath | DirNode FilePath [DirTree]

I tried the following:

-- calculates MD5 sums for all files in a dirtree
addChecksums :: DirTree -> IO [(DirTree,MD5Digest)]
addChecksums dir = addChecksums' [dir]
   where
     addChecksums' :: [DirTree] -> IO [(DirTree,MD5Digest)]
     addChecksums' [] = return []
     addChecksums' (f@(FileNode fp):re) = do
       bytes <- BL.readFile fp
       rest <- addChecksums' re
       return ((f,md5 bytes):rest)
     addChecksums' ((DirNode fp filelist):re) = do
       efiles <- addChecksums' filelist
       rest <- addChecksums' re
       return $ efiles ++ rest

This works fine, but only for a small number of files. If I try it on a
big directory tree, memory fills up and the program aborts with an
error message telling me that there are too many open files.

So I guess I have to sequentialize the code a little bit more. But at
the same time, I want to keep it as functional as possible, and I don't
want to write C-like code.

What would be the Haskell way to do something like this?

Thanks for all the input,
Michael

edgar klerks