
On Monday 07 June 2010 14:06:22, Anand Mitra wrote:
Hello All,
I want to build a program which will recursively scan a directory and compute an md5sum for each file. The intent is to do something similar to unison but more specific to my requirements. I am having trouble with the first step, computing the md5sums.
I did some digging around and found that "System.Directory.Tree" is a very close match for what I want to do. In fact after a little poking around I could do exactly what I wanted.
,----
| import Monad
| import System.Directory.Tree
| import System.Directory
| import Data.Digest.Pure.MD5
| import qualified Data.ByteString.Lazy.Char8 as L
|
| calcMD5 =
|     readDirectoryWith (\x-> liftM md5 (L.readFile x))
Does
    calcMD5 = readDirectoryWith (\x -> do
        txt <- L.readFile x
        return $! md5 txt)
help?
`----
This works perfectly for small directories. readDirectoryWith is already defined in the library and does exactly what we want.
,----
| *Main> calcMD5 "/home/mitra/Desktop/"
|
| "/home/mitra" :/ Dir {name = "Desktop", contents = [File {name =
| "060_LocalMirror_Workflow.t.10.2.62.9.log", file =
| f687ad04bc64674134e55c9d2a06902a},File {name = "cmd_run", file =
| 6f334f302b5c0d2028adeff81bf2a0d9},File {name = "cmd_run~",
`----
However, whenever I give it something more challenging it gets into trouble.
,----
| *Main> calcMD5 "/home/mitra/laptop/"
| *** Exception: /home/mitra/laptop/ell/calc-2.02f/calc.info-27:
| openFile: resource exhausted (Too many open files)
| *Main> 29~
`----
If I understand what is happening correctly, it seems to be doing all the opens before consuming the files via md5. This works fine for small directories, but in any practical setup the number of files could be very large. I tried forcing the md5 evaluation in the hope that the file descriptor would be freed once the entire file is read. That did not help, either because I could not get it right or because there is something more subtle I am missing.
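One possible way around the descriptor problem described above is to avoid lazy I/O altogether: read each file with the strict Data.ByteString.readFile, which reads the whole file and closes the handle before returning, and force the digest before moving on to the next file. The following is only a sketch under stated assumptions, not the directory-tree approach. The names toyChecksum, digestFile and digestTree are hypothetical, toyChecksum is a stand-in so that the example needs only GHC's bundled libraries, and the traversal does not guard against symbolic-link cycles. It also holds one whole file in memory at a time.

```haskell
import Control.Exception (evaluate)
import Control.Monad (forM)
import qualified Data.ByteString as S
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))

-- Hypothetical stand-in for a real digest, so the sketch has no extra
-- dependencies. With pureMD5 one could pass (md5 . L.fromChunks . (:[]))
-- instead, where L is Data.ByteString.Lazy.
toyChecksum :: S.ByteString -> Int
toyChecksum = S.foldl' step 0
  where
    step acc w = (acc * 31 + fromIntegral w) `mod` 1000000007

-- Read one file strictly and force the digest to weak head normal form.
-- S.readFile has already closed the handle by the time it returns, so at
-- most one descriptor is open at any moment.
digestFile :: (S.ByteString -> a) -> FilePath -> IO a
digestFile digest path = do
    bytes <- S.readFile path
    evaluate (digest bytes)

-- Walk a directory recursively, returning (path, digest) pairs.
-- Assumption: no symbolic-link cycles under the starting directory.
digestTree :: (S.ByteString -> a) -> FilePath -> IO [(FilePath, a)]
digestTree digest dir = do
    names <- getDirectoryContents dir
    let entries = [dir </> n | n <- names, n `notElem` [".", ".."]]
    results <- forM entries $ \p -> do
        isDir <- doesDirectoryExist p
        if isDir
            then digestTree digest p
            else do
                d <- digestFile digest p
                return [(p, d)]
    return (concat results)
```

In GHCi this could be tried as digestTree toyChecksum "/home/mitra/laptop/". The design choice is to trade memory (a whole file is resident while it is hashed) for a bounded number of open descriptors.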
I also had a look at the code in module "System.Directory.Tree", and although it gave me some understanding of how it works, I am no closer to a solution.
regards