Hi Again,

Thanks to the help from this group I got past my first problem, and now I am stuck on another. I have succeeded in building a DirTree with the library "System.Directory.Tree" that stores the md5sum and modification time of each file. I have also built the parts that compare two such trees and find the files that have changed.
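
For anyone following along, the comparison step can be sketched roughly like this. It is a minimal sketch that assumes a hypothetical, simplified Tree type rather than the real DirTree, and it flattens each snapshot into a Data.Map keyed by path before comparing. The names flatten, changedFiles, addedFiles and removedFiles are made up for illustration.

```haskell
import qualified Data.Map as Map
import System.FilePath ((</>))

-- Hypothetical, simplified stand-in for DirTree: the payload 'a' is
-- whatever is stored per file (e.g. an md5sum and a modification time).
data Tree a = File String a | Dir String [Tree a]

-- Flatten a tree into a map from path (relative to the tree) to payload.
flatten :: Tree a -> Map.Map FilePath a
flatten = go ""
  where
    go prefix (File n x) = Map.singleton (prefix </> n) x
    go prefix (Dir n cs) = Map.unions (map (go (prefix </> n)) cs)

-- Paths present in both snapshots whose payload differs.
changedFiles :: Eq a => Tree a -> Tree a -> [FilePath]
changedFiles old new =
  Map.keys (Map.filter id (Map.intersectionWith (/=) (flatten old) (flatten new)))

-- Paths present only in the new snapshot.
addedFiles :: Tree a -> Tree a -> [FilePath]
addedFiles old new = Map.keys (Map.difference (flatten new) (flatten old))

-- Paths present only in the old snapshot.
removedFiles :: Tree a -> Tree a -> [FilePath]
removedFiles old new = Map.keys (Map.difference (flatten old) (flatten new))
```

Comparing flat maps keeps the logic independent of how the two trees happen to be ordered.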

I am trying to serialize the tree so that it can be saved and used later to identify changed files. The module "Tree" incidentally derives Show but not Read, so I cannot read the serialized file back. After searching a bit I decided to let the infrastructure in "Data.Binary" do the job for me.
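
A minimal sketch of the Data.Binary save/load step, assuming (hypothetically) that the tree has already been flattened into a Map from path to md5sum and modification time. Because the binary package already ships Binary instances for Map, tuples, String and Integer, this variant needs no new instance at all. Snapshot, saveSnapshot and loadSnapshot are illustrative names, not part of any library.

```haskell
import Data.Binary (decodeFile, encodeFile)
import qualified Data.Map as Map

-- Hypothetical snapshot format: path -> (md5sum as hex text, mtime in seconds).
type Snapshot = Map.Map FilePath (String, Integer)

-- Write a snapshot to disk in Data.Binary's compact binary format.
saveSnapshot :: FilePath -> Snapshot -> IO ()
saveSnapshot = encodeFile

-- Read it back; every Binary instance needed here comes with the
-- binary package, so nothing has to be derived or written by hand.
loadSnapshot :: FilePath -> IO Snapshot
loadSnapshot = decodeFile
```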

As soon as I started, I realized that I would have to modify the Tree.hs module, because DirTree does not derive Typeable and Data, which the BinaryDerive helper needs in order to generate a "Data.Binary" instance. After patching the Tree module to derive Typeable and Data I get the following error.


System/Directory/Tree.hs:95:47:
    No instance for (Data Exception)
      arising from the 'deriving' clause of a data type declaration
                   at System/Directory/Tree.hs:95:47-50
    Possible fix:
      add an instance declaration for (Data Exception)
      or use a standalone 'deriving instance' declaration instead,
         so you can specify the instance context yourself
    When deriving the instance for (Data (DirTree a))
Failed, modules loaded: BinaryDerive.

Second, I suspect that I could have derived the instances without modifying the original module source. The compiler error does hint at a standalone 'deriving instance' declaration, but experimenting with the examples at http://www.haskell.org/ghc/docs/6.12.2/html/users_guide/deriving.html did not help me much.
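
For reference, the standalone deriving syntax looks roughly like this. The Tree type is a hypothetical stand-in for a type from a module you would rather not edit: in real use its declaration would live in that other module (with its constructors exported), and only the 'deriving instance' lines would be yours. On current GHC every type is Typeable automatically; on GHC 6.12 a separate Typeable1 instance would presumably also have been needed. Note that standalone deriving does not remove the underlying problem: a derived Data instance still needs Data for every field type, so a constructor holding an Exception would still need such an instance from somewhere.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE StandaloneDeriving #-}

import Data.Data (Data, showConstr, toConstr)

-- Hypothetical stand-in for a type defined in another module.
data Tree a
  = File String a
  | Dir String [Tree a]

-- Standalone deriving: the instance context is written out explicitly.
deriving instance Show a => Show (Tree a)
deriving instance Data a => Data (Tree a)
```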

In short: what is the simplest way to serialize a "System.Directory.Tree" tree using Binary? Is there a better alternative to Binary for serialization? Is there a solution to the problem outlined above, or is my approach incorrect? Is it possible to derive the instances for the DirTree datatype without modifying the module?
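
On the first question: Data.Binary itself does not need Data or Typeable; those are only used by tools such as BinaryDerive that generate instance code. A Binary instance can also be written by hand, as in this minimal sketch. It again uses a hypothetical, simplified Tree type; judging by the error above, the real DirTree also has a constructor holding an exception, and a hand-written instance would have to decide how to encode that (for example by storing only its message). An instance for the library's own type would be an orphan instance, which GHC accepts, warning about it under -Wall.

```haskell
import Data.Binary (Binary (..), decode, encode, getWord8, putWord8)

-- Hypothetical, simplified stand-in for DirTree (no exception-carrying case).
data Tree a = File String a | Dir String [Tree a]
  deriving (Eq, Show)

instance Binary a => Binary (Tree a) where
  -- A one-byte tag tells the decoder which constructor follows.
  put (File n x) = putWord8 0 >> put n >> put x
  put (Dir n cs) = putWord8 1 >> put n >> put cs
  get = do
    tag <- getWord8
    case tag of
      0 -> File <$> get <*> get
      1 -> Dir <$> get <*> get
      _ -> fail ("Tree: unknown constructor tag " ++ show tag)
```

Saving and loading then work with encodeFile and decodeFile, provided the per-file payload (digest, modification time) has a Binary instance of its own.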

regards
--
Anand Mitra


On Mon, Jun 7, 2010 at 5:36 PM, Anand Mitra <anand.mitra@gmail.com> wrote:
Hello All,

I want to build a program which will recursively scan a directory and
build md5sums for all the files. The intent is to do something similar
to unison but more specific to my requirements. I am having trouble
with the initial part, building the md5sums.

I did some digging around and found that "System.Directory.Tree" is a
very close match for what I want to do. In fact after a little poking
around I could do exactly what I wanted.

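,----
| import Control.Monad (liftM)
| import System.Directory.Tree
| import Data.Digest.Pure.MD5 (MD5Digest, md5)
| import qualified Data.ByteString.Lazy.Char8 as L
|
| -- Build a tree whose file leaves hold the MD5 digest of each file.
| calcMD5 :: FilePath -> IO (AnchoredDirTree MD5Digest)
| calcMD5 =
|     readDirectoryWith (\x -> liftM md5 (L.readFile x))
`----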

This works perfectly for small directories. readDirectoryWith is
already defined in the library and is exactly what I want.

,----
| *Main> calcMD5 "/home/mitra/Desktop/"
|
| "/home/mitra" :/ Dir {name = "Desktop", contents = [File {name =
| "060_LocalMirror_Workflow.t.10.2.62.9.log", file =
| f687ad04bc64674134e55c9d2a06902a},File {name = "cmd_run", file =
| 6f334f302b5c0d2028adeff81bf2a0d9},File {name = "cmd_run~",
`----
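
Roughly speaking, a readDirectoryWith-style function walks the tree and runs the given IO action on every file. The following is a simplified sketch of that idea, not the library's actual code: the Tree type and readTreeWith are hypothetical names, listDirectory comes from recent versions of the directory package, and there is no error handling.

```haskell
import Data.List (sort)
import System.Directory
import System.FilePath (dropTrailingPathSeparator, takeFileName, (</>))

-- Hypothetical, simplified stand-in for DirTree (no Failed constructor).
data Tree a = File String a | Dir String [Tree a]
  deriving (Eq, Show)

-- Simplified sketch of a readDirectoryWith-style walk; not the library's
-- code and without its error handling. 'action' is run on every file path.
readTreeWith :: (FilePath -> IO a) -> FilePath -> IO (Tree a)
readTreeWith action path = do
  isDir <- doesDirectoryExist path
  let label = takeFileName (dropTrailingPathSeparator path)
  if isDir
    then do
      names <- sort <$> listDirectory path
      -- mapM runs 'action' on each entry in turn. If 'action' only returns
      -- an unevaluated result (as liftM md5 (L.readFile x) does), every file
      -- is opened here and none is read until the digests are demanded.
      children <- mapM (readTreeWith action . (path </>)) names
      return (Dir label children)
    else File label <$> action path
```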

However, whenever I give it something more challenging it gets into
trouble.

,----
| *Main> calcMD5 "/home/mitra/laptop/"
| *** Exception: /home/mitra/laptop/ell/calc-2.02f/calc.info-27:
|    openFile: resource exhausted (Too many open files)
`----

If I understand correctly, it seems to open all the files before any
of them is consumed by md5. This works fine for small directories, but
in any practical setup the number of files could be very large. I
tried forcing the md5 evaluation in the hope that each file descriptor
would be freed once the entire file had been read. That did not help,
either because I could not get it right or because there is something
more subtle I am missing.
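
A likely explanation, sketched under assumptions: L.readFile opens the file immediately but reads it lazily, and liftM md5 returns an unevaluated digest, so walking the tree opens every file before any digest is demanded. A handle from lazy readFile is closed only once its contents have been read to the end. One common fix is to force the result inside the IO action with Control.Exception.evaluate. The sketch below uses a 32-bit FNV-1a hash as a hypothetical stand-in for md5 so that it only needs base and bytestring; fnv1a and hashFileStrict are illustrative names.

```haskell
import Control.Exception (evaluate)
import Data.Bits (xor)
import qualified Data.ByteString.Lazy as L
import Data.Word (Word32)

-- Hypothetical stand-in for md5: the 32-bit FNV-1a hash of the bytes.
fnv1a :: L.ByteString -> Word32
fnv1a = L.foldl' step 2166136261
  where
    step h b = (h `xor` fromIntegral b) * 16777619

-- Read the file lazily but force the hash before returning. Computing
-- the hash consumes the whole lazy ByteString; once L.readFile's handle
-- reaches end-of-file it is closed, so handles do not pile up.
hashFileStrict :: FilePath -> IO Word32
hashFileStrict path = do
  contents <- L.readFile path
  evaluate (fnv1a contents)
```

With System.Directory.Tree, the same read-then-evaluate pattern would go in the function passed to readDirectoryWith. Note that evaluate only forces a value to weak head normal form; for a Word32 that is the whole value, but for a real MD5Digest it is worth checking that the digest is forced completely rather than assuming it.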

I also had a look at the code in module "System.Directory.Tree" and
although it gave me some understanding of how it works I am no closer
to a solution.

regards
--
Anand Mitra