What about something as simple as this?


import           Control.Monad (forM)
import           System.Directory (doesDirectoryExist, getDirectoryContents)
import           System.FilePath ((</>))
import qualified Data.ByteString as B
import           Data.Digest.OpenSSL.MD5 (md5sum)
import qualified Data.Map as M

getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topdir = do
        names <- getDirectoryContents topdir
        let properNames = filter (`notElem` [".", ".."]) names
        paths <- forM properNames $ \name -> do
                let path = topdir </> name
                isDirectory <- doesDirectoryExist path
                if isDirectory
                        then getRecursiveContents path
                        else return [path]
        return (concat paths)

getMD5 :: FilePath -> IO String
getMD5 file = md5sum `fmap` B.readFile file

main :: IO ()
main = do
        files <- getRecursiveContents "."
        md5s <- sequence $ map getMD5 files
        let m = M.fromListWith (++) $ zip md5s [[f] | f <- files]
        putStrLn $ M.showTree m


The biggest part is the "getRecursiveContent", shamelessly stolen from RWH.

L.




On Sun, Mar 18, 2012 at 5:43 PM, Yawar Amin <yawar.amin@gmail.com> wrote:
Hi Michael,

Michael Schober <Micha-Schober <at> web.de> writes:

> [...]
> I took the liberty to modify the output a little bit to my needs - maybe
> a future reader will find it helpful, too. It's attached below.

I kind of played around with your example a little bit and wondered if it
could be implemented in terms of just the basic Haskell Platform
modules and functions. So as an exercise I rolled my own directory
traversal and duplicate finder functions. This is what I came up with:

- walkDirWith: walks a given directory with a given function that takes a
Handle to any (unknown type) value, and returns association lists of
paths and the unknown type values.

- filePathMap: I think roughly analogous to your duplicates function.

- main: In the third line of the main function, I use hFileSize as an
example of a function that takes a Handle to an IO value, in this case IO
Integer. A hash function could easily be put in here. The last line
pretty-prints the Map in a tree-like format.

import System.IO
import System.Environment (getArgs)
import System.Directory ( doesDirectoryExist
                       , getDirectoryContents)
import Control.Monad (mapM)
import Control.Applicative ((<$>))
import System.FilePath ((</>))
import qualified Data.Map as M

walkDirWith :: FilePath -> (Handle -> IO r) -> IO [(r, FilePath)] ->
              IO [(r, FilePath)]
walkDirWith path f walkList = do
 isDir <- doesDirectoryExist path
 if isDir
   then do
     paths <- getDirectoryContents path
     concat <$> mapM (\p -> walkDirWith (path </> p) f walkList)
                     [p | p <- paths, p /= ".", p /= ".."]
   else do
     rValue <- withFile path ReadMode f
     ((:) (rValue, path)) <$> walkList

filePathMap :: Ord r => [(r, FilePath)] -> M.Map r [FilePath]
filePathMap pathPairs =
 foldl (\theMap (r, path) -> M.insertWith' (++) r [path] theMap)
       M.empty
       pathPairs

main :: IO ()
main = do
 [dir] <- getArgs
 fileSizes <- walkDirWith dir hFileSize $ return []
 putStr . M.showTree $ filePathMap fileSizes

Obviously there's no right or wrong way to do it, but I'm wondering
what you think.

Regards,

Yawar



_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners