
What about something as simple as this?
import Control.Monad (forM)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((>))
import qualified Data.ByteString as B
import Data.Digest.OpenSSL.MD5 (md5sum)
import qualified Data.Map as M
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topdir = do
names <- getDirectoryContents topdir
let properNames = filter (`notElem` [".", ".."]) names
paths <- forM properNames $ \name -> do
let path = topdir > name
isDirectory <- doesDirectoryExist path
if isDirectory
then getRecursiveContents path
else return [path]
return (concat paths)
getMD5 :: FilePath -> IO String
getMD5 file = md5sum `fmap` B.readFile file
main :: IO ()
main = do
files <- getRecursiveContents "."
md5s <- sequence $ map getMD5 files
let m = M.fromListWith (++) $ zip md5s [[f] | f <- files]
putStrLn $ M.showTree m
The biggest part is the "getRecursiveContent", shamelessly stolen from RWH.
L.
On Sun, Mar 18, 2012 at 5:43 PM, Yawar Amin
Hi Michael,
Michael Schober
writes: [...] I took the liberty to modify the output a little bit to my needs - maybe a future reader will find it helpful, too. It's attached below.
I kind of played around with your example a little bit and wondered if it could be implemented in terms of just the basic Haskell Platform modules and functions. So as an exercise I rolled my own directory traversal and duplicate finder functions. This is what I came up with:
- walkDirWith: walks a given directory with a given function that takes a Handle to any (unknown type) value, and returns association lists of paths and the unknown type values.
- filePathMap: I think roughly analogous to your duplicates function.
- main: In the third line of the main function, I use hFileSize as an example of a function that takes a Handle to an IO value, in this case IO Integer. A hash function could easily be put in here. The last line pretty-prints the Map in a tree-like format.
import System.IO import System.Environment (getArgs) import System.Directory ( doesDirectoryExist , getDirectoryContents) import Control.Monad (mapM) import Control.Applicative ((<$>)) import System.FilePath ((>)) import qualified Data.Map as M
walkDirWith :: FilePath -> (Handle -> IO r) -> IO [(r, FilePath)] -> IO [(r, FilePath)] walkDirWith path f walkList = do isDir <- doesDirectoryExist path if isDir then do paths <- getDirectoryContents path concat <$> mapM (\p -> walkDirWith (path > p) f walkList) [p | p <- paths, p /= ".", p /= ".."] else do rValue <- withFile path ReadMode f ((:) (rValue, path)) <$> walkList
filePathMap :: Ord r => [(r, FilePath)] -> M.Map r [FilePath] filePathMap pathPairs = foldl (\theMap (r, path) -> M.insertWith' (++) r [path] theMap) M.empty pathPairs
main :: IO () main = do [dir] <- getArgs fileSizes <- walkDirWith dir hFileSize $ return [] putStr . M.showTree $ filePathMap fileSizes
Obviously there's no right or wrong way to do it, but I'm wondering what you think.
Regards,
Yawar
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners