getRecursiveContents - example from `Real World Haskell'

`Real World Haskell' is a great book. I really love it. When I tried an example from the 9th Chapter, I was a bit disappointed: *Main> f <- getRecursiveContents "/home/johann/" Heap exhausted; Current maximum heap size is 128000000 bytes (122 Mb); use `+RTS -M<size>' to increase it. The function lookes like this: getRecursiveContents :: FilePath -> IO [FilePath] getRecursiveContents topdir = do names <- getDirectoryContents topdir let properNames = filter (`notElem` [".", ".."]) names paths <- forM properNames $ \name -> do -- 1 let path = topdir > name isDirectory <- doesDirectoryExist path if isDirectory then getRecursiveContents path else return [path] return (concat paths) -- 2 OK, I'm using a small machine and my home directory contains ~30,000 files. But that couldn't be the real problem. And even if this function is a small example it should work reliable. The programming language I know best (and this is meant relative -- I'm only a `would be programmer') is python. Python has good support for functional programming, but no builtin tail recursion. So my first idea about the bug in `getRecursiveContents' went in this direction. Two hours later I had worked out this solution: getRecursiveContents :: FilePath -> IO [FilePath] getRecursiveContents = getRecursiveContents' [] where getRecursiveContents' l p = E.handle (\_ -> return (p:l)) $ do -- 3 c <- getDirectoryContents p let c' = filter (`notElem` [".", ".."]) c x <- foldM (\l' p' -> getRecursiveContents' l' (p > p')) l c' -- 4 return (x) Folding (4) and appending (3) would give less memory usage than mapping (1) and concatenation (2), I thought. This function worked well for small directory (for which the original one did, too). But tested with my home directory it went into an infinite loop. That led me to the actually problem: `doesDirectoryExist' also accepts symlinks to directories. Another hour later this was fixed: getRecursiveContents :: FilePath -> IO [FilePath] getRecursiveContents = getRecursiveContents' [] where getRecursiveContents' l p = do s <- getSymbolicLinkStatus p if isDirectory s then E.handle (\_ -> return (p:l)) $ do c <- getDirectoryContents p let c' = filter (`notElem` [".", ".."]) c x <- foldM (\l' p' -> getRecursiveContents' l' (p > p')) l c' return (x) else return (p:l) Finally I fixed the original function (this only took about 30 min :-). The handle (5) catches errors caused by unreadable directories getRecursiveContents :: FilePath -> IO [FilePath] getRecursiveContents topdir = E.handle (\_ ->return [topdir]) $ do -- 5 names <- getDirectoryContents topdir let properNames = filter (`notElem` [".", ".."]) names paths <- forM properNames $ \name -> do let path = topdir > name s <- getSymbolicLinkStatus path if isDirectory s then getRecursiveContents path else return [path] return (concat paths) The imports for all functions mentioned above are: import Control.Monad ( forM, filterM, foldM ) import qualified Control.Exception as E import System.Directory (doesDirectoryExist, getDirectoryContents) import System.FilePath ((>)) import System.Posix (getSymbolicLinkStatus, isDirectory) Any suggestions about this solution are welcome. Johann
participants (1)
-
Johann Giwer