Re: [Haskell] System.FilePath survey

I (just like everyone else) have a path module, which does some things a little differently. I'm going to give a minimal working version in this literate Haskell message, along with commentary, and will package up the full code if anyone's interested. I have only used it in one little program, so I'm don't pretend all the details are right. However, I think my approach has promise, and would be grateful for your feedback. My main premise is that portably manipulating paths is _hard_, because filesystem semantics are messy and ill-specified and differ across systems; and so we should 1. be conservative in what we try to provide, and 2. use whatever techniques we can to avoid errors (by either the implementor or the user).
{-# OPTIONS -fglasgow-exts #-}
(Note that users of this module don't need glasgow-exts.)
module FSPath where import Control.Monad (liftM) import Data.List import System.Directory import System.IO
class (Read p, Show p) => Path p where
Portable path operations are class methods; that is, there is no single path type. One obvious advantage is that we can use the same (overloaded) functions on different path types, representing paths on different systems, in a single program, and never risk mixing them up. (I consider the ability to manipulate foreign paths a requirement for a production library.) A less obvious advantage, which I think may be bigger in practice, is that we can define a hierarchy of classes, each element representing different assumptions about paths (which may be common to different sets of systems). This helps us cope with the variety of path semantics in the world. In Path, I define operations that can be performed without IO, and have roughly the same, well-understood meaning on most systems on which Haskell runs:
currentPath :: p
A path representing the current working directory, eg. "." on unix.
prefixes :: p -> [(p, ChildName)]
The "parents" of this path, up until we hit something that is not a simple name, eg. "foo/../bar/baz" -> [("foo/../bar", "baz", ("foo/..", "bar")] on unix. (Actually, these are not necessarily parents, which is why I didn't name the method that.)
addChild :: Monad m => p -> ChildName -> m p
Add a single child, eg. "foo/bar" -> "baz" -> "foo/bar/baz". I use Monad for possible failure, say if the child were not a plain name, eg. ".." on unix.
append :: Monad m => p -> p -> m p
The second path relative to the first. There are some ambiguous cases on Windows that perhaps should fail. That's it for pure path ops. I know that typically path libraries have more, but most of them seem either to me unnecessary, redundant or of questionable portability. (Perhaps those that are less portable could be added in subclasses.) I'm sure that other people will differ, so I would welcome use cases for additional operations you find useful. ChildName is just
type ChildName = String
for now; maybe it should be part of the class? I define another class containing IO operations. The module should only give out instances that represent paths native to the currently running system.
class Path p => NativePath p where
I haven't dared to redesign how IO uses paths, but this module should be easily adapted to such a redesign. I suppose eventually all filesystem IO operations could be methods in this class, and the functions taking FilePath deprecated or removed. For now, I just include two methods that might need special treatment:
getChildren :: p -> IO [p] canonicalize :: p -> IO p
and for other IO, define a helper:
filePathIO :: NativePath p => (FilePath -> IO a) -> (p -> IO a) filePathIO f p = f (show p)
To get an instance of these classes that's right for the current system, I use some of the (more elementary) techniques from Ken and Oleg's "Implicit Configurations". The user supplies a function that is polymorphic in the path type, to a function that runs it on the native type. Here are two variants (defined later):
withNativePath :: FilePath -> (forall p. NativePath p => p -> IO r) -> IO r withNativeCurrentPath :: (forall p. NativePath p => p -> IO r) -> IO r
The argument to the user's function has the native type, and unification automatically propagates that type to other values. This avoids the need for an explicit dictionary of operations. Eg., to read two filenames, append them, and open the result:
example1 :: IO Handle example1 = do path1 <- getLine withNativePath path1 (\path1 -> do path2 <- readLn path <- append path1 path2 filePathIO (flip openFile ReadMode) path)
The overloaded function readLn reads a path of the right type. For the instances of these classes, I define algebraic data types. The alternative, retaining the string representation, would have some advantages: It would probably perform better, and moreover it would not require parsing the whole string for some operations, so there is less chance of the parser stumbling on an unfamiliar construct. On the other hand, it is safer and saner to fully parse paths up-front, and other path libraries (eg. in Boost) seem to do that. A string representation encourages thinking about what the path looks like, rather than what it means. What clinches the deal is that path manipulation on an algebraic type is much more comprehensible than on a string. In libraries (in various languages) where every operation parses the path string, I have to stare hard at the code to understand what it does and convince myself it's right. So I agree with a common sentiment expressed in this thread: That's not the Haskell way! There is also the question of whether to use a single "generic" algebraic type, or let each system define its own. The former might be considered to simplify the API, because users have only one type to deal with. But since I've already chosen the typeclass design, there's no need to restrain myself to one type. Even so, I first tried to define a generic type, hoping it would simplify the implementation. I found it did the opposite, because every system used different parts of the generic type, and for different purposes, so it was hard to be sure I had all the cases right. Further, it meant that anyone wanting to change the generic type for the benefit of one system would have to be careful not to break all the others. Finally, at some level, paths on different systems are fundamentally different, the bytes-on-unix/unicode-on-windows issue a prime example. So I defined one type per system. Here's the type for unix:
data UPath = UPath UStart [UElement] -- from leaf deriving Eq
data UStart = UAbsolute | USlashSlash | URelative deriving Eq
data UElement = UChild ChildName | UParent deriving Eq
The implementation of class operation is pretty nice on this representation, although there are still some tricky cases.
uCurrentPath :: UPath uCurrentPath = UPath URelative []
uPrefixes :: UPath -> [(UPath, ChildName)] uPrefixes (UPath start es) = foldr prefix [] (init (tails es)) where prefix (UChild s:es) ps = (UPath start es, s) : ps prefix (UParent:_) _ = []
uAddChild :: Monad m => UPath -> ChildName -> m UPath uAddChild (UPath start es) s = case reads s of [(e@(UChild _), "")] -> return (UPath start (e:es)) _ -> fail "not a unix child"
uAppend :: UPath -> UPath -> UPath uAppend p p'@(UPath UAbsolute _) = p' uAppend p p'@(UPath USlashSlash _) = p' uAppend (UPath UAbsolute []) (UPath URelative es') = UPath UAbsolute (dropWhile (UParent ==) es') uAppend (UPath USlashSlash []) (UPath URelative es') = UPath USlashSlash (dropWhile (UParent ==) es') uAppend (UPath start es) (UPath URelative es') = UPath start (es' ++ es)
instance Path UPath where currentPath = uCurrentPath prefixes = uPrefixes addChild = uAddChild append p1 p2 = return (uAppend p1 p2)
uGetChildren :: UPath -> IO [UPath] uGetChildren p = do ss <- getDirectoryContents (show p) return (concatMap (uAddChild p) ss)
uCanonicalize :: UPath -> IO UPath uCanonicalize p = canonicalizePath (show p) >>= readIO
instance NativePath UPath where getChildren = uGetChildren canonicalize = uCanonicalize
Note that the unix functions (and perhaps also the constructors) can be exported, for anyone who wants to operate specifically on unix paths, and take advantage of their full structure. The IO operations and NativePath instance should probably be ifdef'ed out on non-unix. (I don't like ifdefs that change the signature of a module, but I think it's passable here.) If you want to try this module for now, you have to use unix. :-)
withNativePath p f = (readIO p :: IO UPath) >>= f withNativeCurrentPath f = f (currentPath :: UPath)
Here's the example of rm -rf:
rmTree :: FilePath -> IO () rmTree s = withNativePath s rm where rm p = doesDirectoryExist' p >>= \b -> if b then rmDir p else rmFile p where rmDir p = do getChildren p >>= mapM rm removeDirectory' p rmFile p = removeFile' p doesDirectoryExist' = filePathIO doesDirectoryExist removeFile' = filePathIO removeFile removeDirectory' = filePathIO removeDirectory
Lastly, the read and show functions. (You can stop now.)
instance Show UPath where show (UPath UAbsolute []) = "/" show (UPath UAbsolute es) = '/' : join '/' (map show (reverse es)) show (UPath USlashSlash []) = "//" show (UPath USlashSlash es) = "//" ++ join '/' (map show (reverse es)) show (UPath URelative []) = "." show (UPath URelative es) = join '/' (map show (reverse es))
instance Show UElement where show (UChild s) = s show UParent = ".."
instance Read UPath where readsPrec _ "" = fail "empty string is not a unix path" readsPrec _ s = let (slashes, s') = span (== '/') s es = map fst (concatMap reads (split '/' s')) es' = dropWhile (UParent ==) es in case slashes of "//" -> return (UPath USlashSlash (reverse es'), "") ('/':_) -> return (UPath UAbsolute (reverse es'), "") [] -> return (UPath URelative (reverse es), "")
instance Read UElement where readsPrec _ "" = fail "empty string is not a unix element" readsPrec _ "." = fail ". is not a unix element" readsPrec _ ".." = return (UParent, "") readsPrec _ s | any ('\0' ==) s = fail "\\NUL is not allowed in unix paths" | otherwise = return (UChild s, "")
split :: Eq a => a -> [a] -> [[a]] split sep [] = [[]] split sep (x:xs) | x == sep = [] : split sep xs | otherwise = let (r:rs) = split sep xs in (x:r) : rs
join :: a -> [[a]] -> [a] join sep [x] = x join sep (x:xs) = x ++ [sep] ++ join sep xs
Andrew
participants (1)
-
Andrew Pimlott