Re: [Haskell] System.FilePath survey

4 Feb 2006

      I (just like everyone else) have a path module, which does some things a
little differently.  I'm going to give a minimal working version in this
literate Haskell message, along with commentary, and will package up the
full code if anyone's interested.  I have only used it in one little
program, so I'm don't pretend all the details are right.  However, I
think my approach has promise, and would be grateful for your feedback.

My main premise is that portably manipulating paths is _hard_, because
filesystem semantics are messy and ill-specified and differ across
systems; and so we should 1. be conservative in what we try to provide,
and 2. use whatever techniques we can to avoid errors (by either the
implementor or the user).
...
{-# OPTIONS -fglasgow-exts #-}
(Note that users of this module don't need glasgow-exts.)
...
module FSPath where
import Control.Monad (liftM)
import Data.List
import System.Directory
import System.IO
...
class (Read p, Show p) => Path p where
Portable path operations are class methods; that is, there is no single
path type.  One obvious advantage is that we can use the same
(overloaded) functions on different path types, representing paths on
different systems, in a single program, and never risk mixing them up.
(I consider the ability to manipulate foreign paths a requirement for a
production library.)  A less obvious advantage, which I think may be
bigger in practice, is that we can define a hierarchy of classes, each
element representing different assumptions about paths (which may be
common to different sets of systems).  This helps us cope with the
variety of path semantics in the world.

In Path, I define operations that can be performed without IO, and have
roughly the same, well-understood meaning on most systems on which
Haskell runs:
...
currentPath :: p
A path representing the current working directory, eg. "." on unix.
...
prefixes :: p -> [(p, ChildName)]
The "parents" of this path, up until we hit something that is not a
simple name, eg.

  "foo/../bar/baz" -> [("foo/../bar", "baz", ("foo/..", "bar")]

on unix.  (Actually, these are not necessarily parents, which is why I
didn't name the method that.)
...
addChild :: Monad m => p -> ChildName -> m p
Add a single child, eg. "foo/bar" -> "baz" -> "foo/bar/baz".  I use Monad
for possible failure, say if the child were not a plain name, eg. ".." on
unix.
...
append :: Monad m => p -> p -> m p
The second path relative to the first.  There are some ambiguous cases
on Windows that perhaps should fail.

That's it for pure path ops.  I know that typically path libraries have
more, but most of them seem either to me unnecessary, redundant or of
questionable portability.  (Perhaps those that are less portable could
be added in subclasses.)  I'm sure that other people will differ, so I
would welcome use cases for additional operations you find useful.

ChildName is just
...
type ChildName = String
for now; maybe it should be part of the class?

I define another class containing IO operations.  The module should only
give out instances that represent paths native to the currently running
system.
...
class Path p => NativePath p where
I haven't dared to redesign how IO uses paths, but this module should be
easily adapted to such a redesign.  I suppose eventually all filesystem
IO operations could be methods in this class, and the functions taking
FilePath deprecated or removed.  For now, I just include two methods
that might need special treatment:
...
getChildren :: p -> IO [p]
  canonicalize :: p -> IO p
and for other IO, define a helper:
...
filePathIO :: NativePath p => (FilePath -> IO a) -> (p -> IO a)
filePathIO f p = f (show p)
To get an instance of these classes that's right for the current system,
I use some of the (more elementary) techniques from Ken and Oleg's
"Implicit Configurations".  The user supplies a function that is
polymorphic in the path type, to a function that runs it on the native
type.  Here are two variants (defined later):
...
withNativePath :: FilePath -> (forall p. NativePath p => p -> IO r) -> IO r
withNativeCurrentPath :: (forall p. NativePath p => p -> IO r) -> IO r
The argument to the user's function has the native type, and unification
automatically propagates that type to other values.  This avoids the
need for an explicit dictionary of operations.  Eg., to read two
filenames, append them, and open the result:
...
example1 :: IO Handle
example1 = do path1 <- getLine
              withNativePath path1 (\path1 -> do
                path2 <- readLn
                path  <- append path1 path2
                filePathIO (flip openFile ReadMode) path)
The overloaded function readLn reads a path of the right type.

For the instances of these classes, I define algebraic data types.  The
alternative, retaining the string representation, would have some
advantages:  It would probably perform better, and moreover it would not
require parsing the whole string for some operations, so there is less
chance of the parser stumbling on an unfamiliar construct.  On the other
hand, it is safer and saner to fully parse paths up-front, and other
path libraries (eg. in Boost) seem to do that.  A string representation
encourages thinking about what the path looks like, rather than what it
means.  What clinches the deal is that path manipulation on an algebraic
type is much more comprehensible than on a string.  In libraries (in
various languages) where every operation parses the path string, I have
to stare hard at the code to understand what it does and convince myself
it's right.  So I agree with a common sentiment expressed in this
thread:  That's not the Haskell way!

There is also the question of whether to use a single "generic"
algebraic type, or let each system define its own.  The former might be
considered to simplify the API, because users have only one type to deal
with.  But since I've already chosen the typeclass design, there's no
need to restrain myself to one type.  Even so, I first tried to define a
generic type, hoping it would simplify the implementation.  I found it
did the opposite, because every system used different parts of the
generic type, and for different purposes, so it was hard to be sure I
had all the cases right.  Further, it meant that anyone wanting to
change the generic type for the benefit of one system would have to be
careful not to break all the others.  Finally, at some level, paths on
different systems are fundamentally different, the
bytes-on-unix/unicode-on-windows issue a prime example.  So I defined
one type per system.

Here's the type for unix:
...
data UPath = UPath UStart [UElement]  -- from leaf
           deriving Eq
data UStart = UAbsolute
            | USlashSlash
            | URelative
            deriving Eq
data UElement = UChild ChildName
              | UParent
              deriving Eq
The implementation of class operation is pretty nice on this
representation, although there are still some tricky cases.
...
uCurrentPath :: UPath
uCurrentPath = UPath URelative []
uPrefixes :: UPath -> [(UPath, ChildName)]
uPrefixes (UPath start es) = foldr prefix [] (init (tails es)) where
  prefix (UChild s:es) ps = (UPath start es, s) : ps
  prefix (UParent:_)   _  = []
uAddChild :: Monad m => UPath -> ChildName -> m UPath
uAddChild (UPath start es) s = case reads s of
  [(e@(UChild _), "")] -> return (UPath start (e:es))
  _                    -> fail "not a unix child"
uAppend :: UPath -> UPath -> UPath
uAppend p p'@(UPath UAbsolute   _) = p'
uAppend p p'@(UPath USlashSlash _) = p'
uAppend (UPath UAbsolute []) (UPath URelative es') =
           UPath UAbsolute (dropWhile (UParent ==) es')
uAppend (UPath USlashSlash []) (UPath URelative es') =
           UPath USlashSlash (dropWhile (UParent ==) es')
uAppend (UPath start es) (UPath URelative es') =
           UPath start (es' ++ es)
instance Path UPath where
  currentPath = uCurrentPath
  prefixes = uPrefixes
  addChild = uAddChild
  append p1 p2 = return (uAppend p1 p2)
uGetChildren :: UPath -> IO [UPath]
uGetChildren p = do ss <- getDirectoryContents (show p)
                    return (concatMap (uAddChild p) ss)
uCanonicalize :: UPath -> IO UPath
uCanonicalize p = canonicalizePath (show p) >>= readIO
instance NativePath UPath where
  getChildren = uGetChildren
  canonicalize = uCanonicalize
Note that the unix functions (and perhaps also the constructors) can be
exported, for anyone who wants to operate specifically on unix paths,
and take advantage of their full structure.  The IO operations and
NativePath instance should probably be ifdef'ed out on non-unix.  (I
don't like ifdefs that change the signature of a module, but I think
it's passable here.)

If you want to try this module for now, you have to use unix. :-)
...
withNativePath p f = (readIO p :: IO UPath) >>= f
withNativeCurrentPath f = f (currentPath :: UPath)
Here's the example of rm -rf:
...
rmTree :: FilePath -> IO ()
rmTree s = withNativePath s rm where
  rm p = doesDirectoryExist' p >>= \b -> if b then rmDir p
                                              else rmFile p where
    rmDir p = do getChildren p >>= mapM rm
                 removeDirectory' p
    rmFile p = removeFile' p
    doesDirectoryExist' = filePathIO doesDirectoryExist
    removeFile' = filePathIO removeFile
    removeDirectory' = filePathIO removeDirectory
Lastly, the read and show functions.  (You can stop now.)
...
instance Show UPath where
  show (UPath UAbsolute [])   = "/"
  show (UPath UAbsolute es)   = '/' : join '/' (map show (reverse es))
  show (UPath USlashSlash []) = "//"
  show (UPath USlashSlash es) = "//" ++ join '/' (map show (reverse es))
  show (UPath URelative [])   = "."
  show (UPath URelative es)   = join '/' (map show (reverse es))
instance Show UElement where
  show (UChild s) = s
  show UParent    = ".."
instance Read UPath where
  readsPrec _ "" = fail "empty string is not a unix path"
  readsPrec _ s  =
    let (slashes, s') = span (== '/') s
        es = map fst (concatMap reads (split '/' s'))
        es' = dropWhile (UParent ==) es
    in  case slashes of
          "//"    -> return (UPath USlashSlash (reverse es'), "")
          ('/':_) -> return (UPath UAbsolute (reverse es'), "")
          []      -> return (UPath URelative (reverse es), "")
instance Read UElement where
  readsPrec _ ""                  = fail "empty string is not a unix element"
  readsPrec _ "."                 = fail ". is not a unix element"
  readsPrec _ ".."                = return (UParent, "")
  readsPrec _ s | any ('\0' ==) s = fail "\\NUL is not allowed in unix paths"
                | otherwise       = return (UChild s, "")
split :: Eq a => a -> [a] -> [[a]]
split sep [] = [[]]
split sep (x:xs)
  | x == sep  = [] : split sep xs
  | otherwise = let (r:rs) = split sep xs
                in  (x:r) : rs
join :: a -> [[a]] -> [a]
join sep [x]    = x
join sep (x:xs) = x ++ [sep] ++ join sep xs
Andrew

Andrew Pimlott

tags

participants (1)