
robert dockins wrote:
After the discussion about file paths over the last several days I went home and put together a quick trial implementation for unix file paths, with the idea of adding windows, SMB and maybe VMS (why not?) paths.

This is great. Comments below.
data PathRoot = UnixFilesystemRoot | WindowsDrive Char | SMBShare String String | VMSDevice String | ... -- whatever else we need
I would say that all paths are relative to something, whether it's the Unix root, or the current directory, or whatever. Therefore I would call this something like PathStart, and add:

    | CurrentDirectory
    | CurrentDirectoryOfWindowsDrive Char
    | RootOfCurrentWindowsDrive

What is a pathname, broadly speaking? Answer: it's a description of a path in a directed graph with labeled edges. It consists of a single node designator (the starting point) and a sequence of edge designators, i.e.

    data Pathname = Pathname { pathStart :: PathStart, pathEdges :: [String] }

Most of the time all we care about is either the final node or the final edge that we reach by following the path. The only reason we specify the rest of the path is that there are only a few nodes that we can name directly; to refer to any other location on the graph we have to give "driving directions" from one of those nodes. There's no reason the OS couldn't make nodes and edges first-class entities--it would solve a multitude of problems--but most don't, so forget that.

On Unix, there are two nodes we can name directly, the "root" and the "current directory". On Windows, there are 26 roots and 26 current directories which we can name directly; additionally we can name the root or current directory of the current drive, which is one of those 26, and there are an arbitrary number of network share roots, and \\.\, and perhaps some other stuff I don't know about.

Symbolic links complicate things a bit, since they are followed like edges but are actually paths (so they may be affected by seemingly unrelated changes to the graph). They're rather like VPNs, actually, though I'm not sure how far I want to push that analogy.

Whether we're talking about the final node or the final edge depends on the OS call; this is the usual pointer-vs-pointee confusion that's also found in most programming languages outside the ML family. Probably we can ignore it, with the exception of the "/foo" vs "/foo/" distinction, which we must preserve. This can probably be handled by parsing the latter as

    Pathname { pathStart = UnixFilesystemRoot, pathEdges = ["foo","."] }
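A minimal sketch of that parsing rule, restricted to Unix-style strings. The two-constructor PathStart and the names parseUnixPath and splitOnSlash are assumptions made for illustration, not part of the posted code. Runs of slashes are collapsed, which ignores any special meaning a leading "//" might have on a given system.

```haskell
-- Hypothetical, cut-down starting points: just the two Unix can name.
data PathStart
  = UnixFilesystemRoot
  | CurrentDirectory
  deriving (Eq, Show)

data Pathname = Pathname
  { pathStart :: PathStart
  , pathEdges :: [String]
  } deriving (Eq, Show)

-- Split on '/', keeping empty pieces so a trailing slash stays visible.
splitOnSlash :: String -> [String]
splitOnSlash s = case break (== '/') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnSlash rest

-- Parse a Unix path string. A trailing slash after at least one edge
-- is recorded as a final "." edge, so "/foo" and "/foo/" stay distinct.
parseUnixPath :: String -> Pathname
parseUnixPath s = Pathname start (edges ++ ["." | trailing])
  where
    (start, rest) = case s of
      '/' : r -> (UnixFilesystemRoot, r)
      _       -> (CurrentDirectory, s)
    pieces   = splitOnSlash rest
    edges    = filter (not . null) pieces
    trailing = not (null edges) && null (last pieces)
```

With this, parseUnixPath "/foo" yields the edges ["foo"] while parseUnixPath "/foo/" yields ["foo","."], and both start at UnixFilesystemRoot.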
class (Show p) => Path p where
Okay, I'm not convinced that a Path class is the right approach. For the reasons given above, I think I'd rather have a single Path datatype, probably with its data constructors exported. What do we gain with the class approach? Well...

(A) Functions that accept paths can be polymorphic on the path type (where String is a path type).
(B) We can have different datatypes for the paths of different operating systems.

It seems like these are two very different problems which should be solved with different typeclasses, if they're to be solved with typeclasses at all. I think (A) can be solved very simply, and independently of the specification of a path-handling library:

    class IsPath a where
      withCPath :: a -> (Ptr CChar -> IO b) -> IO b

    instance IsPath String where
      withCPath = withCString    -- tricky i18n issues!
    instance IsPath [CChar] where
      withCPath = withArray0 0
    instance IsPath PathADT where
      withCPath = withCString . pathToString
    instance IsPath (Ptr CChar) where
      withCPath = flip ($)

    openFile :: (IsPath p) => p -> ...

I'm tentatively opposed to (B), since I think that the only interesting difference between Win32 and Posix paths is in the set of starting points you can name. (The path separator isn't very interesting.) But maybe it does make sense to have separate starting-point ADTs for each operating system. Then of course there's the issue that Win32 edge labels are Unicode, while Posix edge labels are [Word8]. Hmm.
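For anyone who wants to try (A), here is a sketch of how the IsPath idea might be made to compile with GHC. It assumes the FlexibleInstances extension, since instances at String, [CChar] and Ptr CChar are not Haskell 98. The PathADT instance is left out because PathADT and pathToString are not defined in this thread, and roundTrip is a hypothetical helper added only to exercise the class.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Foreign.C.String (peekCString, withCString)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (withArray0)
import Foreign.Ptr (Ptr)

-- Anything that can be lent to a C call as a NUL-terminated string.
class IsPath a where
  withCPath :: a -> (Ptr CChar -> IO b) -> IO b

-- Marshalled using the current locale encoding; this is where the
-- i18n questions live.
instance IsPath String where
  withCPath = withCString

-- Raw C characters: copied into a temporary buffer with a 0 terminator.
instance IsPath [CChar] where
  withCPath = withArray0 0

-- Already a C string: hand the pointer straight to the callback.
instance IsPath (Ptr CChar) where
  withCPath = flip ($)

-- Hypothetical helper: read the C string back as a Haskell String.
roundTrip :: IsPath p => p -> IO String
roundTrip p = withCPath p peekCString
```

The callback style means the String and [CChar] instances can free their temporary buffer when the callback returns, while the Ptr CChar instance never copies at all.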
isAbsolute :: p -> Bool
Definition: a path is absolute if its meaning is independent of (Posix: the current directory) (Win32: all current directories and the current drive).
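Under that definition, absoluteness is a property of the starting point alone. A sketch, assuming a PathStart type that merges the quoted PathRoot constructors with the additions suggested earlier (VMSDevice is omitted because its semantics are not discussed here); the name isAbsoluteStart is hypothetical:

```haskell
-- Assumed merge of the quoted PathRoot constructors and the proposed
-- current-directory starting points.
data PathStart
  = UnixFilesystemRoot
  | WindowsDrive Char                   -- root of a named drive
  | SMBShare String String              -- root of a network share
  | CurrentDirectory
  | CurrentDirectoryOfWindowsDrive Char
  | RootOfCurrentWindowsDrive
  deriving (Eq, Show)

-- A starting point is absolute when its meaning does not depend on any
-- current directory or on the current drive.
isAbsoluteStart :: PathStart -> Bool
isAbsoluteStart start = case start of
  UnixFilesystemRoot               -> True
  WindowsDrive _                   -> True
  SMBShare _ _                     -> True
  CurrentDirectory                 -> False
  CurrentDirectoryOfWindowsDrive _ -> False
  RootOfCurrentWindowsDrive        -> False
```

A path would then be absolute exactly when isAbsoluteStart holds for its pathStart; the edge list plays no part.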
pathCleanup :: p -> p -- remove .. and suchlike
This can't be done safely except in a few special cases (e.g. "/.." -> "/"). I'm not sure it should be here.
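To make the restriction concrete, here is a sketch of a cleanup that touches only the cases I believe are safe. It assumes the edge list starts at the Unix root, and the name cleanupRootEdges is hypothetical. It drops "." and ".." edges at the very start (since "/.." is "/") and drops interior "." edges, but leaves "bar/.." alone because bar might be a symbolic link, and keeps a final "." so that the "/foo" vs "/foo/" distinction survives.

```haskell
-- Edges are assumed to be relative to the Unix filesystem root.
cleanupRootEdges :: [String] -> [String]
cleanupRootEdges edges = go (dropWhile atRootNoOp edges)
  where
    -- Directly under the root, "." and ".." both lead back to the root.
    atRootNoOp e = e == "." || e == ".."

    go []           = []
    go [e]          = [e]        -- keep the final edge, even if it is "."
    go ("." : rest) = go rest    -- an interior "." never changes the node
    go (e : rest)   = e : go rest
```

For example, the edges ["..", "foo", ".", "bar", ".."] become ["foo", "bar", ".."]: the trailing ".." is kept because collapsing it against "bar" would be wrong whenever bar is a symlink.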
hasExtension :: p -> String -> Bool
This is really an operation on a single component of the path. I think it would make more sense to make it an ordinary function with type String -> String -> Bool and use the basename method to get the appropriate path component.
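A sketch of what that might look like. The argument order (component first, extension second) is my assumption, and finalEdge is a hypothetical stand-in for the basename method, working directly on a list of edge labels.

```haskell
import Data.List (isSuffixOf)

-- True when the component ends in a dot followed by the extension.
-- Note that a component such as ".txt" also counts as having "txt".
hasExtension :: String -> String -> Bool
hasExtension component ext = ('.' : ext) `isSuffixOf` component

-- Hypothetical stand-in for basename: the last edge label, if any.
finalEdge :: [String] -> Maybe String
finalEdge []    = Nothing
finalEdge edges = Just (last edges)

-- The path-level question, expressed through the component-level one.
pathHasExtension :: [String] -> String -> Bool
pathHasExtension edges ext =
  maybe False (`hasExtension` ext) (finalEdge edges)
```

Keeping hasExtension a plain function on strings means it needs no knowledge of starting points or separators, and it works the same for every path type.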
pathToForeign :: p -> IO (Ptr CChar)
pathFromForeign :: Ptr CChar -> IO p
This interface is problematic. Is the pointer returned by pathToForeign a heap pointer which the caller is supposed to free? If so, a Ptr CChar instance would have to copy the pathname every time. And I don't understand exactly what pathFromForeign is supposed to do.

-- Ben