
I would say that all paths are relative to something, whether it's the Unix root, or the current directory, or whatever. Therefore I would call this something like PathStart, and add:
| CurrentDirectory | CurrentDirectoryOfWindowsDrive Char | RootOfCurrentWindowsDrive
This is true in a sense, but I think making the distinction explicit is helpful for a number of the operations we want to do. For example, what is the parent of the relative path "."? The answer is "..". What is the parent of "/." on Unix? The answer is "/.". I would also argue that it only makes sense to append a relative path on the right (i.e., we can't append "/tmp/foo" onto "/usr/local", but we can append "tmp/foo"). Relative paths can refer to different things in the filesystem depending on process-local state, whereas absolute paths always refer to the same thing (until the filesystem changes, or unless you do something esoteric like "chroot"). Relative paths are really "path fragments."
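A minimal sketch of that distinction, assuming a path is just a list of edge labels, with separate types for absolute and relative paths (AbsPath, RelPath, parentAbs, parentRel and appendAbs are hypothetical names, not an existing API). Because appendAbs only accepts a RelPath on the right, appending "/tmp/foo" onto "/usr/local" becomes a type error rather than a runtime surprise:

```haskell
import Data.List (intercalate)

-- Hypothetical types: an absolute path starts at the root, a relative one
-- at the (process-local) current directory. Edges are assumed normalized:
-- no "." components, and ".." only at the front of a relative path.
newtype AbsPath = AbsPath [String] deriving (Eq, Show)
newtype RelPath = RelPath [String] deriving (Eq, Show)

-- Parent of a relative path: drop the last named edge, or climb with "..".
-- So the parent of "." is "..", and the parent of ".." is "../..".
parentRel :: RelPath -> RelPath
parentRel (RelPath es)
  | not (null es) && last es /= ".." = RelPath (init es)
  | otherwise = RelPath (es ++ [".."])

-- Parent of an absolute path: the root is its own parent ("/." -> "/.").
parentAbs :: AbsPath -> AbsPath
parentAbs (AbsPath []) = AbsPath []
parentAbs (AbsPath es) = AbsPath (init es)

-- Only a relative path may be appended on the right. The result is not
-- re-normalized here (appending "../x" would need a cleanup pass).
appendAbs :: AbsPath -> RelPath -> AbsPath
appendAbs (AbsPath a) (RelPath r) = AbsPath (a ++ r)

renderRel :: RelPath -> String
renderRel (RelPath []) = "."
renderRel (RelPath es) = intercalate "/" es

renderAbs :: AbsPath -> String
renderAbs (AbsPath []) = "/."
renderAbs (AbsPath es) = '/' : intercalate "/" es
```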
On Unix, there are two nodes we can name directly: the "root" and the "current directory". On Windows, there are 26 roots and 26 current directories which we can name directly; additionally, we can name the root or current directory of the current drive (which is one of those 26), and there is an arbitrary number of network share roots, \\.\, and perhaps some other things I don't know about.
There are a few others. I took a look at MSDN earlier and was astounded.
Whether we're talking about the final node or the final edge depends on the OS call; this is the usual pointer-vs-pointee confusion that's also found in most programming languages outside the ML family. Probably we can ignore it, with the exception of the "/foo" vs "/foo/" distinction, which we must preserve.
I've handled that as you suggested: "foo/" is normalized to "foo/.".
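One way to get that behaviour when parsing, as a sketch: split on '/', drop the empty components produced by repeated separators, and turn a trailing separator into a final "." edge so that "foo/" and "foo" remain distinct. splitEdges and splitOn are hypothetical helpers, and only the POSIX separator is handled:

```haskell
-- Split a string on a separator character, keeping empty chunks.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Returns (is the path absolute?, edge labels).
-- Repeated separators collapse ("foo//bar" has two edges), while a
-- trailing separator becomes an explicit "." edge ("foo/" -> "foo/.").
splitEdges :: String -> (Bool, [String])
splitEdges s = (isAbs, named ++ ["." | trailing])
  where
    isAbs = take 1 s == "/"
    named = filter (not . null) (splitOn '/' s)
    trailing = not (null s) && last s == '/'
```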
class (Show p) => Path p where
Okay, I'm not convinced that a Path class is the right approach.
I'm not convinced either, but it feels natural to me.
I'm tentatively opposed to (B), since I think that the only interesting difference between Win32 and Posix paths is in the set of starting points you can name. (The path separator isn't very interesting.) But maybe it does make sense to have separate starting-point ADTs for each operating system. Then of course there's the issue that Win32 edge labels are Unicode, while Posix edge labels are [Word8]. Hmm.
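If separate starting-point ADTs per operating system turn out to be the right call, the edge-label difference can be handled by making the label type a parameter. A sketch with hypothetical names throughout; the Win32 constructor list is deliberately incomplete (no \\.\ device paths, for example):

```haskell
import Data.Word (Word8)

-- Starting points that can be named directly on POSIX.
data PosixStart = PosixRoot | PosixCurrentDir
  deriving (Eq, Show)

-- Some starting points that can be named directly on Win32.
data WinStart
  = DriveRoot Char          -- e.g. C:\
  | DriveCurrentDir Char    -- e.g. C: (current directory of drive C)
  | RootOfCurrentDrive      -- \ (root of whichever drive is current)
  | WinCurrentDir           -- the process's current directory
  | UNCShare String String  -- \\server\share
  deriving (Eq, Show)

-- A path is a starting point plus edge labels; the label type is a
-- parameter because POSIX labels are bytes and Win32 labels are Unicode.
data GenPath start label = GenPath start [label]
  deriving (Eq, Show)

type PosixPath = GenPath PosixStart [Word8]
type WinPath = GenPath WinStart String

-- Edge-only operations can be written once for both systems.
dropLastEdge :: GenPath s l -> GenPath s l
dropLastEdge (GenPath s es) = GenPath s (take (length es - 1) es)

-- Only a drive root or a UNC share names the same node regardless of
-- process state; "C:" and "\" still depend on the current drive/directory.
isFullyAbsolute :: WinStart -> Bool
isFullyAbsolute (DriveRoot _) = True
isFullyAbsolute (UNCShare _ _) = True
isFullyAbsolute _ = False
```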
I think these differences make separate implementations worthwhile. The question then is whether to abstract over them with a type class, or with a datatype like:
data FilePath = POSIXFilePath POSIXPath | WinFilePath WinPath
The disadvantage here is that the datatype is closed. The advantage is that pattern matching tells you statically what kind of path you have.
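To make the trade-off concrete, here is a sketch of the closed-datatype option. POSIXPath and WinPath are stripped down to hypothetical placeholder types, and Prelude's own FilePath synonym is hidden so the name from the proposal can be reused:

```haskell
import Prelude hiding (FilePath)
import Data.List (intercalate)

-- Placeholder representations, just enough to show the dispatch.
data POSIXPath = POSIXPath Bool [String]      -- absolute?, edges
  deriving (Eq, Show)
data WinPath = WinPath (Maybe Char) [String]  -- drive letter (if any), edges
  deriving (Eq, Show)

-- The closed sum: every function on FilePath must handle both cases,
-- and GHC warns about a missing one under -Wincomplete-patterns.
data FilePath = POSIXFilePath POSIXPath | WinFilePath WinPath
  deriving (Eq, Show)

render :: FilePath -> String
render (POSIXFilePath (POSIXPath absolute es)) =
  (if absolute then "/" else "") ++ intercalate "/" es
render (WinFilePath (WinPath drive es)) =
  maybe "" (\d -> [d, ':', '\\']) drive ++ intercalate "\\" es
```

Adding a third kind of path means editing FilePath and every function that matches on it, which is the closedness mentioned above; a type class would allow new path types without touching existing code, at the cost of this case analysis.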
pathCleanup :: p -> p -- remove .. and suchlike
This can't be done safely except in a few special cases (e.g. "/.." -> "/"). I'm not sure it should be here.
More than you would think, if you follow the conventions of modern Unix shells, e.g., "foo/.." is always equal to ".", "foo/bar/../../.." is equal to "..", and "foo///bar" is equal to "foo/bar". This is the behavior that "cd" gives on modern POSIX shells (rather than doing a chdir on the ".." hardlink, which does strange things in the presence of symlinks). The operation is sufficiently useful that I think it should be included. It lets us know, for example, that "/bar/../foo/tmp" and "/foo/tmp" refer to the same file, without resorting to any IO operations.
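The shell-style cleanup described here is purely lexical, so it can be written without IO. A minimal sketch, assuming '/' is the only separator; normalize and splitOn are hypothetical names, and for brevity this version drops a trailing separator, so it does not preserve the "foo/" vs "foo" distinction discussed earlier:

```haskell
import Data.List (intercalate)

-- Split a string on a separator character, keeping empty chunks.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Lexical cleanup in the style of "cd" in modern POSIX shells: empty and
-- "." components vanish, and ".." cancels the preceding named component.
normalize :: String -> String
normalize s = render (foldl step [] (splitOn '/' s))
  where
    isAbs = take 1 s == "/"
    -- The accumulator is reversed: its head is the most recent component.
    step acc c
      | c == "" || c == "." = acc
      | c == ".." = case acc of
          (x : rest) | x /= ".." -> rest
          _ | isAbs -> acc               -- "/.." is "/"
            | otherwise -> ".." : acc    -- keep climbing above the start
      | otherwise = c : acc
    render acc
      | isAbs = '/' : body
      | null body = "."
      | otherwise = body
      where
        body = intercalate "/" (reverse acc)
```

Because it never consults the filesystem, the result can differ from what the kernel would resolve when a component is a symlink; that is the same trade-off "cd" makes.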
hasExtension :: p -> String -> Bool
This is really an operation on a single component of the path. I think it would make more sense to make it an ordinary function with type String -> String -> Bool and use the basename method to get the appropriate path component.
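A sketch of the reviewer's suggestion, with hasExtension as an ordinary function on a single component and a hypothetical '/'-only basename to pick that component out. Treating a leading dot (".bashrc") as not being an extension is an assumption of this sketch:

```haskell
import Data.List (isSuffixOf)

-- Last component of a '/'-separated path (hypothetical, POSIX-only helper).
basename :: String -> String
basename = reverse . takeWhile (/= '/') . reverse

-- Does this component end in "." ++ ext, with a non-empty stem before it?
hasExtension :: String -> String -> Bool
hasExtension component ext =
  not (null ext)
    && ('.' : ext) `isSuffixOf` component
    && length component > length ext + 1
```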
The problem is that String doesn't faithfully capture the representation of path edges. For POSIX, an edge is a sequence of Word8 (any byte except 0x2F '/' and 0x00 NUL). In my implementation of UnixPaths, each path carries along an encoding component, which (theoretically) tells you how to do [Word8] <-> [Char] translations. Eventually we will get a real IO layer complete with character encodings, and this will be meaningful. The comparison needs to be done with encodings in mind.
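One way to read this: keep the edge as bytes and encode the query extension with the path's encoding, rather than decoding the edge to a String. A sketch in which PosixEdge, Encoding, latin1 and hasExtensionBytes are all hypothetical, and Latin-1 stands in for whatever encoding a path actually carries:

```haskell
import Data.Char (ord)
import Data.List (isSuffixOf)
import Data.Word (Word8)

-- A POSIX edge label: raw bytes, never containing 0x2F ('/') or 0x00.
newtype PosixEdge = PosixEdge [Word8]
  deriving (Eq, Show)

-- Hypothetical encoding record: Nothing means "not representable".
newtype Encoding = Encoding { encode :: String -> Maybe [Word8] }

-- Latin-1: each Char below U+0100 maps to the byte with the same value.
latin1 :: Encoding
latin1 = Encoding (traverse byte)
  where
    byte c
      | ord c < 256 = Just (fromIntegral (ord c))
      | otherwise = Nothing

-- Compare at the byte level, in the edge's own encoding.
hasExtensionBytes :: Encoding -> PosixEdge -> String -> Bool
hasExtensionBytes enc (PosixEdge bytes) ext =
  case encode enc ('.' : ext) of
    Nothing -> False  -- an unencodable extension cannot occur in this edge
    Just suffix -> suffix `isSuffixOf` bytes
```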
pathToForeign :: p -> IO (Ptr CChar)
pathFromForeign :: Ptr CChar -> IO p
This interface is problematic. Is the pointer returned by pathToForeign a heap pointer which the caller is supposed to free? If so, a Ptr CChar instance would have to copy the pathname every time. And I don't understand exactly what pathFromForeign is supposed to do.
Agreed; I like the withCPath interface better. pathFromForeign takes a path representation directly from C land, without going through String first (again with encoding issues in mind). It should perhaps be
pathFromForeign :: Ptr () -> IO p
instead, since the data might be wide chars.
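A sketch of the bracket-style alternative for narrow (byte) paths. The signatures are assumptions, since the thread does not spell withCPath out. The buffer exists only for the duration of the callback, which answers the "who frees it?" question, and pathFromForeign reads the bytes straight from C without a detour through String. Wide-character paths (the Ptr () variant) are not covered:

```haskell
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.Marshal.Array (peekArray0, withArray0)
import Foreign.Ptr (castPtr)

-- A POSIX path as raw bytes (hypothetical representation).
newtype PosixPath = PosixPath [Word8]
  deriving (Eq, Show)

-- Bracket-style marshalling: a NUL-terminated copy lives only while the
-- callback runs, so the caller never has to free anything.
withCPath :: PosixPath -> (CString -> IO a) -> IO a
withCPath (PosixPath bytes) act =
  withArray0 0 bytes (act . castPtr)

-- Read a NUL-terminated byte string from C without going through String.
pathFromForeign :: CString -> IO PosixPath
pathFromForeign p = PosixPath <$> peekArray0 0 (castPtr p)
```

Since POSIX edge labels cannot contain NUL, using 0 as the terminator loses nothing; a Win32 version would marshal UTF-16 code units instead.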