Re: [Haskell-cafe] File path programme

27 Jan 2005

      ...
I would say that all paths are relative to something, whether it's the 
Unix root, or the current directory, or whatever. Therefore I would call 
this something like PathStart, and add:
    | CurrentDirectory
    | CurrentDirectoryOfWindowsDrive Char
    | RootOfCurrentWindowsDrive
This is true in a sense, but I think making the distinction explicit is
helpful for a number of the operations we want to do.  For example, what
is the parent of the relative path "."?  Answer is "..".  What is the
parent of "/." on unix?  Answer is "/.".  I would also argue that it
only makes sense to append a relative path on the right (ie, we can't
append "/tmp/foo" onto "/usr/local", but we can append "tmp/foo").
Relative paths can refer to different things in the filesystem depending
on process-local state, whereas absolute paths will always refer to the
same thing (until the filesystem changes, or if you do something
esoteric like "chroot").  Relative paths are really "path fragments."
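
A minimal sketch of how the explicit distinction could look. The names
Path, parent and append are illustrative assumptions, not the API under
discussion. Components are plain Strings for brevity, only the two Unix
starting points are modelled, and "." components are assumed to have
been dropped already (so the root is written with an empty component
list).

```haskell
-- Hypothetical starting points (Unix only, for brevity).
data PathStart = Root | CurrentDirectory
  deriving (Eq, Show)

-- A path is a starting point plus the edges walked from it, in order.
data Path = Path PathStart [String]
  deriving (Eq, Show)

-- Lexical parent: drop a trailing named component if there is one.
-- Otherwise stay at the root, or add one more ".." to a relative path.
parent :: Path -> Path
parent (Path s cs) = case reverse cs of
  (c : rest) | c /= ".." -> Path s (reverse rest)
  _ -> case s of
         Root             -> Path Root []
         CurrentDirectory -> Path s (cs ++ [".."])

-- Appending only makes sense when the right operand is relative.
append :: Path -> Path -> Maybe Path
append (Path s cs) (Path CurrentDirectory ds) = Just (Path s (cs ++ ds))
append _           (Path Root _)              = Nothing
```

With this representation, appending "/tmp/foo" onto "/usr/local" is
rejected with Nothing, while appending "tmp/foo" succeeds.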
...
On Unix, there are two nodes we can name directly, the "root" and the 
"current directory". On Windows, there are 26 roots and 26 current 
directories which we can name directly; additionally we can name the 
root or current directory of the current drive, which is one of those 
26, and there are an arbitrary number of network share roots, and \\.\, 
and perhaps some other stuff I don't know about.
There are a few others.  I took a look at MSDN earlier and was
astounded.
...
Whether we're talking about the final node or the final edge depends on 
the OS call; this is the usual pointer-vs-pointee confusion that's also 
found in most programming languages outside the ML family. Probably we 
can ignore it, with the exception of the "/foo" vs "/foo/" distinction, 
which we must preserve.
I've solved that as you suggested where "foo/" goes to "foo/."
...
...
class (Show p) => Path p where
Okay, I'm not convinced that a Path class is the right approach.
I'm not convinced either, but it feels natural to me.
...
I'm tentatively opposed to (B), since I think that the only interesting 
difference between Win32 and Posix paths is in the set of starting 
points you can name. (The path separator isn't very interesting.) But 
maybe it does make sense to have separate starting-point ADTs for each 
operating system. Then of course there's the issue that Win32 edge 
labels are Unicode, while Posix edge labels are [Word8]. Hmm.
I think these differences make separate implementations worthwhile.  The
question then is whether to abstract them via a type class, or with a
datatype like:

data FilePath
   = POSIXFilePath POSIXPath
   | WinFilePath   WinPath

Disadvantage here is that the datatype is closed.  Advantage is that
pattern matching tells you what kind of path you have statically.
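
To make the trade-off concrete, here is a small sketch of both options
side by side. POSIXPath and WinPath are placeholder newtypes (their
real definitions are not shown in this thread), and separatorOf and
pathSeparator are made-up names used only for illustration.

```haskell
import Prelude hiding (FilePath)

-- Placeholder representations, assumed for this sketch only.
newtype POSIXPath = POSIXPath [String]
newtype WinPath   = WinPath   [String]

-- Option 1: a closed sum type. Every consumer can pattern match,
-- but adding a third platform means editing this declaration.
data FilePath
   = POSIXFilePath POSIXPath
   | WinFilePath   WinPath

separatorOf :: FilePath -> Char
separatorOf (POSIXFilePath _) = '/'
separatorOf (WinFilePath _)   = '\\'

-- Option 2: an open type class. New path types can be added as
-- instances elsewhere, but code generic in p cannot inspect which one
-- it has.
class Path p where
  pathSeparator :: p -> Char

instance Path POSIXPath where
  pathSeparator _ = '/'

instance Path WinPath where
  pathSeparator _ = '\\'
```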
...
...
pathCleanup :: p -> p           -- remove .. and suchlike
This can't be done safely except in a few special cases (e.g. "/.." -> 
"/"). I'm not sure it should be here.
More than you would think, if you follow the conventions of modern unix
shells.  eg, "foo/.." is always equal to ".", and "foo/bar/../../.." is
equal to "..", and "foo///bar" is equal to "foo/bar".  This is the
behavior that "cd" gives on modern posix shells (rather than doing a
chdir on the ".." hardlink, which does strange things in the presence of
symlinks).  The operation is sufficiently useful that I think it should
be included.  It lets us know, for example, that "/bar/../foo/tmp" and
"/foo/tmp" refer to the same file, without resorting to any IO
operations.
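
A rough sketch of the purely lexical cleanup described above, working
on a list of already-split components. The function names are
assumptions made for this example, and an empty result stands for "."
(or for "/" in the absolute case).

```haskell
-- Lexically normalise the components of a relative path. No IO is
-- performed, so symlinks are deliberately ignored.
cleanupRelative :: [String] -> [String]
cleanupRelative = reverse . foldl step []
  where
    -- The accumulator holds the components kept so far, most recent first.
    step acc ""  = acc                      -- empty component, from "//"
    step acc "." = acc                      -- "." changes nothing
    step (c : acc) ".." | c /= ".." = acc   -- "x/.." cancels out
    step acc c   = c : acc                  -- keep names and leading ".."

-- For absolute paths a leading ".." cannot climb above the root,
-- so "/.." collapses to "/".
cleanupAbsolute :: [String] -> [String]
cleanupAbsolute = dropWhile (== "..") . cleanupRelative
```

For example, cleanupRelative ["foo", "bar", "..", "..", ".."] gives
[".."], and cleanupAbsolute ["bar", "..", "foo", "tmp"] gives
["foo", "tmp"].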
...
...
hasExtension :: p -> String -> Bool
...
This is really an operation on a single component of the path. I think 
it would make more sense to make it an ordinary function with type 
String -> String -> Bool and use the basename method to get the 
appropriate path component.
The problem is that String doesn't faithfully capture the representation
of path edges.  For POSIX it is a sequence of Word8 (except for 0x2F).
In my implementation of UnixPaths, each path carries along an encoding
component, which (theoretically) tells you how to do [Word8] <-> [Char]
translations.  Eventually we will get a real IO layer complete with
character encodings and this will be meaningful.  The comparison needs
to be done with encodings in mind.
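
As an illustration only, an encoding-aware comparison on a single
component might look like the following. Encoder, latin1 and
componentHasExtension are hypothetical names, and the Latin-1 encoder
is a stand-in that is only meaningful for code points below 256; it is
not the encoding component from my UnixPaths implementation.

```haskell
import Data.Char (ord)
import Data.List (isSuffixOf)
import Data.Word (Word8)

-- Hypothetical encoding: turns characters into the bytes stored on disk.
type Encoder = String -> [Word8]

-- Stand-in encoder. Code points of 256 and above are truncated, so
-- this is only correct for Latin-1 text.
latin1 :: Encoder
latin1 = map (fromIntegral . ord)

-- Compare at the byte level: encode "." plus the extension, then
-- check whether the component ends with those bytes.
componentHasExtension :: Encoder -> [Word8] -> String -> Bool
componentHasExtension enc component ext =
  enc ('.' : ext) `isSuffixOf` component
```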
...
...
pathToForeign :: p -> IO (Ptr CChar)
pathFromForeign :: Ptr CChar -> IO p
This interface is problematic. Is the pointer returned by pathToForeign 
a heap pointer which the caller is supposed to free? If so, a Ptr CChar 
instance would have to copy the pathname every time. And I don't 
understand exactly what pathFromForeign is supposed to do.
Agree, I like the withCPath interface better.  pathFromForeign takes a
path representation directly from C land, without going through String
first (again with encoding issues in mind).  Although it should perhaps
be:

pathFromForeign :: Ptr () -> IO p

instead (might be wide chars).
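
For the narrow-character case, a withCPath-style interface could be
sketched as below. It assumes a path is just its raw bytes ([Word8])
and uses the marshalling helpers from the standard FFI libraries. The
signatures are my guess at a reasonable shape, not a settled design.

```haskell
import Data.Word (Word8)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (peekArray0, withArray0)
import Foreign.Ptr (Ptr)

-- Marshal the path bytes into a temporary NUL-terminated buffer that
-- is only valid inside the callback, so the caller never has to free
-- anything.
withCPath :: [Word8] -> (Ptr CChar -> IO a) -> IO a
withCPath bytes = withArray0 0 (map fromIntegral bytes)

-- Read a NUL-terminated C string back as raw bytes, without going
-- through String.
pathFromForeign :: Ptr CChar -> IO [Word8]
pathFromForeign ptr = map fromIntegral <$> peekArray0 0 ptr
```

Because the buffer's lifetime is tied to the callback, the ownership
question raised about pathToForeign does not come up.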

Robert Dockins