Suggested additions to System.FilePath.Posix/Windows

Hello Neil
I used System.FilePath.Posix quite extensively recently, and I thank you for the package filepath. There were however two words that I needed which I could not construct from those in System.FilePath.Posix. They are maybe of interest to you and others.
I submit these two words to you for consideration for inclusion in System.FilePath.Posix. Please change the names as you see fit.
I do not know if they make sense for System.FilePath.Windows. If they do not make sense, then please feel free to drop them so as to preserve the interface.
As requested, I Cc'ed the haskell-cafe, but I am not at the moment following these threads, so if anyone else responds, please Cc me if you wish.
Thanks again and cheers,
- Marcus
P.S. Here they are. Although I use ksh(1) as an example, this is a feature of POSIX shells.
-- | 'reduceFilePath' returns a pathname that is reduced to canonical
-- form equivalent to that of ksh(1), that is, symbolic link names are
-- treated literally when finding the directory name. See @cd -L@ of
-- ksh(1). Specifically, extraneous separators @(\"/\")@, dot
-- @(\".\")@, and double-dot @(\"..\")@ directories are removed.
reduceFilePath :: FilePath -> FilePath
reduceFilePath = joinPath . filePathComponents
This is in turn built on filePathComponents that does all the work.
filePathComponents :: FilePath -> [FilePath]
filePathComponents "" = []
filePathComponents (c:cs) =
    reverse $ snd $ foldl accumulate
                          (if c == pathSeparator then ([],["/"]) else ([c],[]))
                          (cs++[pathSeparator])
    where
      accumulate :: (String,[String]) -> Char -> (String,[String])
      accumulate (cs, css) c =
          if c == pathSeparator
          then ([],(if null cs then id else cons cs) css)
          else (cs++[c],css)
      cons :: String -> [String] -> [String]
      cons cs css
          | cs == "." = css
          | cs /= ".." || null css = cs : css
          | otherwise =
              let hd = head css
                  tl = tail css
              in if hd == [pathSeparator]
                 then css
                 else if hd == ".."
                      then cs : css
                      else if null tl
                           then ["."]
                           else tl
--
Marcus D. Gabriel, Ph.D.
Saint Louis, FRANCE
http://www.marcus.gabriel.name mailto:marcus@gabriel.name
Tel: +33.3.89.69.05.06 Portable: +33.6.34.56.07.75

Hi Marcus,
Thanks for your suggestions. I'm a Windows user so I'm not really
qualified to comment on these suggestions - it depends what Posix
users would like. I suggest you follow the Library Submission Process
- filepath is now a core library, and as such I don't have the
freedom/power to change it as I like, and it's generally something
lots of people should think about.
http://www.haskell.org/haskellwiki/Library_submissions
Thanks
Neil
PS. I'm off for 3 weeks starting very soon, so will be unlikely to
reply to any email thread for a long time :-)
On Thu, Sep 17, 2009 at 10:58 AM, Marcus D. Gabriel wrote:

Hello Neil,
Thanks for the pointer Neil. I will read the site. Besides, it allows me to submit a fixed version since I just found a bug!
Cheers, - Marcus
P.S. 452 black box tests of my little command utility, and I still forgot about a corner case. Now there are 454 black box test cases :).
Neil Mitchell wrote:

On Thu, 2009-09-17 at 11:58 +0200, Marcus D. Gabriel wrote:
-- | 'reduceFilePath' returns a pathname that is reduced to canonical
-- form equivalent to that of ksh(1), that is, symbolic link names are
-- treated literally when finding the directory name. See @cd -L@ of
-- ksh(1). Specifically, extraneous separators @(\"/\")@, dot
-- @(\".\")@, and double-dot @(\"..\")@ directories are removed.
reduceFilePath :: FilePath -> FilePath
reduceFilePath = joinPath . filePathComponents
So it's like the existing System.Directory.canonicalizePath but it's pure and it does not do anything with symlinks. On the other hand because it's pure it can do something with non-local paths.
Is there anything POSIX-specific about this? I don't see it.
Duncan
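To make the contrast between a pure and a file-system-backed cleanup concrete, here is a minimal sketch, assuming the filepath and directory packages in their current form. The helper names lexicalView and physicalView are hypothetical and exist only for illustration; normalise and canonicalizePath are the real library functions.

```haskell
import System.Directory (canonicalizePath)
import System.FilePath.Posix (normalise)

-- | Pure, purely lexical tidy-up (hypothetical helper name).
-- 'normalise' collapses repeated separators and drops "." components,
-- but it keeps ".." components in place.
lexicalView :: FilePath -> FilePath
lexicalView = normalise

-- | File-system view (hypothetical helper name).
-- 'canonicalizePath' makes the path absolute and resolves symbolic
-- links and ".." by consulting the file system, so it runs in IO.
physicalView :: FilePath -> IO FilePath
physicalView = canonicalizePath
```

In GHCi, lexicalView "/a/../c" leaves the path as "/a/../c", which matches the documented behaviour of normalise; the proposed reduceFilePath would go one step further and elide the ".." without touching the file system.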

On Sep 19, 2009, at 07:45 , Duncan Coutts wrote:
It's making assumptions about the safety of eliding "..". (What does \\machine\share\..\ do?) On the other hand that's also unsafe on POSIX in the presence of symlinks. In general I consider path cleanup not involving validation against the filesystem to be risky.
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

Brandon S. Allbery KF8NH wrote:
I agree; this came up before during the design of System.FilePath, and it's why the current library doesn't have a way to remove "..". The docs should probably explain this point, because it's non-obvious that you can't just "clean up" a path to remove the ".." and end up with something that means the same thing.
Cheers, Simon
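The symlink hazard can be reproduced with a small experiment. This is only a sketch, assuming a POSIX system, a writable temporary directory, and a directory package recent enough to provide createDirectoryLink and removePathForcibly. The function name symlinkDemo and the scratch directory name dotdot-demo are made up for illustration.

```haskell
import System.Directory
  ( canonicalizePath, createDirectoryIfMissing, createDirectoryLink
  , getTemporaryDirectory, removePathForcibly )
import System.FilePath.Posix ((</>))

-- | Builds base/a/b plus a symlink base/link -> base/a/b, then returns
-- (physical parent of "link", lexical parent of "link").
symlinkDemo :: IO (FilePath, FilePath)
symlinkDemo = do
  tmp <- getTemporaryDirectory >>= canonicalizePath
  let base = tmp </> "dotdot-demo"   -- hypothetical scratch directory
  removePathForcibly base            -- start from a clean slate
  createDirectoryIfMissing True (base </> "a" </> "b")
  createDirectoryLink (base </> "a" </> "b") (base </> "link")
  -- The file system follows the link first, so ".." lands in base/a.
  physical <- canonicalizePath (base </> "link" </> "..")
  -- Eliding "link/.." textually would simply give base.
  let lexical = base
  removePathForcibly base
  pure (physical, lexical)
```

The two components of the result name different directories, which is the sense in which a path with ".." removed may not mean the same thing as the original.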

Simon Marlow wrote:
A few points to explain my point of view. It's a little long.
Yes, reduceFilePath is pure and not an IO action, but this was not important for my application. What counted was the preservation of the logical structure of the path, which was a design choice, and the fact that the paths in question may no longer exist in the file system. Thus canonicalizePath could not help me. These are the most important points. For me, if these two points are not of general interest, then there should be no System.FilePath.reduceFilePath or equivalent.
The essential POSIX standard (IEEE Std 1003.1) can be found at http://www.opengroup.org/onlinepubs/009695399/utilities/cd.html, that is, cd - change the working directory. The key points are in the OPTIONS section and steps 8 and 9 of the DESCRIPTION section. I used ksh(1) as my guide, but bash(1) or dash(1) work also. See also http://www.opengroup.org/onlinepubs/009695399/, that is, section 4.11 Pathname Resolution.
So, the function reduceFilePath does not make any assumptions about the eliding of ".."; it simply attempts to implement the behaviour of cd -L of a POSIX shell consistent with Pathname Resolution of section 4.11 minus the dereferencing of symbolic links. Whether reduceFilePath does this correctly or not is another question. (It does not, sorry about that, but I have an older version that does, minus the leading double slash rule.)
Although it is true that if you just clean up the path it may no longer resolve to the same object in the file system as would the result of a call to canonicalizePath, Python has a library function whose name I cannot remember in which the documentation just states that this may change the meaning of the path, that is, let the programmer beware.
In my case, I verified and resolved the initial inputs from the user so that either an error message occurred or I could use reduceFilePath in confidence during processing. That is to say, the file system validation was done upfront so that I could safely maintain the logical structure, which was the design choice, and in certain cases continue processing even if the pathnames no longer referred to anything.
This means that \\machine\share\..\ is \\machine\ logically. Thus, the application should either not use reduceFilePath or it should set up conditions to avoid or catch this case. The blind or unthinking use of reduceFilePath is not only risky, it's a mistake. Just like unsafePerformIO, let the programmer beware.
If reduceFilePath is useless under Windows but at least makes some kind of sense, then for me, a debugged, POSIX compliant System.FilePath.reduceFilePath would have been nice. So I propose it. If it makes absolutely no sense under Windows, then drop it so as to maintain the interface which I used wherever I could and appreciated greatly.
Cheers, - Marcus
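The lexical reduction described above can be sketched on top of the existing System.FilePath.Posix functions. This is not the fixed version mentioned in the thread, only an illustration in the spirit of cd -L: the name reduceLexically is hypothetical, symbolic links are never consulted, and no special handling of the leading double slash rule is attempted.

```haskell
import System.FilePath.Posix (isPathSeparator, joinPath, splitDirectories)

-- | Hypothetical helper: drop "." components and cancel each ".."
-- against the preceding component, purely on the text of the path.
reduceLexically :: FilePath -> FilePath
reduceLexically ""   = ""
reduceLexically path = render (foldl step [] (splitDirectories path))
  where
    -- The accumulator holds the kept components in reverse order.
    step :: [FilePath] -> FilePath -> [FilePath]
    step acc "."  = acc
    step acc ".." = case acc of
      []               -> [".."]      -- nothing to cancel in a relative path
      (prev : rest)
        | isRoot prev  -> acc         -- ".." directly under the root stays at the root
        | prev == ".." -> ".." : acc  -- leading ".." components accumulate
        | otherwise    -> rest        -- cancel the previous component
    step acc comp = comp : acc

    -- A root component consists only of separators.
    isRoot :: FilePath -> Bool
    isRoot = all isPathSeparator

    -- An empty result means the path reduced to the current directory.
    render :: [FilePath] -> FilePath
    render []  = "."
    render acc = joinPath (reverse acc)
```

For example, reduceLexically "/a/./b//../c/" gives "/a/c" and reduceLexically "a/b/../../.." gives "..". As the message says, such a function is only safe once the inputs have been validated against the file system, or when the logical structure of the path is what matters.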
participants (5)
- Brandon S. Allbery KF8NH
- Duncan Coutts
- Marcus D. Gabriel
- Neil Mitchell
- Simon Marlow