Proposal: Add System.FilePath to base

Hi,

PROPOSAL

I guess most people saw this coming, but here is the proposal to add System.FilePath to base. The System.FilePath module is located at http://www-users.cs.york.ac.uk/~ndm/projects/libraries.php#filepath

In particular I propose the addition of System.FilePath (a cross-platform FilePath manipulation library), System.FilePath.Windows (Windows paths on all OSes) and System.FilePath.Posix (Posix paths on all OSes). The Version_* stuff is not being proposed for addition.

Following the library submission guidelines:

* Currency: I do not include a darcs patch; that can be done later. This is a standalone module, so currency is not a necessary criterion. The current module does not use CPP in a particularly nice way (so as to leave the primary module selecting its OS at runtime). The code that would actually be committed will be functionally identical, but slightly faster. If people agree to everything else I'm happy to wave a patch past people before the final commit.
* Portability: Windows+Linux, GHC+Hugs+Yhc
* Style: not applicable to adding a new module. The module is clean and well implemented, by most style guidelines.
* Documentation: detailed - see the haddock link above
* Tests: very thorough, over 200 tests/properties

DISCUSSION

Since the people who hang out on the libraries list may not include all the people who have commented on previous versions of the FilePath module, I have CC'd some of these people (particularly those with a strong opinion), so as not to exclude anyone from the discussion. If these people want to continue to discuss System.FilePath they should probably join the libraries list for the time being.

Since this module has had extensive comments on it previously, I have attempted to summarise some of the reasons that people have given against the inclusion of these modules, along with the basic counter-arguments. This is just an attempt to give some background to people new to the discussion - feel free to expand on any point, or raise new points etc - discussion welcome.

1 type FilePath = String is a bad design choice, which should not be encouraged
  * This is the way it is, Haskell 98 says so
  * This library allows better structuring of FilePaths, so makes the interface more abstract
  * This library makes a good attempt to be correct in its transformations
  * We now have explicit properties for FilePath interaction - this isn't string hackery
  * Paves the way for an ADT, in the future (if desired)
  * Backwards compatible

2 The base libraries are not the place for a FilePath library
  * Setup.hs scripts will need it, and they only depend on base and Cabal
  * Could remove the version from Cabal
  * Encourages programmers to use it
  * FilePath functions are (very) easy to write wrongly
  * FilePath handling should have been present from the beginning
  * It's already there internally, see System.Directory.Internals

3 The base libraries are being split up
  * This is something happening in the future
  * Put FilePath in, where it currently belongs, move it later
  * The FilePath library belongs with type FilePath, which belongs in base
  * If we ever move to an ADT, this library will have to be in base

4 The interface in this library is poorly designed
  * Repeated iterations in public have led to a reasonable consensus
  * Clear and concrete properties given for all operations
  * If particular problems still remain there is some flexibility

5 Why not type FilePath = ByteString
  * If you argue for this, and for 1, you're not internally consistent!
  * It's not backwards compatible
  * This should be tackled as a whole thing, i.e. String defaulting
  * Not lazy

I'll open a trac bug shortly. I suspect that there will be plenty of discussion, so I'd rather not set any time limit until everyone has had a chance to express their full opinion. However if we could try and wrap this up before Christmas that would be nice :)

Thanks

Neil
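The claim in item 2 that FilePath functions are easy to write wrongly can be illustrated with a minimal sketch. It assumes the filepath package as it ships with present-day GHC (System.FilePath.Posix), whose behaviour may differ in detail from the 2006 draft under discussion; naiveExtension and trickyPath are hypothetical names made up for this illustration.

```haskell
import qualified System.FilePath.Posix as P

-- Hypothetical string hack: treat everything after the last '.' as the
-- extension, without checking whether a path separator follows that dot.
naiveExtension :: FilePath -> String
naiveExtension p = case break (== '.') (reverse p) of
    (_, [])     -> ""                     -- no dot anywhere in the path
    (revExt, _) -> '.' : reverse revExt   -- text after the last dot

-- A path whose only dot sits in a directory name.
trickyPath :: FilePath
trickyPath = "dir.d/file"

-- The naive version picks up part of the directory name;
-- the library version reports that there is no extension.
naiveResult, libraryResult :: String
naiveResult   = naiveExtension trickyPath
libraryResult = P.takeExtension trickyPath
```

On ordinary names such as "file.txt" the two agree, which is exactly why the hand-written version tends to survive casual testing.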

Neil Mitchell wrote:
I guess most people saw this coming, but here is the proposal to add System.FilePath to base. The System.FilePath module is located at http://www-users.cs.york.ac.uk/~ndm/projects/libraries.php#filepath
Some comments about the documentation:
searchPathSeparator :: Char A list of possible file separators, between the $PATH variable
Should be: "The character that is used to separate the entries in the $PATH environment variable."
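As a small illustration of the corrected wording, the sketch below reads the separator from both platform modules and uses it to split a search path. It assumes the present-day filepath package bundled with GHC, including its splitSearchPath function, which may not match the 2006 draft; examplePath is a made-up value.

```haskell
import qualified System.FilePath.Posix as Posix
import qualified System.FilePath.Windows as Windows

-- The character separating the entries of a search path on each platform.
posixSep, windowsSep :: Char
posixSep   = Posix.searchPathSeparator
windowsSep = Windows.searchPathSeparator

-- A made-up Posix-style $PATH value.
examplePath :: String
examplePath = "/usr/local/bin:/usr/bin:/bin"

-- The individual entries, split at the Posix separator.
entries :: [FilePath]
entries = Posix.splitSearchPath examplePath
```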
addExtension :: FilePath -> String -> FilePath
Add an extension, even if there is already one there. E.g. addExtension "foo.txt" "bat" -> "foo.txt.bat".
addExtension "file.txt" "bib" == "file.txt.bib"
addExtension "file." ".bib" == "file..bib"
addExtension "file" ".bib" == "file.bib"
Windows: addExtension "\\\\share" ".txt" == "\\\\share\\.txt"
I don't understand the last example. Is that because "\\\\xyz" where "xyz" contains no pathSeparator can never be a file? What, then, is the result of (addExtension "/" "x")? What, in general, is the behavior in case the first argument ends with a path separator?
Speaking generally, for many functions I miss a precise description of the semantics that includes all the corner cases. This would (IMHO) be even more useful than the examples.
It would also be nice to have principles explained somewhere, e.g. "In order to be easily reversible, splitting never discards separator characters. Instead these remain with either the first element of the result pair (for directory-filename splits) or the second element (for filename-extension splits)". I'm not sure if this principle is really generally followed, but if it is, then this and similar explanations would be very useful. It would also help to have guiding principles for the usual corner cases, like (see above) what happens with adjacent path and extension separators.
Ben
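The reversibility principle suggested above can be written down as a checkable property. This is a sketch only, assuming the present-day filepath package bundled with GHC rather than the 2006 draft; splitIsReversible and samples are hypothetical names chosen for the example.

```haskell
import qualified System.FilePath.Posix as P

-- Joining the two halves of a filename-extension split gives back the input,
-- because the extension separator stays with the second element.
splitIsReversible :: FilePath -> Bool
splitIsReversible p = uncurry (++) (P.splitExtension p) == p

-- A few corner cases, including a trailing path separator and an empty path.
samples :: [FilePath]
samples = ["file.txt", "dir.ext/", "/", "file.", "a/b.c/d", ""]

-- For a directory-filename split the path separator stays with the first element.
dirSplit :: (FilePath, FilePath)
dirSplit = P.splitFileName "/directory/file.ext"
```

In the present-day package the directory-filename split is not always reversible with plain (++), since a bare file name is reported with "./" as its directory, so the property is stated here only for the extension split.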

Benjamin Franksen wrote:
What, then, is the result of (addExtension "/" "x")? What, in general, is the behavior in case the first argument ends with a path separator?
Another example where the semantics is unclear (to me) is
takeBaseName :: FilePath -> String
Get the base name, without an extension or path.
takeBaseName "file/test.txt" == "test"
takeBaseName "dave.ext" == "dave"
What is (takeBaseName "dir.ext/")? "dir"? Or ""? A general principle that would clarify this could be "An extension of a file path is always a suffix of the file path. It never contains a path (directory) separator."
Ben

Hi
searchPathSeparator :: Char A list of possible file separators, between the $PATH variable
Should be: "The character that is used to separate the entries in the $PATH environment variable."
Indeed, it used to be something else, I'll fix that.
addExtension :: FilePath -> String -> FilePath Windows: addExtension "\\\\share" ".txt" == "\\\\share\\.txt"
I don't understand the last example. Is that because "\\\\xyz" where "xyz" contains no pathSeparator can never be a file?
No, on Windows \\drive (\\\\ because it's escaped in Haskell) is a server share name, not a file name. This library understands that you can't add an extension to a drive, since that doesn't make sense.
What, then, is the result of (addExtension "/" "x")? What, in general, is the behavior in case the first argument ends with a path separator?
My guess is "/.x" - although I'm not at a computer with Haskell at the moment. It's a valid question so I'll update the documentation to include this corner case as a test.
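The corner cases discussed so far can be collected into one table and checked mechanically. This is a sketch that assumes the present-day filepath package bundled with GHC, which may differ from the 2006 draft; cornerCases and allAgree are hypothetical names, and the expected values are the ones quoted in this thread.

```haskell
import qualified System.FilePath.Posix as Posix
import qualified System.FilePath.Windows as Windows

-- (description, actual result, result stated in the thread)
cornerCases :: [(String, FilePath, FilePath)]
cornerCases =
    [ ("plain file",     Posix.addExtension "file" ".bib",        "file.bib")
    , ("existing ext",   Posix.addExtension "file.txt" "bib",     "file.txt.bib")
    , ("trailing dot",   Posix.addExtension "file." ".bib",       "file..bib")
    , ("root directory", Posix.addExtension "/" "x",              "/.x")
    , ("Windows share",  Windows.addExtension "\\\\share" ".txt", "\\\\share\\.txt")
    ]

-- True when every actual result matches the stated one.
allAgree :: Bool
allAgree = and [ actual == expected | (_, actual, expected) <- cornerCases ]
```

Importing the Posix and Windows modules qualified lets both sets of rules be exercised on any host operating system.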
Speaking generally, for many functions I miss a precise description of the semantics that includes all the corner cases.
The idea was to provide enough properties for corner cases that you can figure it out, precisely, from just the properties. If there are any other cases you feel unclear on I'll definitely add them as tests.
This would (IMHO) be even more useful than the examples. It would also be nice to have principles explained somewhere, e.g. "In order to be easily reversible, splitting never discards separator characters. Instead these remain with either the first element of the result pair (for directory-filename splits) or the second element (for filename-extension splits)". I'm not sure if this principle is really generally followed, but if it is, then this and similar explanations would be very useful. It would also help to have guiding principles for the usual corner cases, like (see above) what happens with adjacent path and extension separators.
Words are woolly, properties are quickchecked and precise, but I can add some overriding principles and design notes to clarify things in people's minds.
takeBaseName :: FilePath -> String
Get the base name, without an extension or path.
takeBaseName "file/test.txt" == "test"
takeBaseName "dave.ext" == "dave"
| What is (takeBaseName "dir.ext/")? "dir"? Or ""? A general principle that
| would clarify this could be "An extension of a file path is always a suffix
| of the file path. It never contains a path (directory) separator."
That can indeed be expressed, I'll add some properties to cover this case and others later. I will update the documentation later today or tomorrow, and let you know when I have done so. Thanks Neil

Hi
What, then, is the result of (addExtension "/" "x")? What, in general, is the behavior in case the first argument ends with a path separator?
Properties added:
addExtension "/" "x" == "/.x"
takeBaseName (addExtension (asDirectory x) "ext") == ".ext"
| What is (takeBaseName "dir.ext/")? "dir"? Or ""? A general principle that
| would clarify this could be "An extension of a file path is always a suffix
| of the file path. It never contains a path (directory) separator."
takeBaseName "" == ""
takeBaseName "test" == "test"
takeBaseName (asDirectory x) == ""
Hopefully this makes things a bit clearer. Thanks for the comments
Neil
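The takeBaseName properties can be checked against a few concrete inputs. The sketch assumes the present-day filepath package bundled with GHC, where asDirectory goes by the name addTrailingPathSeparator (the rename proposed later in this thread); sampleDirs, trailingSeparatorGivesEmpty and benCase are hypothetical names.

```haskell
import qualified System.FilePath.Posix as P

-- Sample inputs for the "ends in a separator" property.
sampleDirs :: [FilePath]
sampleDirs = ["dir.ext", "dir.ext/", "a/b", ""]

-- takeBaseName of anything with a trailing separator is empty.
trailingSeparatorGivesEmpty :: Bool
trailingSeparatorGivesEmpty =
    all (\x -> P.takeBaseName (P.addTrailingPathSeparator x) == "") sampleDirs

-- The "dir.ext/" question: base name and extension of a path that ends
-- in a separator.
benCase :: (String, String)
benCase = (P.takeBaseName "dir.ext/", P.takeExtension "dir.ext/")
```

Both components of benCase come out empty, which matches the suggested principle that an extension never contains a path separator.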

Neil Mitchell wrote:
I guess most people saw this coming, but here is the proposal to add System.FilePath to base. The System.FilePath module is located at http://www-users.cs.york.ac.uk/~ndm/projects/libraries.php#filepath
I don't object to putting this in the base package. There's one outstanding issue I still have with the design: I suggest removing
isDirectory :: FilePath -> Bool
isFile :: FilePath -> Bool
asDirectory :: FilePath -> FilePath
asFile :: FilePath -> FilePath
At the least, they are confusingly named, because
forall x. isDirectory (takeDirectory x) == False
(and I definitely don't suggest fixing this by making takeDirectory append a path separator). The point is, the absence of a path separator at the end of a FilePath does not imply that the FilePath refers to a file rather than a directory.
If we must have these, then I suggest renaming them:
hasTrailingPathSeparator :: FilePath -> Bool
addTrailingPathSeparator :: FilePath -> FilePath
dropTrailingPathSeparator :: FilePath -> FilePath
Also, what's going on with System.FilePath.Windows and System.FilePath.Posix? Their documentation is empty. CPP shenanigans?
Cheers, Simon
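The naming concern can be made concrete with the proposed names. This sketch assumes the present-day filepath package bundled with GHC, which may differ from the 2006 draft; dirOf, looksLikeDirectory and roundTrip are hypothetical names.

```haskell
import qualified System.FilePath.Posix as P

-- takeDirectory returns the directory part without a trailing separator...
dirOf :: FilePath
dirOf = P.takeDirectory "/foo/bar/baz"

-- ...so a purely syntactic test cannot tell that it names a directory.
looksLikeDirectory :: Bool
looksLikeDirectory = P.hasTrailingPathSeparator dirOf

-- The add/drop pair only edits the string; it never consults the file system.
roundTrip :: FilePath
roundTrip = P.dropTrailingPathSeparator (P.addTrailingPathSeparator dirOf)
```

Here dirOf is "/foo/bar" and looksLikeDirectory is False, even though the value was obtained by asking for a directory, which is why the longer names describe the behaviour more honestly than isDirectory would.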

Hi
isDirectory :: FilePath -> Bool
isFile :: FilePath -> Bool
asDirectory :: FilePath -> FilePath
asFile :: FilePath -> FilePath
At the least, they are confusingly named, because
forall x. isDirectory (takeDirectory x) == False
(and I definitely don't suggest fixing this by making takeDirectory append a path separator).
The point is, the absence of a path separator at the end of a FilePath does not imply that the FilePath refers to a file rather than a directory.
If we must have these, then I suggest renaming them:
hasTrailingPathSeparator :: FilePath -> Bool
addTrailingPathSeparator :: FilePath -> FilePath
dropTrailingPathSeparator :: FilePath -> FilePath
Ok, that sounds entirely reasonable. I actually prefer the long names - it encourages people not to use them! I don't suspect these will be widely used, however some people found these were the single place where they needed to get into a FilePath and do string manipulation - something I'd really like to discourage.
Also, what's going on with System.FilePath.Windows and System.FilePath.Posix? Their documentation is empty. CPP shenanigans?
The documentation for System.FilePath.Windows and Posix is identical to that of System.FilePath - they export exactly the same API and have exactly the same semantics, just tied to either the Windows: or Posix: properties given in the main one. I'll update the text in these two modules to make this clearer. Thanks Neil

Neil Mitchell wrote:
The documentation for System.FilePath.Windows and Posix is identical to that of System.FilePath - they export exactly the same API and have exactly the same semantics, just tied to either the Windows: or Posix: properties given in the main one. I'll update the text in these two modules to make this clearer.
Yes, I understand what these modules do, I was just surprised that the documentation for both modules was empty. I expected to see the same interface as System.FilePath. Why the #ifdef __HADDOCK__ in System/FilePath/Posix.hs? Also, rather than having 3 copies of System.FilePath, we should have at most 2 (this is just an implementation issue, of course). Cheers, Simon

Hi
Yes, I understand what these modules do, I was just surprised that the documentation for both modules was empty. I expected to see the same interface as System.FilePath. Why the #ifdef __HADDOCK__ in System/FilePath/Posix.hs?
Rather than listing the similarities I found it easier to explain the differences, and where you should be using them. I can certainly revert to having the full interface shown.
Also, rather than having 3 copies of System.FilePath, we should have at most 2 (this is just an implementation issue, of course).
Agreed. For my standalone FilePath implementation I do have 3 genuinely different implementations - FilePath selects on the OS at runtime, Windows is one way, Posix is the other. This is done so that FilePath doesn't require CPP, so that people developing against the FilePath module in Hugs (i.e. me!) don't have to preprocess it. That also means it has to use nasty CPP tricks, i.e. macro substitution, instead of just conditional compilation. Before actually submitting the code to base, I will move to a more standard base way of doing things, relying on CPP for all modules, and only including 2 actual modules. It's not too hard to do that, but it's probably easier to do it once at the end, hence I am waiting until a result is reached. Thanks Neil

Hello Neil, Friday, November 24, 2006, 2:51:25 PM, you wrote:
Before actually submitting the code to base, I will move to a more standard base way of doing things, relying on CPP for all modules, and only including 2 actual modules.
i think that the best way is to
module FilePath where
import FilePath.Windows as Windows
import FilePath.Unix as Unix
isProperPath = if weOnWindows then Windows.isProperPath else Unix.isProperPath
then GHC can optimize it, throwing away the unused branch at compile-time. and yhc will work just fine with its portable bytecode
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
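The dispatch-by-condition layout suggested above might look roughly like the following. This is only a sketch: it assumes the present-day filepath package bundled with GHC and uses System.Info.os from base for the platform test, neither of which is named in the message; onWindows, hostPathSeparator and hostCombine are hypothetical names, and whether a compiler removes the unused branch is not something this sketch demonstrates.

```haskell
import System.Info (os)
import qualified System.FilePath.Posix as Posix
import qualified System.FilePath.Windows as Windows

-- GHC reports "mingw32" as the OS name on Windows.
onWindows :: Bool
onWindows = os == "mingw32"

-- Each exported function is a one-line choice between the two
-- platform-specific implementations.
hostPathSeparator :: Char
hostPathSeparator =
    if onWindows then Windows.pathSeparator else Posix.pathSeparator

hostCombine :: FilePath -> FilePath -> FilePath
hostCombine = if onWindows then Windows.combine else Posix.combine
```

The cost of this layout is one such wrapper per exported function, with the two platform modules kept as separate implementations.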

Hi Bulat,
i think that the best way is to
module FilePath
import FilePath.Windows as Windows import FilePath.Unix as Unix
isProperPath = if weOnWindows then Windows.isProperPath else Unix.isProperPath
That results in an awful lot of boilerplate code (about 3 lines per function), plus reduces the amount of sharing between Windows and Unix - they are almost identical. If you look at the current method, FilePath has all the code, and Windows and Posix use clever CPP hacks to get it all working. Although these hacks are clever, I don't want them to go in base; it's not good style.
then GHC can optimize it, throwing away unused branch at compile-time. and yhc will work just fine with its portable bytecode
As it stands currently, GHC can optimise and yhc can do it portably. Hugs can even use it without CPP - everyone should be happy ;) Thanks Neil

Hi
hasTrailingPathSeparator :: FilePath -> Bool
addTrailingPathSeparator :: FilePath -> FilePath
dropTrailingPathSeparator :: FilePath -> FilePath
Done, I have put up new haddock documentation, for the newest iteration: http://www-users.cs.york.ac.uk/~ndm/projects/filepath/System-FilePath-Versio...
Also, what's going on with System.FilePath.Windows and System.FilePath.Posix? Their documentation is empty. CPP shenanigans?
I have made a note that the interface is identical to System.FilePath. The documentation is intentionally blank, so people don't have to search for differences to System.FilePath, to find out there aren't any. I'm not fussed as to whether the normal type signature documentation goes in to these modules or not, although I think leaving it out makes the meaning clearer. Thanks Neil

Hello Neil, Wednesday, November 22, 2006, 3:25:54 PM, you wrote:
* Portability, Windows+Linux, GHC+Hugs+Yhc
maybe i'm wrong, but adding NHC compatibility seems like an important issue. reasons: the base lib and cabal now support nhc, and anything proposed to be used here needs to follow the same requirements
3 The base libraries are being split up * This is something happening in the future * Put FilePath in, where it currently belongs, move it later * The FilePath library belongs with type FilePath, which belongs in base * If we ever move to an ADT, this library will have to be in base
if you spend your time working on it, it's better to work in the right direction instead of playing ping-pong - now you are moving this module from the FilePath library to base, and in future someone will do just the opposite operation :)
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hi
* Portability, Windows+Linux, GHC+Hugs+Yhc
may be i'm wrong but adding NHC compatibility seems like an important issue. reasons: base lib and cabal now supports nhc and anything proposed to be used here need to follow the same requirements
I'd be deeply shocked if it wasn't NHC, Helium and Jhc compatible as well! Provided your Haskell compiler supports a really small section of the report, and your libraries have some really easy pure Haskell functions, it should work. For Portability read "anywhere, any compiler". NHC does not work on Windows, so I have not had a chance to try it.
if you spend your time working on it, it's better to work in right direction instead of playing ping-pong - now you are moving this module from FilePath library to the Base and in future someone will make just the opposite operation :)
Whatever happens in the future, I suspect FilePath won't become a single standalone library after the base library split happens. Given that no one knows what will happen, or when, it's best that when the base library splits the task is "split up the base library", not "integrate new modules into a new system which includes the base library and other stuff". Thanks Neil

Hello Neil, Thursday, November 23, 2006, 4:26:50 PM, you wrote:
* Portability, Windows+Linux, GHC+Hugs+Yhc
functions, it should work. For Portability read "anywhere, any compiler". NHC does not work on Windows, so I have not had a chance to try it.
ok, you may write "Tested with ... Should work on any H98-compatible compiler"
if you spend your time working on it, it's better to work in right direction instead of playing ping-pong - now you are moving this module from FilePath library to the Base and in future someone will make just the opposite operation :)
Whatever happens in the future, I suspect FilePath won't become a single standalone library after the base library split happens.
why not start with a library containing a single module? in the future other modules will be added to it as well. following this path, you will neither play the move-it move-it game nor make things worse than they currently are.
so, i propose to include your lib in the core (or base) library set, bundle it with ghc and hugs, and host it on darcs.haskell.org like other ghc-bundled libs
Given that no one knows what will happen, or when, its best that when the base library splits the task is "split up the base library", not "integrate new modules into a new system which includes the base library and other stuff".
we want to split it just because it has been inflated so much. so please don't add even more to this problem. i can only repeat that the problem lies in the GHC HQ decision which makes it impossible to propose adding the FilePath lib to the set of core libs, so you are forced to propose adding it to the base library
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hi
so, i propose to include your lib in the core (or base) library set, bundle it with ghc and hugs, and host it on darcs.haskell.org like other ghc-bundled libs
Ok, perhaps I can rephrase my position (in light of your currently ongoing discussion): "I propose that this module is added to the set of modules which is guaranteed to be available in every Haskell implementation, and can be used by programs with a simple import statement." Currently that means "is in base". If you can convince the GHC and Hugs people that for their next releases they will guarantee to bundle a set of libraries and treat them as though they were in base, then my proposal is asking to be included in that set. (The Yhc position on this is that it's a really good idea, and we'll definitely do that!) As a side note, I think that the process for getting a library into this set of guaranteed libraries should be exactly the same as the current process for getting a library into base. If these core libraries have the same status as base currently does, then they need the same level of scrutiny. Thanks Neil

Hello Neil, Thursday, November 23, 2006, 7:48:36 PM, you wrote:
now i 100% agree with your position. if GHC HQ is against adding the FilePath library to the set of core libs, i vote for including the FilePath module in the base lib exclusively for use in Cabal scripts, while for other purposes i propose to release a separate FilePath library anyway, so it can evolve and be upgraded as users wish.
afaik, the Hugs people (including you) just follow GHC's decisions in this area
so, i propose to include your lib in the core (or base) library set, bundle it with ghc and hugs, and host it on darcs.haskell.org like other ghc-bundled libs
Ok, perhaps I can rephrase my position (in light of your currently ongoing discussion):
"I propose that this module is added to the set of modules which is guaranteed to be available in every Haskell implementation, and can be used by programs with a simple import statement."
Currently that means "is in base". If you can convince the GHC and Hugs people that for their next releases they will guarantee to bundle a set of libraries and treat them as though they were in base, then my proposal is asking to be included in that set. (The Yhc position on this is that its a really good idea, and we'll definitely do that!)
As a side note, I think that the process for getting a library into this set of guaranteed libraries should be exactly the same as the current process for getting a library into base. If these core libraries have the same status as base currently does, then they need the same level of scrutiny.
Thanks
Neil
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Thu, Nov 23, 2006 at 04:48:36PM +0000, Neil Mitchell wrote:
"I propose that this module is added to the set of modules which is guaranteed to be available in every Haskell implementation, and can be used by programs with a simple import statement."
Currently that means "is in base".
For GHC that means "is a core lib"; I'm not sure of the policy for the other implementations. Perhaps we should rename that "is a GHC core lib" and define a list of global "core libs" (of which "GHC core libs" would be a superset) somewhere? I haven't looked at the particular library, but have no problem with some sort of file path manipulating library to be used by cabal Setup.hs's being in the core libs set. Thanks Ian

Again, see http://hackage.haskell.org/trac/ghc/wiki/PackageReorg for proposed terminology. What we currently call "core libs" we propose to rename GHC Boot Packages
S
| -----Original Message-----
| From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On Behalf Of Ian Lynagh
| Sent: 24 November 2006 17:24
| To: Neil Mitchell
| Cc: Haskell Libraries
| Subject: Re: Re[2]: Proposal: Add System.FilePath to base
|
| On Thu, Nov 23, 2006 at 04:48:36PM +0000, Neil Mitchell wrote:
| >
| > "I propose that this module is added to the set of modules which is
| > guaranteed to be available in every Haskell implementation, and can be
| > used by programs with a simple import statement."
| >
| > Currently that means "is in base".
|
| For GHC that means "is a core lib"; I'm not sure of the policy for the
| other implementations. Perhaps we should rename that "is a GHC core lib"
| and define a list of global "core libs" (of which "GHC core libs" would
| be a superset) somewhere?
|
| I haven't looked at the particular library, but have no problem with
| some sort of file path manipulating library to be used by cabal
| Setup.hs's being in the core libs set.
|
| Thanks
| Ian

On Thu, 2006-11-23 at 15:19 +0300, Bulat Ziganshin wrote:
Hello Neil,
Wednesday, November 22, 2006, 3:25:54 PM, you wrote:
* Portability, Windows+Linux, GHC+Hugs+Yhc
may be i'm wrong but adding NHC compatibility seems like an important issue. reasons: base lib and cabal now supports nhc and anything proposed to be used here need to follow the same requirements
Unfortunately it is very difficult for most developers to test with NHC98 because it does not work with any recent version of Linux (specifically since Linux kernel version 2.6 some years ago) due to the 'high mem' bug. I admit that I've not tried building NHC98 for a while but I've not heard that the bug has been fixed. Duncan

Duncan Coutts wrote:
Unfortunately it is very difficult for most developers to test with NHC98 because it does not work with any recent version of Linux (specifically since Linux kernel version 2.6 some years ago) due to the 'high mem' bug.
I build nhc98 every night on Linux, Sparc, and MacOS X. It works here...
I admit that I've not tried building NHC98 for a while but I've not heard that the bug has been fixed.
The high-mem bug was fixed in March this year. So yes, nhc98 is long overdue for a refreshed release that includes this patch, along with more current versions of the library packages. I'll try to get to it before Christmas. Regards, Malcolm

On Thu, 2006-11-23 at 17:03 +0000, Malcolm Wallace wrote:
Duncan Coutts wrote:
Unfortunately it is very difficult for most developers to test with NHC98 because it does not work with any recent version of Linux (specifically since Linux kernel version 2.6 some years ago) due to the 'high mem' bug.
I build nhc98 every night on Linux, Sparc, and MacOS X. It works here...
I admit that I've not tried building NHC98 for a while but I've not heard that the bug has been fixed.
The high-mem bug was fixed in March this year. So yes, nhc98 is long overdue for a refreshed release that includes this patch, along with more current versions of the library packages. I'll try to get to it before Christmas.
That's great news. I look forward to re-adding an nhc98 Gentoo package (it got removed some time ago due to this bug). Duncan

Hello Malcolm, Thursday, November 23, 2006, 8:03:21 PM, you wrote:
I build nhc98 every night on Linux, Sparc, and MacOS X. It works here...
you should be interested in testing filepath module with nhc ;) at least i agree with Neil that this module is great for setup scripts -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

participants (8)
- Benjamin Franksen
- Bulat Ziganshin
- Duncan Coutts
- Ian Lynagh
- Malcolm Wallace
- Neil Mitchell
- Simon Marlow
- Simon Peyton-Jones