
[ moving to libraries@haskell.org ]

On 26 January 2005 12:22, Malcolm Wallace wrote:
Could we just punt this library for this release? After all, we can add libraries in a later point release (e.g. 6.4.1); you just can't change existing APIs.
FWIW, I agree with Duncan, Ben, and Peter, that the new System.FilePath interface is broken, and the implementation more so. It would be better to redesign FilePaths as an algebraic datatype.
Ok, I'll go with the consensus. System.FilePath in its current state will be removed from the base package. Isaac or Krasimir: can you make the required changes to Cabal? (The code is already duplicated in Distribution.Compat.FilePath; just remove the import for GHC.)

So let's aim for 6.6 to do it right, and design an abstract type for path names. We have to think about the migration path: Haskell 98 programs must continue to work, so at least IO.FilePath must still be a String, and IO.openFile must still take a String. We can therefore:

(a) make System.IO.FilePath be the new type, which is different from, and incompatible with, IO.FilePath. Similarly for System.IO.openFile, System.Directory.removeFile, and so on.

(b) or just define a new type, and force you to insert read/show to convert where necessary, e.g. before calling openFile.

(a) is kind of the right thing, but (b) is a lot less painful in the short term. Since we'll be migrating to a new IO library at some point, (b) is probably fine (the new IO library can use the new FilePaths exclusively), but we'll need to migrate System.Directory too.

Would someone like to take up the reins on the design for the new library?

Cheers,
Simon
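A minimal sketch of what option (b) could look like in user code, assuming a hypothetical `Path` record and hypothetical helpers `toFilePath` and `openPath` (none of these names come from the thread, and an explicit conversion function stands in for the read/show mentioned above). The Haskell 98 `openFile` keeps taking a `String`, and the conversion happens at the call site:

```haskell
import Data.List (intercalate)
import System.IO (Handle, IOMode, openFile)

-- Hypothetical stand-in for the new path type; the real design was still open.
data Path = Path
  { pathIsAbsolute :: Bool
  , pathComponents :: [String]
  } deriving (Eq, Show)

-- Explicit conversion back to the Haskell 98 String representation
-- (Unix separators only, for brevity).
toFilePath :: Path -> FilePath
toFilePath (Path isAbs comps) =
  (if isAbs then "/" else "") ++ intercalate "/" comps

-- Haskell 98's openFile is untouched; callers convert at the boundary.
openPath :: Path -> IOMode -> IO Handle
openPath p mode = openFile (toFilePath p) mode
```

Under option (a), by contrast, `System.IO.openFile` itself would take the new type and no wrapper like `openPath` would be needed.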

On Wed, Jan 26, 2005 at 01:34:39PM -0000, Simon Marlow wrote:
... We can therefore:
(a) make System.IO.FilePath be the new type, which is different from, and incompatible with, IO.FilePath. Similarly for System.IO.openFile, System.Directory.removeFile, and so on.
(b) or just define a new type, and force you to insert read/show to convert where necessary, eg. before calling openFile.
(a) is kind of the right thing, but (b) is a lot less painful in the short term. Since we'll be migrating to a new IO library at some point, (b) is probably fine (the new IO library can use the new FilePaths exclusively), but we'll need to migrate System.Directory too.
One thing I've been wishing for some time (as long as we're discussing a replacement for FilePath) is to have a FilePath class, which would allow me to use my FastPackedStrings with the IO routines. It seems silly to have a byte-oriented filepath, convert it into a String, and then have the IO library convert it back again to a byte-oriented string to call the C library. I've wished there were a

    class FilePath f where
        toStringFilePath    :: f -> String
        withCStringFilePath :: f -> (CString -> IO a) -> IO a

or something like that (where the withCStringFilePath could have a default written in terms of toStringFilePath).

It's a shame, when I use the FFI for some of my IO (which of course always requires CStrings) and the Haskell IO libraries (which always require Strings), to keep having to convert back and forth. Alas, darcs does enough "quick" calls to stat (doesFileExist, etc.) that the cost isn't negligible. Eventually I'll rewrite a lot of this to just use the FFI (since I want to use lstat anyway, or its Windows equivalent), but it would be nice (eventually) not to have to do this.

But how painful would it be for the System.IO functions to have types such as

    readFile :: FilePath a => a -> IO String

?
-- 
David Roundy
http://www.darcs.net
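A minimal sketch of the class described above, with the default method written in terms of `toStringFilePath`. Several names are assumptions made only for illustration: the class is called `PathLike` here because a class named `FilePath` would clash with the Prelude's type synonym, `PackedPath` is a hypothetical wrapper around a strict `ByteString` standing in for FastPackedString, and `roundTrip` is just a demonstration helper.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as BC
import Foreign.C.String (CString, peekCString, withCString)

-- Hypothetical class name; see the note above about the Prelude clash.
class PathLike f where
  toStringFilePath    :: f -> String
  withCStringFilePath :: f -> (CString -> IO a) -> IO a
  -- Default: go through String, as suggested in the message.
  withCStringFilePath = withCString . toStringFilePath

-- Ordinary Strings use the default marshalling.
instance PathLike [Char] where
  toStringFilePath = id

-- Hypothetical byte-oriented path type.
newtype PackedPath = PackedPath B.ByteString

-- The packed instance hands its bytes to C without building a String first.
instance PathLike PackedPath where
  toStringFilePath (PackedPath b)    = BC.unpack b
  withCStringFilePath (PackedPath b) = B.useAsCString b

-- Demonstration helper: marshal to a CString and read it back.
roundTrip :: PathLike f => f -> IO String
roundTrip p = withCStringFilePath p peekCString
```

An FFI-backed `stat` wrapper could then accept any `PathLike f` and call `withCStringFilePath` once, which is where the saving for the packed representation would come from.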

Would someone like to take up the reins on the design for the new library?
I don't feel I know enough about the corners of DOS, Windows, and MacOS Classic filesystems to take up the challenge. But here's a quick sketch of the kind of thing I had in mind, based on Ben's suggestion:
having an ADT for paths would not break any code, if what that means is that we provide
    parsePath :: String -> Maybe Path
    showPath  :: Path -> Maybe String
and various (new) functions operating on the Path type.
So let's define a new algebraic datatype something like:

    data Path = Unix { absolute   :: Bool
                     , dirnames   :: [String]
                     , basename   :: String
                     , extensions :: [String]
                     }
              | DOS  { absolute   :: Bool
                     , drive      :: Maybe Char
                     , dirnames   :: [String]
                     , basename   :: String   -- max 8 Chars
                     , extension  :: String   -- max 3 Chars
                     }
           -- | Win32 ...
           -- | MacOSClassic ....
           -- | VMS | CPM | Amiga | ARM ....

The implementation of

    parseFilePath :: Prelude.FilePath -> Maybe System.Path

is probably quite involved, but the Show instance should be more straightforward, something like:

    import Data.List (intersperse)

    instance Show Path where
      show (Unix abs dirs name exts) =
          (if abs then ('/':) else id)
              -- each extension gets its own leading dot
              (concat (intersperse "/" (dirs++[name])) ++ concatMap ('.':) exts)
      show (DOS abs drive dirs name ext) =
          maybe "" (\c->[c,':']) drive ++
          (if abs then ('\\':) else id)
              (concat (intersperse "\\" (dirs++[name])) ++ "." ++ ext)

The signatures of various of the current manipulation functions would need to change:

    - splitFileName :: FilePath -> (String, String)
    + splitFileName :: Path -> (Path, String)

    - splitFileExt :: FilePath -> (String, String)
    + splitFileExt :: Path -> (Path, String)

    - joinFileName :: String -> String -> FilePath
    + joinFileName :: Path -> String -> Path

    - joinPaths :: FilePath -> FilePath -> FilePath
    + joinPaths :: Path -> Path -> Maybe Path

... and so on.

Regards,
Malcolm
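To give a feel for why parsing is the more involved direction, here is a rough sketch of a parser for the Unix case only. It uses a simplified, Unix-only record so the block stands alone; `UnixPath`, `parseUnixPath`, `showUnixPath`, `splitOn` and `splitName` are all hypothetical names, and the treatment of dotfiles and trailing slashes is one arbitrary choice among several.

```haskell
import Data.List (intercalate)

-- Simplified, Unix-only version of the record sketched above (hypothetical).
data UnixPath = UnixPath
  { absolute   :: Bool
  , dirnames   :: [String]
  , basename   :: String
  , extensions :: [String]
  } deriving (Eq, Show)

-- Split a string on a separator character, keeping empty fields.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Split a file name into base name and extensions; leading dots
-- (as in ".bashrc" or "..") stay part of the base name.
splitName :: String -> (String, [String])
splitName ('.' : rest) = let (n, es) = splitName rest in ('.' : n, es)
splitName file = case splitOn '.' file of
  (n : es) -> (n, es)
  []       -> (file, [])

-- Empty strings and paths with no components (such as "/") are rejected.
-- Repeated and trailing slashes are silently dropped, so parsing is lossy.
parseUnixPath :: String -> Maybe UnixPath
parseUnixPath "" = Nothing
parseUnixPath s
  | null comps = Nothing
  | otherwise  = Just (UnixPath isAbs (init comps) name exts)
  where
    isAbs        = head s == '/'
    comps        = filter (not . null) (splitOn '/' s)
    (name, exts) = splitName (last comps)

-- Inverse direction: each extension gets a leading dot.
showUnixPath :: UnixPath -> String
showUnixPath (UnixPath isAbs dirs name exts) =
  (if isAbs then "/" else "")
    ++ intercalate "/" (dirs ++ [name ++ concatMap ('.' :) exts])
```

Even this small case has to decide what "/" or "a/b/" should mean, which suggests why `joinPaths` is given a `Maybe` result in the signatures above.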

Malcolm Wallace wrote:
So let's define a new algebraic datatype something like:
    data Path = Unix { absolute   :: Bool
                     , dirnames   :: [String]
                     , basename   :: String
                     , extensions :: [String]
                     }
              | DOS  { absolute   :: Bool
                     , drive      :: Maybe Char
                     , dirnames   :: [String]
                     , basename   :: String   -- max 8 Chars
                     , extension  :: String   -- max 3 Chars
                     }
           -- | Win32 ...
           -- | MacOSClassic ....
           -- | VMS | CPM | Amiga | ARM ....
I think you can guess what's coming: why not use a class for extensibility?

    class Path p where
        ...

Then different types for each platform:

    data UnixPath = UnixPath ...

which would be made an instance of the Path class. The Path class would provide all the operations for using paths. This has the advantage that people can add instances for platforms without having to alter the code in the library.

Keean
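A small sketch of how the class-based design might look, assuming a hypothetical method set (`parsePath`, `showPath`, `isAbsolute`) and two hypothetical platform types. The validity checks are deliberately rough and incomplete; the point is only that a new platform is added by writing another instance, without touching the class or the existing instances.

```haskell
import Data.Char (isAlpha)

-- Hypothetical method set; the thread leaves the class body open.
class Path p where
  parsePath  :: String -> Maybe p
  showPath   :: p -> String
  isAbsolute :: p -> Bool

newtype UnixPath = UnixPath String deriving (Eq, Show)
newtype DosPath  = DosPath String deriving (Eq, Show)

instance Path UnixPath where
  parsePath s
    | null s || '\0' `elem` s = Nothing   -- NUL cannot appear in a Unix path
    | otherwise               = Just (UnixPath s)
  showPath (UnixPath s)   = s
  isAbsolute (UnixPath s) = take 1 s == "/"

instance Path DosPath where
  parsePath s
    | null s || any (`elem` "<>\"|?*") s = Nothing   -- rough, incomplete check
    | otherwise                          = Just (DosPath s)
  showPath (DosPath s)   = s
  isAbsolute (DosPath s) = case s of
    (d : ':' : '\\' : _) -> isAlpha d   -- e.g. C:\FOO
    ('\\' : _)           -> True        -- rooted on the current drive
    _                    -> False

-- Code written against the class works for any platform instance.
describe :: Path p => p -> String
describe p = showPath p ++ (if isAbsolute p then " (absolute)" else " (relative)")
```

One trade-off compared with the single algebraic datatype: the platform is now fixed in the type, so a function like `parsePath` needs a type annotation or other context to know which instance to use.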

Keean Schupke wrote:
So let's define a new algebraic datatype something like:
I think you can guess what's coming: why not use a class for extensibility? class Path p where ... The Path class would provide all the operations for using paths...
Yup, good idea, and I think this would also allow David Roundy's suggestion of a FastString implementation of filepaths - it becomes just another instance.
This has the advantage that people can add instances for platforms without having to alter the code in the library...
Exactly. Yes. Regards, Malcolm

Simon Marlow writes:
Would someone like to take up the reins on the design for the new library?
I'll do it. I think I understand the subject well enough to give it a try. If nothing else, my code can be torn to shreds so that something better can come out of it. ;-)

I'll make the result of my efforts available via Darcs ASAP, then you all can participate in the process. My knowledge of esoteric file system details is limited, so I'll need help anyway.

Peter

P.S.: I apologize if you're getting this posting twice. GMANE was acting up somehow when I tried to post, and I am not sure what really happened.

"Simon Marlow"
So let's aim for 6.6 to do it right, and design an abstract type for path names.
In this case it would be silly to *not* solve somehow the problem of filename encodings on Unix. Technically they are byte strings, and converting between them and Unicode strings is ambiguous. Usually they are in the locale's encoding, but it's not enforced.

If a filename is going to be a separate type, there is a chance to handle filenames which are not decodable in the locale's encoding, at least when they are just passed around and not converted to/from strings (e.g. read from directory contents and used to open the file). It's not clear how such filenames should be presented in strings.

I think Gtk+ 2.6 introduced some functions for filename conversion because it used to be inconsistently reimplemented in various programs. For example, there is a function which converts a filename to UTF-8 for display (Gtk+ uses UTF-8 internally); this function is not invertible.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
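The distinction between passing a filename around and displaying it can be sketched as follows. `RawFileName` and `displayName` are hypothetical names, and the display rule (ASCII kept, every other byte shown as U+FFFD) is a deliberately crude stand-in for real locale decoding. The raw bytes survive untouched, while the display form is lossy and so cannot be inverted.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical opaque filename: exactly the bytes the OS handed back.
newtype RawFileName = RawFileName [Word8] deriving (Eq, Show)

-- Lossy conversion for display only: ASCII bytes are kept, anything
-- else becomes U+FFFD REPLACEMENT CHARACTER (an assumption of this sketch).
displayName :: RawFileName -> String
displayName (RawFileName bytes) = map toChar bytes
  where
    toChar b
      | b < 0x80  = chr (fromIntegral b)
      | otherwise = '\xFFFD'

-- Two distinct on-disk names that render identically.
nameA, nameB :: RawFileName
nameA = RawFileName [0x61, 0xE9]   -- 'a' followed by byte 0xE9
nameB = RawFileName [0x61, 0xF1]   -- 'a' followed by byte 0xF1
```

Here `nameA` and `nameB` are different values but `displayName` gives the same string for both, so a program that wants to reopen a file has to keep the `RawFileName` and not the displayed text.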
participants (6): David Roundy, Keean Schupke, Malcolm Wallace, Marcin 'Qrczak' Kowalczyk, Peter Simons, Simon Marlow