
Hi all,

Just to clarify: filenames can be written (by different users) in different locales. Therefore, one should treat filenames as abstract entities (sequences of bytes), since one can't sensibly convert a filename to a string if the locale in which it was created is unknown.

If the above is true, we should just treat file names as an abstract data type (FilePath) with a set of operations to break them down into smaller pieces (directory, extension, etc.), to append them again, and to compare them. FilePaths can be created from strings, and even be shown. But showing a FilePath and creating it again would not be an identity (i.e., makeFilePath . show /= id).

(Ian: I haven't studied your proposal in detail, but I can't see directly why you propose a separate FilePath class?)

All the best,
-- Daan.

David Roundy wrote:
On Sat, Jul 30, 2005 at 06:13:21PM +0200, Udo Stenzel wrote:
Ian Lynagh wrote:
With its closer adherence to the Haskell 98 report, it is no longer possible with Hugs to manipulate files using the standard IO functions if the filenames are not representable in your locale.
Note that this basically means your filesystem is broken. This situation can only occur if a filesystem is written in one and then read in another locale. [...]
That is true, but on any multiuser system it's quite a reasonable scenario to have different users using different locales. It's an embarrassing scenario that I can't write a tool in Haskell that recursively deletes a directory in which there are files that aren't representable in my current locale... or display the contents of such files, or anything else.
This "problem" cannot really be fixed, only worked around.
On the contrary, the problem *can* be fixed, by only requiring that filenames be converted to Unicode if necessary. For many purposes (possibly even *most* purposes), knowledge of the character encoding is completely unnecessary.
More to the point, the "problem" is inherent in the language, not the filesystem--or perhaps you'd prefer to say that it's a problem with writing portable code. The point is that it would seem best to present an API which makes it possible to write portable code. On POSIX filesystems filenames are not sequences of Unicode characters, and treating them as such causes trouble.
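To make the byte-sequence view concrete, here is a minimal sketch of what such an abstract path type might look like. Everything in it is hypothetical: the names RawPath, makeRawPath, displayRawPath, (</>) and splitLast are invented for illustration and are not part of any existing library, and the ASCII-only conversion to and from String is a deliberately crude assumption standing in for a real locale conversion.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical abstract path: an opaque byte sequence, compared bytewise.
newtype RawPath = RawPath [Word8]
  deriving (Eq, Ord)

-- The POSIX separator '/' as a byte.
slash :: Word8
slash = 47

-- Build a path from a String. Assumption: only ASCII is kept;
-- every other character is replaced by '?' (byte 63).
makeRawPath :: String -> RawPath
makeRawPath = RawPath . map toByte
  where
    toByte c
      | ord c < 128 = fromIntegral (ord c)
      | otherwise   = 63

-- Render a path for display. Bytes outside ASCII are shown as '?',
-- so this is lossy: makeRawPath . displayRawPath is not the identity.
displayRawPath :: RawPath -> String
displayRawPath (RawPath bs) = map toChar bs
  where
    toChar b
      | b < 128   = chr (fromIntegral b)
      | otherwise = '?'

-- Append two paths, inserting a separator only when one is needed.
(</>) :: RawPath -> RawPath -> RawPath
RawPath a </> RawPath b
  | null a          = RawPath b
  | last a == slash = RawPath (a ++ b)
  | otherwise       = RawPath (a ++ [slash] ++ b)

-- Split into (directory, last component) without interpreting any bytes
-- other than the separator.
splitLast :: RawPath -> (RawPath, RawPath)
splitLast (RawPath bs) =
  case break (== slash) (reverse bs) of
    (revName, [])         -> (RawPath [], RawPath (reverse revName))
    (revName, _ : revDir)
      | null revDir       -> (RawPath [slash], RawPath (reverse revName))
      | otherwise         -> (RawPath (reverse revDir), RawPath (reverse revName))
```

With a type like this, a recursive delete only needs splitLast, (</>) and equality, none of which depend on the character encoding; only displayRawPath has to guess at one.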
UTF-8: 65533 = U+FFFD = "replacement character"
================= Proposed solution =================
I have a simpler proposal: allocate 128 "replacement characters" in the "Vendor Zone" of Unicode. Their purpose is to act as placeholders for incorrect UTF-8. Then use these replacement characters when decoding UTF-8 and reproduce the original, broken, code when re-encoding. Under ordinary circumstances these codes should never occur in strings.
I guess you'd then want a couple of functions in the IO monad to convert between FilePath and CString (or something we could actually use)?
While your suggestion would solve the problem of being unable to access some files, it would also result in FilePaths themselves (without conversion routines) being useless for purposes other than actually accessing the same files.
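For reference, the replacement-character proposal quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: the base code point U+F700 is an arbitrary pick from the Private Use Area, not something anyone in the thread proposed, and decodeEscaped / encodeEscaped are hypothetical names. One extra rule is assumed so that re-encoding always reproduces the original bytes: input that is itself a valid encoding of one of the 128 escape characters is treated as invalid and escaped byte by byte.

```haskell
import Control.Monad (guard)
import Data.Bits (shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Assumption: escapes occupy U+F700..U+F77F (an arbitrary private-use range).
escapeBase :: Int
escapeBase = 0xF700

isEscape :: Char -> Bool
isEscape c = ord c >= escapeBase && ord c < escapeBase + 128

-- Map an undecodable byte (always >= 0x80 here) to its escape character.
escape :: Word8 -> Char
escape b = chr (escapeBase + fromIntegral b - 0x80)

-- Decode UTF-8; every byte that is not part of a valid sequence
-- becomes one escape character instead of U+FFFD.
decodeEscaped :: [Word8] -> String
decodeEscaped [] = []
decodeEscaped (b : bs)
  | b < 0x80  = chr (fromIntegral b) : decodeEscaped bs
  | otherwise =
      case multi b bs of
        Just (c, rest) | not (isEscape c) -> c : decodeEscaped rest
        _                                 -> escape b : decodeEscaped bs

-- Try to decode one multi-byte sequence with lead byte b.
-- Rejects overlong forms, surrogates and values above U+10FFFF.
multi :: Word8 -> [Word8] -> Maybe (Char, [Word8])
multi b bs
  | b >= 0xC2 && b <= 0xDF = go 1 (b .&. 0x1F) 0x80
  | b >= 0xE0 && b <= 0xEF = go 2 (b .&. 0x0F) 0x800
  | b >= 0xF0 && b <= 0xF4 = go 3 (b .&. 0x07) 0x10000
  | otherwise              = Nothing
  where
    isCont c = c .&. 0xC0 == 0x80
    go n lead minCp = do
      let (conts, rest) = splitAt n bs
      guard (length conts == n && all isCont conts)
      let cp = foldl (\acc c -> acc * 64 + fromIntegral (c .&. 0x3F))
                     (fromIntegral lead) conts :: Int
      guard (cp >= minCp && cp <= 0x10FFFF && not (cp >= 0xD800 && cp <= 0xDFFF))
      Just (chr cp, rest)

-- Encode to UTF-8, turning escape characters back into the original bytes.
encodeEscaped :: String -> [Word8]
encodeEscaped = concatMap encodeChar
  where
    encodeChar c
      | isEscape c  = [fromIntegral (n - escapeBase + 0x80)]
      | n < 0x80    = [fromIntegral n]
      | n < 0x800   = [0xC0 + hi 6, cont 0]
      | n < 0x10000 = [0xE0 + hi 12, cont 6, cont 0]
      | otherwise   = [0xF0 + hi 18, cont 12, cont 6, cont 0]
      where
        n      = ord c
        hi s   = fromIntegral (n `shiftR` s)
        cont s = 0x80 + fromIntegral ((n `shiftR` s) .&. 0x3F)
```

The intended property is that encodeEscaped (decodeEscaped bs) == bs for any byte sequence, which is what makes every file reachable. It does nothing about the objection above: a string containing escape characters is still of little use for anything other than handing it back to the filesystem.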
------------------------------------------------------------------------
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries