
On Sat, Jul 30, 2005 at 06:13:21PM +0200, Udo Stenzel wrote:
Ian Lynagh wrote:
With its closer adherence to the Haskell 98 report, it is no longer possible with hugs to manipulate files using the standard IO functions if the filenames are not representable in your locale.
Note that this basically means your filesystem is broken. This situation can only occur if a filesystem is written in one and then read in another locale. [...]
That is true, but on any multiuser system it's quite a reasonable scenario to have different users using different locales. It's an embarrassing scenario that I can't write a tool in Haskell that recursively deletes a directory in which there are files that aren't representable in my current locale... or display the contents of such files, or anything else.
This "problem" cannot really be fixed, only worked around.
On the contrary, the problem *can* be fixed, by only requiring that filenames be converted to Unicode if necessary. For many purposes (possibly even *most* purposes), knowledge of the character encoding is completely unnecessary. More to the point, the "problem" is inherent in the language, not the filesystem--or perhaps you'd prefer to say that it's a problem with writing portable code. The point is that it would seem best to present an API which makes it possible to write portable code. On POSIX filesystems filenames are not sequences of Unicode characters, and treating them as such causes trouble.
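To illustrate what I mean by not needing the encoding, here is a minimal sketch. The names RawName, joinPath and display are made up for this example and are not part of any existing library; the only assumption is a POSIX-style filesystem where a filename is a byte sequence and '/' (0x2F) is the separator.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical opaque filename type: the raw bytes as the OS sees them.
newtype RawName = RawName [Word8]
  deriving (Eq, Ord, Show)

-- Joining paths needs only the separator byte, never the encoding.
joinPath :: RawName -> RawName -> RawName
joinPath (RawName dir) (RawName file) = RawName (dir ++ [0x2F] ++ file)

-- Lossy, display-only rendering: printable ASCII is kept as-is and
-- every other byte is shown as '?'. The result is for humans to read
-- and must not be used to reopen the file.
display :: RawName -> String
display (RawName bytes) = map render bytes
  where
    render b
      | b >= 0x20 && b < 0x7F = chr (fromIntegral b)
      | otherwise             = '?'
```

A recursive delete would only ever use something like joinPath on the names handed back by the directory listing, so it works regardless of locale; a conversion to String is needed only when a name is shown to the user.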
UTF-8: 65533 = U+FFFD = "replacement character"
================= Proposed solution =================
I have a simpler proposal: allocate 128 "replacement characters" in the "Vendor Zone" of Unicode. Their purpose is as placeholders for incorrect UTF-8. Then use these replacement characters when decoding UTF-8 and reproduce the original, broken, code when re-encoding. Under ordinary circumstances these codes should never occur in strings.
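For concreteness, the scheme above might look roughly like the sketch below. The range U+EF80..U+EFFF is an arbitrary private-use choice made for this example, not something the proposal specifies, and the function names are hypothetical. One detail the sketch has to add: a valid UTF-8 encoding of one of the placeholder code points is itself treated as invalid on decoding, since otherwise re-encoding would not reproduce the original bytes.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Assumption: byte b (0x80..0xFF) is escaped as code point 0xEF00 + b,
-- i.e. the placeholders occupy U+EF80..U+EFFF.
escapeBase :: Int
escapeBase = 0xEF00

isEscape :: Int -> Bool
isEscape n = n >= 0xEF80 && n <= 0xEFFF

-- Decode UTF-8; each byte that is not part of a valid sequence
-- becomes its placeholder character.
decode :: [Word8] -> String
decode [] = []
decode (b:bs)
  | b < 0x80  = chr (fromIntegral b) : decode bs
  | otherwise = case multi b bs of
      Just (c, rest) -> c : decode rest
      Nothing        -> chr (escapeBase + fromIntegral b) : decode bs

-- Try to read one multi-byte sequence starting with lead byte b.
multi :: Word8 -> [Word8] -> Maybe (Char, [Word8])
multi b bs
  | b .&. 0xE0 == 0xC0 = go 1 0x80    (b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = go 2 0x800   (b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = go 3 0x10000 (b .&. 0x07)
  | otherwise          = Nothing
  where
    isCont c = c .&. 0xC0 == 0x80
    go n minCp lead =
      let (conts, rest) = splitAt n bs
          step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
          cp = foldl step (fromIntegral lead) conts :: Int
          ok = length conts == n && all isCont conts
               && cp >= minCp && cp <= 0x10FFFF      -- no overlong forms
               && not (cp >= 0xD800 && cp <= 0xDFFF) -- no surrogates
               && not (isEscape cp)                  -- keep round trip exact
      in if ok then Just (chr cp, rest) else Nothing

-- Encode to UTF-8, turning placeholders back into the original byte.
encode :: String -> [Word8]
encode = concatMap encodeChar

encodeChar :: Char -> [Word8]
encodeChar c
  | isEscape n  = [fromIntegral (n - escapeBase)]
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    hi s   = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)
```

With these definitions, encode (decode bs) gives back bs for any byte list, so a Latin-1 name such as the bytes 63 61 66 E9 survives a trip through String even in a UTF-8 locale.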
I guess you'd then want a couple of functions in the IO monad to convert between FilePath and CString (or something we could actually use)? While your suggestion would solve the problem of being unable to access some files, it would also result in FilePaths themselves (without conversion routines) being useless for purposes other than actually accessing the same files.
-- 
David Roundy
http://www.darcs.net