Re: Proposal #3456: Add FilePath -> String decoder

On Mon, Aug 31, 2009 at 12:28 AM, Ketil Malde wrote:
Duncan Coutts writes:
Presumably on POSIX we will follow the glib approach of using '?' replacement chars, since the conversion to string is aimed at human consumption. Doing this makes the function total but lossy.
If the FilePath is not valid UTF-8, there's a private area in Unicode that can be used for encoding byte values. Wikipedia's UTF-8 entry suggests "U+DCxx where xx is the byte's value".
This would make us "non conformant" as per the bureaucracy, but on the other hand, it would work (with some ugliness for non-ASCII-based encodings) for any encoding, and these would be the expected identity:
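A minimal sketch of the U+DCxx idea, assuming paths are handled as lists of bytes. The names decodeEscaped and encodeEscaped are hypothetical and are not part of any proposed API. Bytes that are not part of well-formed UTF-8 are mapped to U+DC00 + byte (these code points fall in the low-surrogate range, hence the conformance concern), and the encoder maps them back, so that encodeEscaped . decodeEscaped is the identity on arbitrary byte sequences.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Decode bytes as UTF-8. Any byte that is not part of a well-formed
-- sequence becomes the code point U+DC00 + byte (hypothetical helper).
decodeEscaped :: [Word8] -> String
decodeEscaped [] = []
decodeEscaped bs@(b : rest) =
  case decodeOne bs of
    Just (c, rest') -> c : decodeEscaped rest'
    Nothing         -> chr (0xDC00 + fromIntegral b) : decodeEscaped rest

-- | Try to read one well-formed UTF-8 sequence from the front of the input.
decodeOne :: [Word8] -> Maybe (Char, [Word8])
decodeOne (b0 : rest)
  | b0 < 0x80           = Just (chr (fromIntegral b0), rest)
  | b0 .&. 0xE0 == 0xC0 = multi 1 0x80    (fromIntegral (b0 .&. 0x1F)) rest
  | b0 .&. 0xF0 == 0xE0 = multi 2 0x800   (fromIntegral (b0 .&. 0x0F)) rest
  | b0 .&. 0xF8 == 0xF0 = multi 3 0x10000 (fromIntegral (b0 .&. 0x07)) rest
decodeOne _ = Nothing

-- | Read n continuation bytes; reject overlong forms, out-of-range values
-- and encoded surrogates, so that escaped bytes stay unambiguous.
multi :: Int -> Int -> Int -> [Word8] -> Maybe (Char, [Word8])
multi n minCp acc0 bs
  | length conts == n
  , all isCont conts
  , cp >= minCp
  , cp <= 0x10FFFF
  , cp < 0xD800 || cp > 0xDFFF = Just (chr cp, rest)
  | otherwise = Nothing
  where
    (conts, rest) = splitAt n bs
    isCont b = b .&. 0xC0 == 0x80
    cp = foldl (\a b -> (a `shiftL` 6) .|. fromIntegral (b .&. 0x3F)) acc0 conts

-- | Inverse of decodeEscaped: U+DC80..U+DCFF turn back into single bytes,
-- everything else is encoded as ordinary UTF-8.
encodeEscaped :: String -> [Word8]
encodeEscaped = concatMap enc
  where
    enc c
      | cp >= 0xDC80 && cp <= 0xDCFF = [fromIntegral (cp - 0xDC00)]
      | cp < 0x80    = [fromIntegral cp]
      | cp < 0x800   = [0xC0 .|. hi 6, lo 0]
      | cp < 0x10000 = [0xE0 .|. hi 12, lo 6, lo 0]
      | otherwise    = [0xF0 .|. hi 18, lo 12, lo 6, lo 0]
      where
        cp = ord c
        hi s = fromIntegral (cp `shiftR` s)
        lo s = 0x80 .|. (fromIntegral (cp `shiftR` s) .&. 0x3F)
```

For example, the bytes 66 6F 6F FF C3 A9 would decode to "foo", then U+DCFF, then U+00E9, and encode back to the same six bytes.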
Taking a step back, there are (at least) three separate issues at play here:
1) The FilePath type must be able to represent arbitrary byte sequences on POSIX systems, but the current one-byte-per-Char representation is suboptimal.
2) Much existing code probably relies on FilePath==String.
3) We need to be able to display FilePaths in a readable form to the user.
The U+DCxx method is a way to fix #1 without affecting #2. However, I don't think this will solve issue #3 (which is what my proposal is intended to address). Probably a FilePath->String display function should explicitly replace the problem bytes with either "?" or "%xx".
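As a rough illustration of such a display function, here is a sketch that decodes bytes as UTF-8 and renders every byte outside a well-formed sequence as either "?" or "%xx". The Replacement type and the names displayBytes, render and utf8Head are assumptions made for this example, not anything from the proposal, and the input is taken to be a list of raw bytes.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, intToDigit)
import Data.Word (Word8)

-- | How to render a problem byte (hypothetical type).
data Replacement = QuestionMark | PercentHex

-- | Total but lossy conversion, intended for human consumption only.
displayBytes :: Replacement -> [Word8] -> String
displayBytes _ [] = []
displayBytes r bs@(b : rest) =
  case utf8Head bs of
    Just (c, rest') -> c : displayBytes r rest'
    Nothing         -> render r b ++ displayBytes r rest

-- | Render one undecodable byte as "?" or as "%xx" (two lowercase hex digits).
render :: Replacement -> Word8 -> String
render QuestionMark _ = "?"
render PercentHex b = ['%', hexDigit (b `shiftR` 4), hexDigit (b .&. 0x0F)]
  where
    hexDigit = intToDigit . fromIntegral

-- | Read one well-formed UTF-8 sequence, or fail on the first byte.
utf8Head :: [Word8] -> Maybe (Char, [Word8])
utf8Head [] = Nothing
utf8Head (b : rest)
  | b < 0x80  = Just (chr (fromIntegral b), rest)
  | b < 0xC0  = Nothing                      -- stray continuation byte
  | b < 0xE0  = go 1 0x80    (b .&. 0x1F)
  | b < 0xF0  = go 2 0x800   (b .&. 0x0F)
  | b < 0xF8  = go 3 0x10000 (b .&. 0x07)
  | otherwise = Nothing
  where
    go n minCp lead
      | length conts == n && all isCont conts && valid = Just (chr cp, rest')
      | otherwise = Nothing
      where
        (conts, rest') = splitAt n rest
        isCont c = c .&. 0xC0 == 0x80
        cp = foldl step (fromIntegral lead) conts :: Int
        step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
        -- reject overlong forms, out-of-range values and surrogates
        valid = cp >= minCp && cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF)
```

With the bytes 61 FF 62, displayBytes QuestionMark gives "a?b" and displayBytes PercentHex gives "a%ffb". Only the second form lets a reader recover the original byte.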
Can FilePath be defined differently on different systems?
I.e. could it be:
type FilePath = [Word8]   -- Posix
type FilePath = [Word16]  -- Windows, etc
It'd also be nice if overloaded string literals are used (and extended to these), so that I could specify filenames directly with no need for wrappers.
Yes, this would solve #1 nicely; but runs into #2, so it's looking unlikely that it will happen anytime soon.
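To make the overloaded-literal idea concrete, here is a sketch using newtypes rather than bare type synonyms. As far as I know base already provides an IsString instance for lists (for String), so a separate instance for [Word8] would clash with it. PosixPath, WindowsPath and the helper encoders are hypothetical names, and encoding literals as UTF-8 on POSIX is an assumption. A real build could select one of the two types with a platform conditional.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.String (IsString (..))
import Data.Word (Word16, Word8)

-- | Hypothetical POSIX path: a sequence of raw bytes.
newtype PosixPath = PosixPath [Word8] deriving (Eq, Show)

-- | Hypothetical Windows path: a sequence of UTF-16 code units.
newtype WindowsPath = WindowsPath [Word16] deriving (Eq, Show)

-- Literals are encoded as UTF-8 here (an assumption of this sketch).
instance IsString PosixPath where
  fromString = PosixPath . concatMap utf8

instance IsString WindowsPath where
  fromString = WindowsPath . concatMap utf16

-- | UTF-8 encoding of a single code point.
utf8 :: Char -> [Word8]
utf8 c
  | cp < 0x80    = [fromIntegral cp]
  | cp < 0x800   = [0xC0 .|. hi 6, lo 0]
  | cp < 0x10000 = [0xE0 .|. hi 12, lo 6, lo 0]
  | otherwise    = [0xF0 .|. hi 18, lo 12, lo 6, lo 0]
  where
    cp = ord c
    hi s = fromIntegral (cp `shiftR` s)
    lo s = 0x80 .|. (fromIntegral (cp `shiftR` s) .&. 0x3F)

-- | UTF-16 encoding of a single code point (surrogate pair above U+FFFF).
utf16 :: Char -> [Word16]
utf16 c
  | cp < 0x10000 = [fromIntegral cp]
  | otherwise    = [ fromIntegral (0xD800 + (v `shiftR` 10))
                   , fromIntegral (0xDC00 + (v .&. 0x3FF)) ]
  where
    cp = ord c
    v = cp - 0x10000

-- A filename written directly as a literal, with no explicit wrapper call.
configPath :: PosixPath
configPath = "/etc/hosts"
```

This keeps literals convenient, but it does nothing for existing code that assumes FilePath==String, which is issue #2.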
In principle I guess it'd be ok to add versions in the System.FilePath.Posix module that take an extra encoding parameter
I think these belong in Text.Encodings or some such.
Either you use the default, simple and pure interface (which is UTF-8 on Posix), or you'll have to do some more work, and do something like
mydecoder <- filePathToStringWith =<< getLocaleEncoding
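One way this pattern could be fleshed out on a POSIX system is sketched below. It assumes the path is available as a list of bytes and leans on GHC.Foreign.peekCStringLen, which decodes a byte buffer with a given TextEncoding. The signature of filePathToStringWith is a guess at what the thread has in mind, and the result lives in IO here because the underlying decoder does. Decoding may throw on input that is malformed for the chosen encoding.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (castPtr)
import GHC.Foreign (peekCStringLen)
import GHC.IO.Encoding (TextEncoding, getLocaleEncoding, utf8)

-- | Decode raw path bytes with an explicit encoding (hypothetical signature).
-- A Windows variant would presumably ignore the encoding argument.
filePathToStringWith :: TextEncoding -> [Word8] -> IO String
filePathToStringWith enc bytes =
  withArrayLen bytes $ \len ptr ->
    -- reinterpret the byte buffer as a C string of known length
    peekCStringLen enc (castPtr ptr, len)

-- | The "default" decoder of this sketch: always UTF-8.
decodeUtf8Path :: [Word8] -> IO String
decodeUtf8Path = filePathToStringWith utf8

-- | The "more work" variant: look up the locale encoding first.
decodeLocalePath :: [Word8] -> IO String
decodeLocalePath bytes = do
  enc <- getLocaleEncoding
  filePathToStringWith enc bytes
```

A caller could then write `mydecoder <- filePathToStringWith <$> getLocaleEncoding` to obtain a decoding function fixed to the current locale.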
Sure; though I'd expect that a TextEncoding would convert bytes to/from Chars-as-Unicode, which isn't really useful on Windows. I guess on Windows filePathToStringWith would just completely ignore the encoding parameter. (But I do think it's important to have such a function for portability.)
-Judah