
On Mon, Nov 7, 2011 at 09:02, Simon Marlow wrote:
I think you might be misunderstanding how the new API works. Basically, imagine a reversible transformation:
    encode :: String -> [Word8]
    decode :: [Word8] -> String
this transformation is applied in the appropriate direction by the IO library to translate filesystem paths into FilePath and vice versa. No information is lost; furthermore you can apply the transformation yourself in order to recover the original [Word8] from a String, or to inject your own [Word8] file path.
Ok?
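To make the round-trip property concrete, here is a minimal sketch. It assumes a Latin-1 style mapping (one byte per Char), chosen only because it is trivially reversible; it is not the locale-dependent encoding that GHC applies. The names `encode` and `decode` mirror the signatures quoted above, and `roundTrips` is a hypothetical helper.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Assumption: map each byte to the Char with the same code point.
decode :: [Word8] -> String
decode = map (chr . fromIntegral)

-- Inverse of 'decode' for any String it produced. Chars above U+00FF
-- fall outside this toy mapping and would be truncated.
encode :: String -> [Word8]
encode = map (fromIntegral . ord)

-- The property under discussion: no byte sequence is lost.
roundTrips :: [Word8] -> Bool
roundTrips bytes = encode (decode bytes) == bytes

main :: IO ()
main = print (all roundTrips [[], [0x66, 0x6f, 0x6f], [0xff, 0xfe, 0x00]])
```

The point of contention in this thread is whether the real, locale-dependent transformation satisfies `roundTrips` for every byte sequence a POSIX filesystem can return.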
I understand how the API is intended / designed to work; however, the implementation does not actually do this. My argument is that this transformation should be in a high-level library like "directory", and the low-level libraries like "base" or "unix" ought to provide functions which do not transform their inputs. That way, when an error is found in the encoding logic, it can be fixed by just pushing a new version of the affected library to Hackage, instead of requiring a new version of the compiler. I am also not convinced that it is possible to correctly implement either of these functions if their behavior is dependent on the user's locale.
All this does is mean that the common case where you want to interpret file system paths as text works with no fuss, without breaking anything in the case when the file system paths are not actually text.
As mentioned earlier in the thread, this behavior is breaking things. Due to an implementation error, programs compiled with GHC 7.2 on POSIX systems cannot open files unless their paths also happen to be valid text according to their locale. It is very difficult to work around this error, because the paths-are-text logic was placed at a very low level in the library stack.
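As an illustration of how such a failure can arise, consider a sketch that assumes a strict decoder with no escape mechanism. The ASCII-only `decodeStrict` below is hypothetical and far simpler than any real locale codec; it only shows that when decoding can fail, some legal byte paths have no String that names them.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical strict decoder: accepts only 7-bit ASCII bytes.
decodeStrict :: [Word8] -> Maybe String
decodeStrict = mapM step
  where
    step b
      | b < 0x80  = Just (chr (fromIntegral b))
      | otherwise = Nothing

main :: IO ()
main = do
  print (decodeStrict [0x66, 0x6f, 0x6f])  -- a path that is valid text
  print (decodeStrict [0x66, 0xff, 0x6f])  -- a legal POSIX path that is not
```

A library that only accepts String paths cannot refer to the second file at all, which is why a byte-level interface underneath the encoding layer matters.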
It would probably be better to have an abstract FilePath type and to keep the original bytes, decoding on demand. But that is a big change to the API and would break much more code. One day we'll do this properly; for now we have this, which I think is a pretty reasonable compromise.
Please understand, I am not arguing against the existence of this encoding layer in general. It's a fine idea for a simplistic high-level filesystem interaction library. But it should be *optional*, not part of the compiler or "base". As implemented in GHC 7.2, this encoding is a complex and untested behavior with no escape hatch.