
On Tue, Nov 8, 2011 at 03:04, Simon Marlow
As mentioned earlier in the thread, this behavior is breaking things. Due to an implementation error, programs compiled with GHC 7.2 on POSIX systems cannot open files unless their paths also happen to be valid text according to their locale. It is very difficult to work around this error, because the paths-are-text logic was placed at a very low level in the library stack.
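To make "valid text according to their locale" concrete, here is a minimal sketch that assumes a UTF-8 locale and uses strict decoding from the text package. The name isValidUtf8Text and the sample file name are made up for illustration. This is not GHC's actual decoding path. It only shows that a legal POSIX file name need not be decodable as text.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import Data.Text.Encoding (decodeUtf8')

-- A POSIX file name is just a byte string. This one spells "caf" followed
-- by the Latin-1 byte for e-acute (0xE9), which is not valid UTF-8.
latin1Name :: B.ByteString
latin1Name = B.pack [0x63, 0x61, 0x66, 0xE9]

-- The same name encoded as UTF-8 (e-acute is the two bytes 0xC3 0xA9).
utf8Name :: B.ByteString
utf8Name = B.pack [0x63, 0x61, 0x66, 0xC3, 0xA9]

-- Hypothetical helper: does strict UTF-8 decoding accept these bytes?
isValidUtf8Text :: B.ByteString -> Bool
isValidUtf8Text = either (const False) (const True) . decodeUtf8'

main :: IO ()
main = do
  print (isValidUtf8Text latin1Name) -- False
  print (isValidUtf8Text utf8Name)   -- True
```

Any layer that insists on decoding before it reaches the system call has no way to name the first file.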
So your objection is that there is a bug? What if we fixed the bug?
My objection is that the current implementation provides no way to work around potential bugs. GHC is software. Like all software, it contains errors, and new features are likely to contain more errors. When adding behavior like automatic path encoding, there should always be a way to avoid or work around it, in case a severe bug is discovered.
It would probably be better to have an abstract FilePath type and to keep the original bytes, decoding on demand. But that is a big change to the API and would break much more code. One day we'll do this properly; for now we have this, which I think is a pretty reasonable compromise.
Please understand, I am not arguing against the existence of this encoding layer in general. It's a fine idea for a simplistic high-level filesystem interaction library. But it should be *optional*, not part of the compiler or "base".
Ok, so I was about to reply and say that the low-level API is available via the unix and Win32 packages, and then I thought I should check first, and I discovered that even using System.Posix you get the magic encoding behaviour.
I really think we should provide the native APIs. The problem is that the System.Posix.Directory API is all in terms of FilePath (=String), and if we gave that a different meaning from the System.Directory FilePaths then confusion would ensue. So perhaps we need to add another API to System.Posix with filesystem operations in terms of ByteString, and similarly for Win32.
+1. I think most users would be OK with having System.Posix treat FilePath differently, as long as this is clearly documented, but if you feel a separate API is better then I have no objection. As long as there's some way to say "I know what I'm doing, here's the bytes" to the library. The Win32 package uses wide-character functions, so I'm not sure whether bytes would be appropriate there. My instinct says to stick with chars, via withCWString or equivalent. The package maintainer will have a better idea of what fits with the OS's idioms.
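For the POSIX side, a rough sketch of what "here's the bytes" could look like, going through the FFI to C's fopen so that no decoding happens anywhere. The names openableRaw and removeRaw are hypothetical, not a proposed API, and a real interface would return a handle or file descriptor rather than a Bool.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Foreign.C.String (CString)
import Foreign.C.Types (CInt (..))
import Foreign.Ptr (Ptr, nullPtr)

-- Opaque stand-in for C's FILE; only ever used behind a pointer.
data CFile

foreign import ccall "stdio.h fopen"
  c_fopen :: CString -> CString -> IO (Ptr CFile)

foreign import ccall "stdio.h fclose"
  c_fclose :: Ptr CFile -> IO CInt

foreign import ccall "stdio.h remove"
  c_remove :: CString -> IO CInt

-- Hypothetical helper: can the path, given as raw bytes, be opened in the
-- given fopen mode? The bytes reach the C library unmodified.
openableRaw :: B.ByteString -> B.ByteString -> IO Bool
openableRaw path mode =
  B.useAsCString path $ \cpath ->
    B.useAsCString mode $ \cmode -> do
      fp <- c_fopen cpath cmode
      if fp == nullPtr
        then return False
        else c_fclose fp >> return True

-- Hypothetical helper: delete a file named by raw bytes.
removeRaw :: B.ByteString -> IO Bool
removeRaw path = B.useAsCString path $ \cpath -> fmap (== 0) (c_remove cpath)

main :: IO ()
main = do
  -- "caf" ++ 0xE9 ++ ".txt": a Latin-1 name that is not valid UTF-8.
  let name = B.pack [0x63, 0x61, 0x66, 0xE9, 0x2E, 0x74, 0x78, 0x74]
  created  <- openableRaw name (BC.pack "w")
  reopened <- openableRaw name (BC.pack "r")
  removed  <- removeRaw name
  print (created, reopened, removed)
```

On a typical Linux filesystem I would expect all three steps to succeed, but some filesystems reject names that are not valid UTF-8, so the result depends on the platform.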