
On 10/11/2011 09:28, Max Bolingbroke wrote:
> Is there any consensus about what to do here? My take is that we should move back to lone surrogates. This:
>  1. Recovers the roundtrip property, which we appear to believe is essential
>  2. Removes all the weird problems I outlined earlier that can occur if your byte strings happen to contain some bytes that decode to U+EFxx
>  3. DOES break software that expects Strings not to contain surrogate codepoints, but (I agree with you) this is arguably a feature
>
> This is also exactly what Python does so it has the advantage of being battle tested.
>
> Agreed?

Agreed.
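
For reference, here is a minimal sketch of the Python behaviour mentioned above, assuming Python 3 and its "surrogateescape" error handler (PEP 383). The file name is made up for illustration. It shows the roundtrip property from point 1 and the breakage from point 3.

```python
# Sketch of Python's "surrogateescape" error handler (PEP 383).
# The file name is hypothetical: Latin-1 bytes that are not valid UTF-8.
raw = b"caf\xe9.txt"

# The undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9.
name = raw.decode("utf-8", errors="surrogateescape")
assert name == "caf\udce9.txt"

# Encoding with the same handler restores the original bytes exactly,
# which is the roundtrip property.
assert name.encode("utf-8", errors="surrogateescape") == raw

# Code that assumes strings never contain surrogates breaks:
# a strict encode rejects the lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The strict encode failing is the "arguably a feature" case: the escaped name cannot silently leak into output that is supposed to be well-formed UTF-8.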

> We can additionally:
>  * Provide your layer in the "unix" package where FilePath = ByteString, for people who for some reason care about performance of their FilePath encoding/decoding, OR who don't want to rely on the roundtripping property being implemented correctly

I think I'll do this anyway.

>  * Perhaps provide a layer in the "win32" package where FilePath = ByteString but where that ByteString is guaranteed to be UTF-16 encoded (I'm less sure about this, because we can always unambiguously decode this without doing any escaping. It's still useful if you care about performance.)
>
> I'm wondering if we should also have
>
>   hSetLocaleEncoding, hSetFileSystemEncoding :: TextEncoding -> IO ()
>
> and change
>
>   localeEncoding, fileSystemEncoding :: IO TextEncoding
>
> hSetFileSystemEncoding in particular would let people opt out of escapes entirely, as long as they issued it right at the start of their program, before the fileSystemEncoding had been used.

Ok by me.

Cheers,
Simon