
On 9 November 2011 16:29, Simon Marlow wrote:
Ok, so since we need something like
makePrintable :: FilePath -> String
arguably we might as well make that do the locale decoding. That's certainly a good point...
You could, but getArgs :: IO [String], not :: IO [FilePath]. And locale-decoding command-line arguments is the Right Thing To Do. So this doesn't really avoid the need to roundtrip, does it?

Is there any consensus about what to do here? My take is that we should move back to lone surrogates. This:

 1. Recovers the roundtrip property, which we appear to believe is essential

 2. Removes all the weird problems I outlined earlier that can occur if your byte strings happen to contain some bytes that decode to U+EFxx

 3. DOES break software that expects Strings not to contain surrogate codepoints, but (I agree with you) this is arguably a feature

This is also exactly what Python does so it has the advantage of being battle tested. Agreed?

We can additionally:

 * Provide your layer in the "unix" package where FilePath = ByteString, for people who for some reason care about performance of their FilePath encoding/decoding, OR who don't want to rely on the roundtripping property being implemented correctly

 * Perhaps provide a layer in the "win32" package where FilePath = ByteString but where that ByteString is guaranteed to be UTF-16 encoded (I'm less sure about this, because we can always unambiguously decode this without doing any escaping. It's still useful if you care about performance.)

I'm wondering if we should also have

hSetLocaleEncoding, hSetFileSystemEncoding :: TextEncoding -> IO ()

and change

localeEncoding, fileSystemEncoding :: IO TextEncoding

hSetFileSystemEncoding in particular would let people opt-out of escapes entirely as long as they issued it right at the start of their program before the fileSystemEncoding had been used.

What do you think?

Max
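
To make the lone-surrogate proposal concrete, here is a minimal sketch of the escaping idea, in the spirit of what Python does. It is not the GHC implementation: the names escapeDecode and escapeEncode are made up for illustration, and plain ASCII stands in for the locale encoding so the example stays small. Every byte that the (assumed ASCII) decoder cannot handle is mapped to a lone low surrogate in U+DC80..U+DCFF, and encoding maps those code points back to the original bytes.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: decode bytes as ASCII (an assumption standing in
-- for the real locale encoding). Any byte >= 0x80 is undecodable here and
-- is escaped as the lone surrogate U+DC00 + byte, i.e. U+DC80..U+DCFF.
escapeDecode :: [Word8] -> String
escapeDecode = map toChar
  where
    toChar b
      | b < 0x80  = chr (fromIntegral b)
      | otherwise = chr (0xDC00 + fromIntegral b)

-- Hypothetical helper: inverse of escapeDecode. ASCII characters encode
-- to themselves, escape surrogates turn back into the original byte, and
-- anything else is not representable in this toy encoding (Nothing).
escapeEncode :: String -> Maybe [Word8]
escapeEncode = mapM toByte
  where
    toByte c
      | n < 0x80                   = Just (fromIntegral n)
      | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
      | otherwise                  = Nothing
      where
        n = ord c
```

With these definitions, escapeEncode (escapeDecode bs) gives back Just bs for any byte list bs, which is the roundtrip property from point 1. The relevance to point 2 is that a well-formed UTF-8 byte sequence never decodes to a surrogate code point, so in a real decoder the escape range should not collide with legitimately decoded characters the way a private-use range such as U+EFxx can.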