
On 10 November 2011 00:17, Ian Lynagh wrote:
> On Wed, Nov 09, 2011 at 03:58:47PM +0000, Max Bolingbroke wrote:
>> (Note that the above outlined problems are problems in the current implementation too)
> Then the proposal seems to me to be strictly better than the current system. Under both systems the wrong thing happens when U+EFxx is entered as Unicode text, but the proposed system works for all filenames read from the filesystem.
Your proposal is not *strictly* better than what is implemented, in at least the following ways:

1. With your proposal, if you read a filename containing U+EF80 into the variable "fp" and then expect the character U+EF80 to be in fp, you will be surprised to find only its escaped form. In the current implementation you will in fact find U+EF80.

2. The performance of iconv-based decoders will suffer, because we will need to do a post-pass in the TextEncoding to do this extra escaping for U+EFxx characters.

I'm really not keen on implementing a fix that addresses such a limited subset of the problems, anyway.
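To make point 1 concrete, here is a minimal sketch of the PEP-383-style escaping into the U+EFxx range that the proposal describes. This is not GHC's actual code; the names escapeByte/unescapeChar are mine:

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Map an undecodable byte into the private-use range U+EF00..U+EFFF.
escapeByte :: Word8 -> Char
escapeByte b = chr (0xEF00 + fromIntegral b)

-- Recover the original byte from an escape character, if it is one.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | 0xEF00 <= n && n <= 0xEFFF = Just (fromIntegral (n - 0xEF00))
  | otherwise                  = Nothing
  where n = ord c

main :: IO ()
main = do
  -- A raw byte 0x80 that failed to decode round-trips via the escape:
  print (unescapeChar (escapeByte 0x80))  -- Just 128
  -- But a genuine U+EF80 entered as text is indistinguishable from the
  -- escape for byte 0x80, which is exactly the surprise in point 1:
  print (unescapeChar '\xEF80')           -- Just 128
```

So any consumer that unescapes will turn a legitimately typed U+EF80 back into a raw byte, and conversely a program looking for the literal character will only see the escaped form.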
> In the longer term, I think we need to fix the underlying problem that (for example) both getLine and getArgs produce a String from bytes, but do so in different ways. At some point we should change the type of getArgs and friends.
I'm not sure about this. hGetLine produces a String from bytes in a different way depending on the encoding set on the Handle, but we don't try to differentiate in the type system between Strings decoded using different TextEncodings. Why should getLine and getArgs be different?

If you are really unhappy about getLine and getArgs having different behaviour in this sense, one option would be to change the default stdout/stdin TextEncoding to use the fileSystemEncoding that knows about escapes. (Note that this would mean that your Haskell program wouldn't immediately die if you were using the UTF-8 locale and then tried to read some non-UTF-8 input from stdin, which might or might not be a good thing, depending on the application.)

Max
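P.S. The stdin/stdout switch described above could look roughly like this, assuming the filesystem encoding is exposed as GHC.IO.Encoding.getFileSystemEncoding (as in recent base); a sketch, not a definitive recommendation:

```haskell
import GHC.IO.Encoding (getFileSystemEncoding)
import System.IO (hSetEncoding, stdin, stdout)

main :: IO ()
main = do
  enc <- getFileSystemEncoding
  -- Make stdin/stdout decode and encode bytes the same way getArgs and
  -- the filesystem functions do, escapes and all:
  hSetEncoding stdin enc
  hSetEncoding stdout enc
  -- TextEncoding's Show instance is its name, e.g. "UTF-8":
  print enc
```

With this in place, non-UTF-8 bytes arriving on stdin would be escaped rather than causing an immediate decoding error, matching the caveat above.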