
On Sun, Nov 28, 2010 at 10:35 AM, Yitzchak Gale
I wrote:
In any case, you still need to have the correct encoding set on the handles as before.
Michael Snoyman wrote:
...it does *not* address invalid byte sequences (AFAIK), which can be dealt with using the bytestring/text decoding combination.
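For illustration, here is a minimal sketch of the bytestring/text combination, assuming the bytestring and text packages. The file is read as raw bytes, so no handle encoding is involved, and invalid sequences are dealt with at decode time. decodeUtf8With, decodeUtf8' and lenientDecode are functions from Data.Text.Encoding and Data.Text.Encoding.Error. The names decodeLenient, decodeStrict and readFileLenient are hypothetical helpers, not library API.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- Hypothetical helper: decode UTF-8 bytes, replacing each invalid byte
-- with U+FFFD instead of failing.
decodeLenient :: B.ByteString -> T.Text
decodeLenient = decodeUtf8With lenientDecode

-- Hypothetical helper: decode UTF-8 bytes strictly, reporting an invalid
-- sequence as a Left value rather than throwing an exception.
decodeStrict :: B.ByteString -> Either String T.Text
decodeStrict = either (Left . show) Right . decodeUtf8'

-- Hypothetical helper: read a file as raw bytes (no handle encoding
-- involved), then decode leniently.
readFileLenient :: FilePath -> IO T.Text
readFileLenient path = decodeLenient <$> B.readFile path
```

In GHCi, `decodeLenient (B.pack [104, 0xFF, 105])` yields the text "h\xFFFDi", while `decodeStrict` on the same bytes gives a Left describing the bad byte.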
Well, using the standard interface, you have three choices for handling invalid byte sequences: drop them, substitute a replacement character, or throw an exception, the last being the default. You make that choice when you set the encoding, through the encoding name you pass to mkTextEncoding. See the documentation for System.IO for more details.
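A sketch of choosing the failure mode when setting the encoding on a handle. It assumes a GHC whose mkTextEncoding accepts the "//IGNORE" and "//TRANSLIT" suffixes on the encoding name; in 2010 that depended on GNU iconv, and to my understanding more recent GHC releases implement these suffixes natively for their built-in encodings such as UTF-8. readWithMode and tryReadWithMode are hypothetical names used only for this example.

```haskell
import Control.Exception (IOException, try)
import System.IO

-- Hypothetical helper: read a whole file as UTF-8 with the given failure
-- mode, selected by the suffix appended to the encoding name:
--   ""           -> throw an IOException on invalid input (the default)
--   "//IGNORE"   -> silently drop invalid byte sequences
--   "//TRANSLIT" -> substitute a replacement character
readWithMode :: String -> FilePath -> IO String
readWithMode suffix path = do
  enc <- mkTextEncoding ("UTF-8" ++ suffix)
  withFile path ReadMode $ \h -> do
    hSetEncoding h enc
    s <- hGetContents h
    -- Force the lazy read (and any decoding error) before withFile closes h.
    length s `seq` return s

-- Hypothetical helper: like readWithMode, but a decoding failure is
-- returned as a Left value instead of propagating.
tryReadWithMode :: String -> FilePath -> IO (Either IOException String)
tryReadWithMode suffix path = try (readWithMode suffix path)
```

For a file containing the bytes `h`, 0xFF, `i`, the "//IGNORE" mode should give "hi", while the default mode fails with an "invalid byte sequence" IOException.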
However, those choices are implemented via GNU iconv, so on Windows you only have the default behavior.
Also, the options do seem limited at present in certain special situations, such as when you need to specify the replacement character yourself, or when you need in-band exceptions (e.g. a stream of Either error character).
You might still need to fall back on the old bytestring hack in those cases. If you find yourself in that situation, it might be a good idea to push the maintainers of System.IO and Data.Text to continue to improve support for encodings in the standard libraries.
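As a rough sketch of what the bytestring route looks like for those two special cases, assuming the bytestring and text packages. A custom replacement character can be supplied through decodeUtf8With, whose error handler (an OnDecodeError) receives a description and the offending byte and returns the character to substitute. For in-band errors, decodeStream below is a hypothetical hand-rolled UTF-8 decoder, not part of any library: it emits each invalid lead byte as a Left and resumes decoding at the next byte, which is only one possible resynchronization policy.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import qualified Data.ByteString as B
import Data.Char (chr)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Word (Word8)

-- Hypothetical helper: decode UTF-8, substituting a caller-chosen
-- character for each invalid byte.
decodeWithReplacement :: Char -> B.ByteString -> T.Text
decodeWithReplacement c = decodeUtf8With (\_ _ -> Just c)

-- Hypothetical decoder: a lazy stream of characters with invalid bytes
-- reported in-band as Left. Rejects overlong forms, surrogates and
-- code points above U+10FFFF.
decodeStream :: [Word8] -> [Either Word8 Char]
decodeStream [] = []
decodeStream (b : bs)
  | b < 0x80 = Right (chr (fromIntegral b)) : decodeStream bs
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise = Left b : decodeStream bs
  where
    -- Consume n continuation bytes, starting from the lead byte's payload;
    -- minCp is the smallest code point that needs this many bytes.
    multi :: Int -> Int -> Int -> [Either Word8 Char]
    multi n acc0 minCp =
      let (conts, rest) = splitAt n bs
       in if length conts == n && all isCont conts
            then
              let cp = foldl (\a c -> a `shiftL` 6 .|. (fromIntegral c .&. 0x3F)) acc0 conts
               in if cp >= minCp && cp <= 0x10FFFF && not (cp >= 0xD800 && cp <= 0xDFFF)
                    then Right (chr cp) : decodeStream rest
                    else Left b : decodeStream bs
            else Left b : decodeStream bs
    isCont c = c .&. 0xC0 == 0x80

-- Convenience wrapper for ByteString input.
decodeStreamBS :: B.ByteString -> [Either Word8 Char]
decodeStreamBS = decodeStream . B.unpack
```

For example, `decodeStream [104, 0xFF, 0xC3, 0xA9]` produces `[Right 'h', Left 0xFF, Right '\233']`, keeping the bad byte visible to the consumer instead of dropping or masking it.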
I hadn't realized that the standard libraries offered so much sophistication in their approach to file encodings; I'll have to look at it more thoroughly.

Michael