
On Mon, Feb 25, 2008 at 09:07:08PM +0000, Duncan Coutts wrote:
> It's no use pretending that readFile returns Unicode, it just doesn't (except on Hugs which does it properly). GHC is not going to catch up on this any time soon.
On the contrary, it's the only way to stay sane. readFile does return Unicode, it just doesn't read UTF. Putting compensating bugs in the libraries is only going to make it harder for GHC to change.
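To make the distinction concrete, here is a minimal sketch of what a byte-per-Char text read amounts to. The names `latin1Read` and `example` are made up for illustration and this is a model of the behaviour, not GHC's actual I/O code: every byte becomes the Char with the same code point, so the result is a perfectly good Unicode String, just not the one a UTF-8 decoder would have produced.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical model of a byte-per-Char (ISO-8859-1) text read:
-- each byte is turned into the Char with the same code point.
latin1Read :: [Word8] -> String
latin1Read = map (chr . fromIntegral)

-- The UTF-8 encoding of U+00E9 is the two bytes C3 A9.
-- Read byte-per-Char, they come back as two Chars (U+00C3 and U+00A9)
-- rather than as the single Char U+00E9.
example :: String
example = latin1Read [0xC3, 0xA9]
```

Code that compensates by treating each Char as a byte and decoding again would start mangling the text as soon as the Handle layer decoded UTF-8 itself.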
> If we open the files in binary mode we don't get the cr/lf line conversion on Windows and we'd have to do that ourselves. Perhaps that's the way to go.
I think we've been ignoring CRs in .cabal files ever since we had to deal with tar files built on Windows and unpacked on Unix.
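For illustration only, ignoring CRs on a binary-mode read could look roughly like the sketch below. The function names `ignoreCRs` and `readFileIgnoringCRs` are hypothetical and this is not taken from Cabal's parser; it assumes it is acceptable to drop every carriage return rather than only those that precede a line feed.

```haskell
import System.IO (IOMode (ReadMode), hGetContents, openBinaryFile)

-- Drop every carriage return, so "\r\n" and "\n" line endings read the same.
-- Assumption: a stray '\r' elsewhere in the file may be discarded too.
ignoreCRs :: String -> String
ignoreCRs = filter (/= '\r')

-- Read a file in binary mode (no newline translation on Windows)
-- and ignore any CRs in its contents.
readFileIgnoringCRs :: FilePath -> IO String
readFileIgnoringCRs path = do
  h <- openBinaryFile path ReadMode
  contents <- hGetContents h   -- lazy; the handle is closed once fully read
  return (ignoreCRs contents)
```

With this, a file written on Windows and one written on Unix yield the same String on either platform.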
> As for stdout/stderr we're just stuffed. We cannot reopen them in binary mode, and Hugs and GHC have different and incompatible behaviour. We either end up double encoding with Hugs or not decoding with GHC. There is no single method that works with both. We'd have to switch on the system in use.
My suggestion is to just write Chars to these Handles, even though text handles in GHC currently only work in an ISO-8859-1 locale. That's what the other libraries in your program will be doing with those handles, and they're not wrong: the other way lies madness. Is switching the standard text handles to UTF really an impossibly remote prospect?