
On Mon, 2008-02-25 at 21:49 +0000, Ross Paterson wrote:
> On Mon, Feb 25, 2008 at 09:07:08PM +0000, Duncan Coutts wrote:
> > It's no use pretending that readFile returns Unicode, it just doesn't (except on Hugs which does it properly). GHC is not going to catch up on this any time soon.
>
> On the contrary, it's the only way to stay sane. readFile does return Unicode, it just doesn't read UTF. Putting compensating bugs in the libraries is only going to make it harder for GHC to change.
>
> My suggestion is to just write Chars to these Handles, even though text handles in GHC currently only work in an ISO-8859-1 locale. That's what the other libraries in your program will be doing with those handles, and they're not wrong: the other way lies madness.
So that's basically what I've done in the most recent patches. I pretend that read/writeFile and putStr etc. work for text in the current locale encoding. For files we know specifically are UTF-8 because we declare that to be the case (like .cabal and .hs), we now use to/fromUTF8 and openBinaryFile. Hmm, having said that, we're not yet treating line endings in .hs files correctly on Windows. Sigh.
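A minimal sketch of the "open in binary mode, then decode UTF-8 by hand" approach described above. The names `fromUTF8Sketch` and `readUTF8FileSketch` are hypothetical and are not the actual patch code; the decoder is simplified (it does not reject overlong encodings or surrogates) and assumes each Char read from a binary handle carries one byte.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr, ord)
import System.IO (IOMode (ReadMode), hClose, hGetContents, openBinaryFile)

-- Hypothetical helper: decode a String whose Chars each hold one byte
-- (0..255) into Unicode code points. Malformed input becomes U+FFFD.
-- Simplification: overlong encodings and surrogates are not rejected.
fromUTF8Sketch :: String -> String
fromUTF8Sketch [] = []
fromUTF8Sketch (c : cs)
  | b < 0x80 = c : fromUTF8Sketch cs                 -- plain ASCII
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)     -- 2-byte sequence
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)     -- 3-byte sequence
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)     -- 4-byte sequence
  | otherwise = replacement : fromUTF8Sketch cs      -- stray or invalid byte
  where
    b = ord c
    replacement = '\xFFFD'
    isCont x = ord x .&. 0xC0 == 0x80
    -- Consume n continuation bytes and fold their low 6 bits into the result.
    multi n lead =
      let (conts, rest) = splitAt n cs
      in if length conts == n && all isCont conts
           then
             let cp = foldl (\acc x -> (acc `shiftL` 6) .|. (ord x .&. 0x3F)) lead conts
             in (if cp <= 0x10FFFF then chr cp else replacement) : fromUTF8Sketch rest
           else replacement : fromUTF8Sketch cs

-- Hypothetical helper: read a file that is declared to be UTF-8,
-- bypassing whatever encoding the text handles happen to use.
readUTF8FileSketch :: FilePath -> IO String
readUTF8FileSketch path = do
  h <- openBinaryFile path ReadMode
  bytes <- hGetContents h
  -- Force the whole (lazy) contents before closing the handle.
  length bytes `seq` hClose h
  return (fromUTF8Sketch bytes)
```

Opening the handle with openBinaryFile also disables newline translation, so CRLF line endings arrive untouched and would still need separate handling.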
> Is switching the standard text handles to UTF really an impossibly remote prospect?
Seems not :-)

Duncan