
quoth Donn Cave
Umlaut u turns up as 0xFC for UTF-8 users; 0xDCFC, for Latin-1 users. This is an ordinary hello world type program, can't think of any unique environmental issues.
Well, I mischaracterized that problem, so to speak. I find that GHC is not picking up on my "current locale" encoding, and instead seems to be hard-wired to UTF-8. On MacOS X, I can select an encoding in Terminal Preferences, open a new window, and for all intents and purposes it's an ISO8859-1 world, including LANG=en_US.ISO8859-1, but GHC isn't going along with it.

So the ISO8859-1 umlaut u is undecodable if GHC is stuck in UTF-8, which seems to explain what I'm seeing. If I understand this right, the 0xDC00 high byte is recognized in some circumstances, and the value is spared from UTF-8 encoding and instead simply copied.

Hope that was interesting!

Donn
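
For anyone who wants to check this on their own system, here is a minimal sketch of the arithmetic as I understand it, not GHC's actual implementation. It assumes an undecodable byte b is represented as the code point 0xDC00 + b, which matches the 0xDCFC seen above for the Latin-1 byte 0xFC. The names escapeByte and unescapeChar are hypothetical helpers made up for illustration; getLocaleEncoding is the real function from GHC.IO.Encoding and reports which encoding GHC believes the locale uses.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)
import GHC.IO.Encoding (getLocaleEncoding)

-- Hypothetical helper: represent an undecodable byte as the
-- code point 0xDC00 + byte (an assumption about the escape scheme).
escapeByte :: Word8 -> Char
escapeByte b = chr (0xDC00 + fromIntegral b)

-- Hypothetical helper: recover the raw byte from such an escape
-- code point, or Nothing if the character is not in 0xDC80..0xDCFF.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
  | otherwise                  = Nothing
  where
    n = ord c

main :: IO ()
main = do
  -- Shows the encoding GHC has actually picked for the current locale.
  enc <- getLocaleEncoding
  putStrLn ("locale encoding: " ++ show enc)
  -- Latin-1 umlaut u is the single byte 0xFC.
  print (ord (escapeByte 0xFC))    -- 56572, i.e. 0xDCFC
  print (unescapeChar '\xDCFC')    -- Just 252, i.e. 0xFC
```

If the first line of output names UTF-8 while LANG says ISO8859-1, that would be consistent with the behaviour described above.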