
On Oct 25, 2009, at 5:01 PM, Curt Sampson wrote:
> Actually, you would be having the exact same issues with Java; in UTF-8 mode it would also choke on Latin-1.
Yes, but from the 'javac' man-page:

    -encoding encoding
        Sets the source file encoding name, such as EUCJIS/SJIS/ISO8859-1/UTF8. If -encoding is not specified, the platform default converter is used.

The corresponding part of the GHC documentation says:

    GHC assumes that source files are ASCII or UTF-8 only, other encodings are not recognised. However, invalid UTF-8 sequences will be ignored in comments, so it is possible to use other encodings such as Latin-1, as long as the non-comment source code is ASCII only.

There's no obvious reason why GHC couldn't support any source encoding that the host's iconv() supports.
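To make this concrete, here is a minimal Java sketch. The class and method names (SourceEncoding, isValidUtf8, transcodeToUtf8) are hypothetical, and this is not how javac or GHC are actually implemented. It shows two things: a Latin-1 byte such as 0xE9 ('é') is malformed UTF-8, so a strict UTF-8 reader rejects it; and the conversion an -encoding option (or iconv()) would have to perform is just "decode in the declared charset, re-encode as UTF-8". The 'é' is written as \u00e9 so the example itself does not depend on javac's -encoding setting.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SourceEncoding {
    // True if the bytes are well-formed UTF-8. The decoder is set to REPORT,
    // so malformed input throws instead of being replaced with U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Decodes bytes using the declared source encoding, then re-encodes the
    // text as UTF-8: roughly the step an -encoding switch would add before lexing.
    static byte[] transcodeToUtf8(byte[] bytes, Charset sourceEncoding) {
        String text = new String(bytes, sourceEncoding);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A one-line Haskell program with a non-ASCII character outside a comment,
        // saved as Latin-1 (the 'é' becomes the single byte 0xE9).
        byte[] latin1 = "main = putStrLn \"caf\u00e9\"\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(latin1)); // false

        byte[] utf8 = transcodeToUtf8(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

The non-ASCII character is deliberately placed in a string literal rather than a comment, since the GHC documentation quoted above says invalid UTF-8 in comments is ignored.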
> Blaming Haskell for this "problem" is quite unfair.
It is perfectly fair. The problem is not that the original user isn't telling GHC what the encoding is, but that GHC cannot be told. A javac-like -encoding switch on the command line would meet the original need.
> (If all of this UTF-8 stuff seems annoying to you, consider that in ISO-8859-1 it's not possible to express the simplest Japanese word.
And why, exactly, should someone who has no Japanese words to express even care? You have explained why UTF-8 is a good *default*; that does not make choosing it as the *only* option a good idea.