
Roel van Dijk wrote:
I propose to make UTF-8 the only allowed encoding for Haskell source files. Implementations must discard an initial Byte Order Mark (BOM) if present.
I am in favor of this proposal. However, you wrote:
"GHC assumes that source files are ASCII or UTF-8 only, other encodings are not recognised. However, invalid UTF-8 sequences will be ignored in comments, so it is possible to use other encodings such as Latin-1, as long as the non-comment source code is ASCII only." [4]
From this I deduce that all current code accepted by GHC is compatible with UTF-8. No working code will be broken.
No. If GHC is changed to conform to this proposal, source code including invalid UTF-8 in comments which previously compiled successfully will now be rejected.

But anyway I think allowing invalid UTF-8 in comments is a mistake. It could lead to the end of the comment being detected in the wrong place, thus changing the meaning of the program in very unexpected ways. Not likely, but possible.

I doubt that there is a whole lot of code out there which would be affected. And GHC can easily provide a certain degree of backward compatibility with a flag and/or pragma.

Thanks,
Yitz
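
To make the compatibility point concrete, here is a minimal Python sketch of what a strict reading of the proposal could look like: discard one leading BOM, then require the whole file, comments included, to be valid UTF-8. The helper names `decode_source` and `is_accepted` are hypothetical, and this is an illustration under those assumptions, not how GHC is actually implemented.

```python
import codecs


def decode_source(raw: bytes) -> str:
    """Hypothetical helper: decode source bytes as strict UTF-8.

    A single leading UTF-8 BOM is discarded, as the proposal requires.
    Any invalid byte sequence, even inside a comment, raises
    UnicodeDecodeError.
    """
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8", errors="strict")


def is_accepted(raw: bytes) -> bool:
    """Hypothetical helper: True if the file would pass the strict check."""
    try:
        decode_source(raw)
        return True
    except UnicodeDecodeError:
        return False


# Sample inputs (made up for illustration).
ascii_code = b"main = return ()\n"
with_bom = codecs.BOM_UTF8 + ascii_code

# The same file with a non-ASCII comment, saved in two different encodings.
text_with_comment = "-- caf\u00e9\nmain = return ()\n"
utf8_comment = text_with_comment.encode("utf-8")
latin1_comment = text_with_comment.encode("latin-1")

if __name__ == "__main__":
    for name, raw in [
        ("ascii", ascii_code),
        ("utf-8 with BOM", with_bom),
        ("utf-8 comment", utf8_comment),
        ("latin-1 comment", latin1_comment),
    ]:
        print(name, "accepted" if is_accepted(raw) else "rejected")
```

Under this sketch the Latin-1 file is rejected even though its only non-ASCII byte sits inside a comment. That is the case where code that compiles today would stop compiling, and the case a backward-compatibility flag or pragma would have to cover.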