Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

5 Apr 2011

Roel van Dijk wrote:
...
> I propose to make UTF-8 the only allowed encoding for Haskell source
> files. Implementations must discard an initial Byte Order Mark (BOM)
> if present

I am in favor of this proposal.

However, you wrote:
...
> "GHC assumes that source files are ASCII or UTF-8 only, other
> encodings are not recognised. However, invalid UTF-8 sequences will be
> ignored in comments, so it is possible to use other encodings such as
> Latin-1, as long as the non-comment source code is ASCII only." [4]
>
> From this I deduce that all current code accepted by GHC is compatible
> with UTF-8. No working code will be broken.

No. If GHC is changed to conform to this proposal, source code
that contains invalid UTF-8 in comments, and that previously
compiled successfully, will be rejected.

But anyway I think allowing invalid UTF-8 in comments is a
mistake. It could lead to the end of the comment being detected
in the wrong place, thus changing the meaning of the program in
very unexpected ways. Not likely, but possible.
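
A minimal sketch of how that could happen, written in Python for brevity.
The decoder below is hypothetical and is not a description of GHC's lexer:
it assumes a lenient decoder that trusts the length announced by a lead
byte and skips that many bytes without checking the continuation bytes.
In Latin-1, "é" is the single byte 0xE9, which looks like the start of a
three-byte UTF-8 sequence, so the two bytes after it (here the "-}" that
closes the comment) are swallowed.

```python
def naive_decode(data: bytes) -> str:
    """Hypothetical lenient decoder: believes the lead byte's announced
    length and never validates the continuation bytes."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII: keep the character as-is.
            out.append(chr(b))
            i += 1
            continue
        if b >> 5 == 0b110:
            width = 2
        elif b >> 4 == 0b1110:
            width = 3
        elif b >> 3 == 0b11110:
            width = 4
        else:
            width = 1  # stray continuation byte or invalid lead byte
        # Emit one replacement character and skip the whole claimed sequence.
        out.append("\ufffd")
        i += width
    return "".join(out)


# A block comment ending in "é", saved as Latin-1, followed by real code.
latin1_source = "{- caf\xe9-}\nx = 1\n".encode("latin-1")

# The naive decoder eats the "-}", so the comment never ends.
swallowed = naive_decode(latin1_source)

# Replacing only the offending byte keeps the terminator intact.
resynced = latin1_source.decode("utf-8", errors="replace")

# A strict decoder rejects the file outright.
try:
    latin1_source.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

With the naive decoder the definition of x ends up inside the comment.
Rejecting the file, as the strict decoder does, rules out that kind of
surprise.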

I doubt that there is a whole lot of code out there which would
be affected. And GHC can easily provide a certain degree of
backward compatibility with a flag and/or pragma.

Thanks,
Yitz

Yitzchak Gale