Re: [Haskell-cafe] Encoding of Haskell source files

5 Apr 2011

On Mon, 2011-04-04 at 11:50 +0200, Roel van Dijk wrote:
> ...
> I am not aware of any algorithm that can reliably infer the character
> encoding used by just looking at the raw data. Why would people bother
> with stuff like <?xml version="1.0" encoding="UTF-8"?> if
> automatically figuring out the encoding was easy?

It is possible if the syntax/grammar of the encoded content restricts
the set of code points allowed in the first few characters.

For instance, valid JSON (see RFC 4627, section 3) requires the first two
characters to be plain ASCII code points. So which of the five BOM-less
UTF encodings (UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE) is used is
uniquely determined by the pattern of NUL bytes in the first 4 bytes of
the encoded stream.
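Here is a minimal Haskell sketch of the RFC 4627 detection table. It assumes a BOM-less stream that really is JSON text, so the first two characters are ASCII. The names `UtfEncoding` and `detectJsonEncoding` are hypothetical and made up for illustration; they do not come from any existing library.

```haskell
import qualified Data.ByteString as B

-- Hypothetical type: the five BOM-less UTF encoding schemes RFC 4627 distinguishes.
data UtfEncoding = UTF8 | UTF16BE | UTF16LE | UTF32BE | UTF32LE
  deriving (Eq, Show)

-- | Guess the encoding of a BOM-less JSON text from the pattern of NUL
-- bytes in its first four octets (RFC 4627, section 3). This only works
-- because the first two characters of a JSON text are always ASCII.
-- Returns Nothing if the prefix matches none of the patterns.
detectJsonEncoding :: B.ByteString -> Maybe UtfEncoding
detectJsonEncoding bs =
  case B.unpack (B.take 4 bs) of
    [0, 0, 0, c] | c /= 0         -> Just UTF32BE  -- 00 00 00 xx
    [0, a, 0, b] | a /= 0, b /= 0 -> Just UTF16BE  -- 00 xx 00 xx
    [a, 0, 0, 0] | a /= 0         -> Just UTF32LE  -- xx 00 00 00
    [a, 0, b, 0] | a /= 0, b /= 0 -> Just UTF16LE  -- xx 00 xx 00
    -- xx xx xx xx. Two or three NUL-free bytes can also only be UTF-8,
    -- since even "[]" needs 4 bytes in UTF-16 and 8 bytes in UTF-32.
    xs | length xs >= 2, 0 `notElem` xs -> Just UTF8
    _ -> Nothing
```

For example, `detectJsonEncoding (B.pack [0x00, 0x7B, 0x00, 0x7D])` (the text `{}` in UTF-16BE) yields `Just UTF16BE`. The function never decodes anything. It only matches on where the zero bytes fall, which is exactly the information the restricted grammar makes reliable.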

Herbert Valerio Riedel