
Aaron Denney writes:

>> It provides variants of UTF-16/32 with and without a BOM, but UTF-8 only has the variant with a BOM. This makes UTF-8 a stateful encoding.
>
> I think you mean "UTF-8 only has the variant without a BOM".
No, unfortunately. The Unicode Standard, section 3.10, defines these encoding schemes:

- UTF-8 (with a BOM)
- UTF-16BE (without a BOM)
- UTF-16LE (without a BOM)
- UTF-16 (with a BOM)
- UTF-32BE (without a BOM)
- UTF-32LE (without a BOM)
- UTF-32 (with a BOM)

It says about the UTF-8 BOM: "Its usage at the beginning of a UTF-8 data stream is neither required nor recommended by the Unicode Standard, but its presence does not affect conformance to the UTF-8 encoding scheme."

IMHO it would be fair if it had two variants of the UTF-8 encoding scheme, just like it has three variants each of UTF-16 and UTF-32, so it would be unambiguous whether "UTF-8" in a particular context allows a BOM or not.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
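
To make the ambiguity concrete, here is a minimal sketch in Python. It is only an illustration and assumes Python's standard codecs: "utf-8" passes a leading BOM through as an ordinary U+FEFF character, while "utf-8-sig" strips one on decoding and writes one on encoding. That pair behaves roughly like the two UTF-8 variants wished for above. The names `with_bom`, `as_plain` and `as_sig` are just for this example.

```python
import codecs

# The UTF-8 encoding of U+FEFF, i.e. the three-byte "UTF-8 BOM".
BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'

# A byte stream that starts with a BOM, followed by the text "abc".
with_bom = BOM + "abc".encode("utf-8")

# "utf-8": the BOM is not interpreted; it survives as U+FEFF in the result.
as_plain = with_bom.decode("utf-8")

# "utf-8-sig": a leading BOM, if present, is consumed by the decoder.
as_sig = with_bom.decode("utf-8-sig")

print(repr(as_plain))  # '\ufeffabc'
print(repr(as_sig))    # 'abc'

# On encoding, only "utf-8-sig" emits a BOM.
print("abc".encode("utf-8"))      # b'abc'
print("abc".encode("utf-8-sig"))  # b'\xef\xbb\xbfabc'
```

A reader that does not know which of the two conventions the writer used cannot tell whether a leading U+FEFF is part of the text or a signature, which is exactly the problem with a single "UTF-8" label.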