
11 Jan
2006
11 Jan
'06
10:14 a.m.
On 10.01 10:25, Bulat Ziganshin wrote:
i have the question about this issue - i also want to provide autodetection mechanism, which relies on first bytes of text files to set proper encoding. what is the standard rules to encode utf8/utf16 encoding used for text in file in these first bytes?
The BOM is used to mark the encoding (http://en.wikipedia.org/wiki/Byte_Order_Mark), but most UTF-8 streams lack it. I have not seen it used in UTF-8 files either. Do you plan on supporting things like HTTP where the character set is only known in the middle of the parsing? - Einar Karttunen