
Glynn Clements
And it isn't a theoretical issue. E.g. in an environment where EUC-JP is used, filenames may begin with <ESC>$)B (designate JISX0208 to G1), or they may not (because G1 is assumed to contain JISX0208 initially).
I think such encodings are never used as default encodings of a Unix locale.
The various UTF encodings do not have this particular problem; if a UTF string is valid, then it is a unique representation of a Unicode string.
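To make the uniqueness point concrete, here is a minimal sketch using Python's built-in strict UTF-8 decoder. The helper name is_valid_utf8 and the sample filename are made up for illustration. The overlong two-byte form of "/" would be a second spelling of the same character, but a strict decoder rejects it, so only one valid byte sequence remains.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

slash = "/".encode("utf-8")      # the only valid UTF-8 form of U+002F
overlong_slash = b"\xc0\xaf"     # overlong two-byte form of U+002F, not valid

slash_ok = is_valid_utf8(slash)
overlong_ok = is_valid_utf8(overlong_slash)

# A valid string round-trips to exactly the same bytes.
name = "日本語.txt"              # sample filename, chosen for illustration
encoded = name.encode("utf-8")
round_trip_same = encoded.decode("utf-8").encode("utf-8") == encoded
```

This only says something about well-formed input; it does not cover different Unicode strings that merely look the same (e.g. composed vs. decomposed characters).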
BOM is a problem. Unfortunately Unicode mandates that U+FEFF at the start of a UTF-8 text stream is a mark which doesn't belong to the text. It provides variants of UTF-16/32 with and without a BOM, but UTF-8 only has the variant with a BOM. This makes UTF-8 a stateful encoding. Unix ignores this: it doesn't use a BOM in UTF-8, except in individual applications for individual file formats. iconv() on Linux and in libiconv doesn't process a BOM in UTF-8 (although in libiconv this is because it's old, being based on an old RFC with 31-bit code points which didn't include a BOM).

--
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
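As a small illustration of the two possible treatments of a leading BOM, here is a sketch in Python (not iconv), which happens to ship both behaviours as separate codecs: "utf-8" passes the BOM through as an ordinary U+FEFF character, while "utf-8-sig" treats it as a signature and drops it. The sample text "abc" is arbitrary.

```python
import codecs

# The three bytes EF BB BF followed by plain ASCII text.
raw = codecs.BOM_UTF8 + "abc".encode("utf-8")

# Plain "utf-8" does not interpret the BOM; it stays in the text as U+FEFF.
kept = raw.decode("utf-8")

# "utf-8-sig" strips one leading BOM, if present.
stripped = raw.decode("utf-8-sig")

# Without a BOM both codecs decode identically.
same_without_bom = b"abc".decode("utf-8") == b"abc".decode("utf-8-sig")
```

So the same bytes yield a 4-character or a 3-character string depending on which convention the reader picks, which is the ambiguity described above.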