RE: UTF-8 BOM, really!? (was: [Haskell-cafe] Re: File path programme)

From: Aaron Denney [mailto:wnoise@ofb.net]
Better yet would be to have the standard never allow the BOM.
Since some things can't handle it, on output we should never emit it, but still must handle it on input. Bah.
I don't see how banning it from input would help; as I understand it, it's meant to be ignored anyway, so there's no loss or gain there. Whether or not to emit the BOM on output should be a user choice, surely?
From: Graham Klyne [mailto:GK@ninebynine.org]
How can it make sense to have a BOM in UTF-8? UTF-8 is a sequence of octets (bytes); what ordering is there here that can sensibly be varied?
"Q: Where is a BOM useful?
A: A BOM is useful at the beginning of files that are typed as text, but for which it is not known whether they are in big or little endian format..."
i.e. it helps when you need to guess what the encoding of a given file might be.
Alistair.
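The kind of guessing described here can be sketched in a few lines. This is only an illustration, assuming the file's first bytes are already in memory; the helper name `sniff_bom` is hypothetical and not something from this thread.

```python
import codecs

# Known BOMs and the encodings they suggest. The UTF-32 entries come first
# because the UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A result of `(None, 0)` means only that no BOM was found; the caller still has to choose a default encoding some other way.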

"Bayley, Alistair"
How can it make sense to have a BOM in UTF-8?
"Q: Where is a BOM useful?
A: A BOM is useful at the beginning of files that are typed as text, but for which it is not known whether they are in big or little endian format..."
I think the question is how a single byte can have a byte order.
-kzm
--
If I haven't seen further, it is by standing in the footprints of giants
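The point about byte order can be checked directly by encoding U+FEFF, the code point used as the BOM. A small sketch (variable names are illustrative only):

```python
bom = "\ufeff"  # U+FEFF, the code point used as the byte order mark

utf8 = bom.encode("utf-8")         # one possible byte sequence
utf16_le = bom.encode("utf-16-le") # code unit stored low byte first
utf16_be = bom.encode("utf-16-be") # code unit stored high byte first

# UTF-16 gives two different sequences depending on endianness;
# UTF-8 gives the same three bytes on every machine.
print(utf8.hex(), utf16_le.hex(), utf16_be.hex())
```

Since the UTF-8 form never varies, a leading EF BB BF can only act as a "this is UTF-8" signature, not as an indicator of byte order.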

On 2005-01-31, Bayley, Alistair
From: Aaron Denney [mailto:wnoise@ofb.net]
Better yet would be to have the standard never allow the BOM.
Since some things can't handle it, on output we should never emit it, but still must handle it on input. Bah.
I don't see how banning it from input would help; as I understand it, it's meant to be ignored anyway, so there's no loss or gain there.
Backwards compatibility, e.g. the kernel looking for the two bytes '#!' at the beginning of a shell script. UTF-8 was supposed to be backwards compatible with ASCII with minimal fuss. Having to handle a BOM transparently at the beginning is no longer minimal fuss.
--
Aaron Denney
-><-
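The '#!' problem is easy to reproduce. The sketch below assumes a loader that inspects only the first two bytes, roughly as described above; `has_shebang` and `strip_utf8_bom` are hypothetical helpers, not real kernel or library interfaces.

```python
import codecs

def has_shebang(data: bytes) -> bool:
    # Stand-in for a loader that checks only the first two bytes.
    return data[:2] == b"#!"

def strip_utf8_bom(data: bytes) -> bytes:
    # Tolerate a BOM on input: drop it if present, otherwise pass through.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

script = "#!/bin/sh\necho hi\n"
plain = script.encode("utf-8")         # no BOM emitted
with_bom = script.encode("utf-8-sig")  # Python's codec that prepends EF BB BF

print(has_shebang(plain))                     # BOM-free file is recognised
print(has_shebang(with_bom))                  # the BOM hides the '#!'
print(has_shebang(strip_utf8_bom(with_bom)))  # recognised again after stripping
```

This matches the "never emit on output, still handle on input" position: writing with plain `utf-8` keeps the first bytes ASCII-compatible, while the strip step deals with files produced by tools that do emit a BOM.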
participants (3)
- Aaron Denney
- Bayley, Alistair
- Ketil Malde