
On Thu, Feb 3, 2011 at 1:30 PM, Curt Sampson wrote:
On 2011-02-02 16:43 +0100 (Wed), Johan Tibell wrote:
Are you sure that no headers are defined as octets (i.e. binary data)? If some are, they may contain bytes with the high (eighth) bit set.
Some indeed are. See, e.g., comments in HTTP headers (RFC 2616 section 2.2), which are 'ctext' surrounded by parens; ctext is 'any TEXT excluding "(" and ")"'; TEXT is 'any OCTET except CTLs...', and OCTET is 'any 8-bit sequence of data.'
That said, there is no method within the standard to specify the encoding of anything, so stuff like that needs to be treated as more or less opaque binary data, anyway.
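To make the quoted grammar concrete, here is a minimal Haskell sketch of the byte-level ctext rule from RFC 2616 section 2.2. The function names are hypothetical, header values are assumed to be raw octets (a [Word8] standing in for a ByteString), and LWS is simplified to SP and HT on the assumption that folded header lines have already been unfolded. It shows that octets at or above 0x80 pass the check, so a conforming header value can carry bytes that are not ASCII.

```haskell
import Data.Word (Word8)

-- CTL = any US-ASCII control character (octets 0-31) and DEL (127).
isCtl :: Word8 -> Bool
isCtl w = w <= 31 || w == 127

-- TEXT = any OCTET except CTLs, but including LWS.
-- Simplification (assumption): of LWS we only accept SP (32, not a CTL
-- anyway) and HT (9); CRLF folding is assumed to be unfolded already.
isText :: Word8 -> Bool
isText w = not (isCtl w) || w == 9

-- ctext = any TEXT excluding "(" (40) and ")" (41).
isCtext :: Word8 -> Bool
isCtext w = isText w && w /= 40 && w /= 41

main :: IO ()
main = do
  print (all isCtext [0x68, 0x69]) -- "hi": plain ASCII, accepted
  print (all isCtext [0xC3, 0xA9]) -- high-bit octets are also accepted
  print (all isCtext [0x28])       -- "(" is not ctext, rejected
```

Nothing in the rule says what those high-bit octets mean, which is exactly why they end up being opaque.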
Right. So converting it to a Unicode type is dangerous, as operations on the Unicode type might fail on invalid code points produced by casting binary data to Unicode.
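A minimal sketch of the failure mode, assuming header values are kept as raw octets ([Word8] standing in for ByteString). decodeUtf8Strict and latin1 are hypothetical helpers written with base only; in real code something like decodeUtf8' from the text package (which returns an Either) would play the strict role. Strict UTF-8 decoding is partial and rejects ISO-8859-1 bytes, while a Latin-1 style cast always succeeds but silently guesses an encoding the standard never specified.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: strict UTF-8 decoding of raw header octets.
-- Returns Left on the first sequence that is not well-formed UTF-8
-- (bad lead byte, truncated/bad continuation, overlong, surrogate, > U+10FFFF).
decodeUtf8Strict :: [Word8] -> Either String String
decodeUtf8Strict [] = Right []
decodeUtf8Strict (b : rest)
  | b < 0x80 = (chr (fromIntegral b) :) <$> decodeUtf8Strict rest
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise = Left ("invalid lead byte " ++ show b)
  where
    -- n continuation bytes, accumulated lead bits, smallest legal code point
    multi :: Int -> Int -> Int -> Either String String
    multi n acc minVal =
      let (conts, rest') = splitAt n rest
       in if length conts /= n || any (\c -> c .&. 0xC0 /= 0x80) conts
            then Left "truncated or malformed sequence"
            else
              let cp = foldl (\a c -> (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) acc conts
               in if cp < minVal || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)
                    then Left ("invalid code point " ++ show cp)
                    else (chr cp :) <$> decodeUtf8Strict rest'

-- Hypothetical helper: Latin-1 style cast, one Char per octet. Total, but it
-- assumes an encoding that nothing in the header told us about.
latin1 :: [Word8] -> String
latin1 = map (chr . fromIntegral)

main :: IO ()
main = do
  let raw = [0x63, 0x61, 0x66, 0xE9] -- "café" in ISO-8859-1, not valid UTF-8
  print (decodeUtf8Strict raw)       -- a Left: 0xE9 starts a truncated sequence
  print (latin1 raw)                 -- always succeeds
```

Keeping the value as bytes and decoding only at the point where an application actually knows the encoding avoids both the partial conversion and the silent guess.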