
On Tue, Feb 03, 2009 at 10:56:13PM +0000, Duncan Coutts wrote:
Thanks to suggestions from Duncan Coutts, it's possible to call hSetEncoding even on buffered read Handles, and the right thing happens. So we can read from text streams that include multiple encodings, such as an HTTP response or email message, without having to turn buffering off (though there is a penalty for switching encodings on a buffered Handle, as the IO system has to do some re-decoding to figure out where it should start reading from again).
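To make the encoding switch concrete, here is a minimal sketch of reading one buffered Handle with two encodings. It assumes a GHC whose System.IO exports hSetEncoding, latin1 and utf8; the file name and the helpers writeDemoFile and readMixed are made up for this illustration and are not part of any library.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical scratch file, used only for this demonstration.
demoPath :: FilePath
demoPath = "mixed-encoding-demo.txt"

-- Write raw bytes: a Latin-1 header line followed by a UTF-8 body.
-- In binary mode each Char below 256 is written as a single byte.
writeDemoFile :: IO ()
writeDemoFile = withBinaryFile demoPath WriteMode $ \h ->
  hPutStr h ("Subject: caf\xE9\n" ++ "na\xC3\xAFve \xE2\x82\xAC\n")

-- Read the header as Latin-1, then switch the same buffered Handle
-- to UTF-8 for the rest of the stream.
readMixed :: IO (String, String)
readMixed = withFile demoPath ReadMode $ \h -> do
  hSetBuffering h (BlockBuffering Nothing)  -- buffering stays on
  hSetEncoding h latin1
  header <- hGetLine h
  hSetEncoding h utf8                       -- switch mid-stream
  body <- hGetContents h
  _ <- evaluate (length body)               -- force the lazy read before the Handle closes
  return (header, body)

main :: IO ()
main = do
  writeDemoFile
  (header, body) <- readMixed
  print header
  print body
```

The second hSetEncoding call is where the re-decoding penalty described above would be paid; a program that sets the encoding once and never changes it does not go through that step.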
Sounds useful, but is this the bit that causes the 30% performance hit?
No. You only pay that penalty if you switch encoding. The standard case has no extra cost.
I'm confused. I thought the standard case was conversion to the system's local encoding? How is that different than selecting the same encoding manually? There always has to be *some* conversion from a 32-bit Char to the system's selection, right? What exactly do we have to do to avoid the penalty?
No, I think that's 30% for latin1. The cost is not really the character conversion but the copying from a byte buffer via iconv to a char buffer.
Don't we already have to copy between a byte buffer and a char buffer, since read() and write() use a byte buffer?
30% slower is a big deal, especially since we're not all that speedy now.
Bear in mind that's talking about the [Char] interface, and nobody using that is expecting great performance. We already have an API for getting
Yes, I know, but it's still the most convenient interface, and making it suck more isn't cool -- though there are certainly big wins here.

-- John