Re: [Haskell-cafe] Ready for testing: Unicode support for Handle I/O

3 Feb 2009

On Tue, Feb 03, 2009 at 10:56:13PM +0000, Duncan Coutts wrote:
...
...
...
> > > Thanks to suggestions from Duncan Coutts, it's possible to call
> > > hSetEncoding even on buffered read Handles, and the right thing
> > > happens.  So we can read from text streams that include multiple
> > > encodings, such as an HTTP response or email message, without having
> > > to turn buffering off (though there is a penalty for switching
> > > encodings on a buffered Handle, as the IO system has to do some
> > > re-decoding to figure out where it should start reading from again).
> > Sounds useful, but is this the bit that causes the 30% performance hit?
> No. You only pay that penalty if you switch encoding. The standard case
> has no extra cost.
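
For readers following along, here is a minimal sketch of the encoding switch discussed above, assuming the `hSetEncoding`, `latin1`, and `utf8` names that `System.IO` exports in GHC releases with this Unicode support. The helpers `writeMixed` and `readMixed`, the file name, and the sample strings are hypothetical and only for illustration. It writes one Latin-1 line and one UTF-8 line to a file, then reads them back from a buffered Handle, changing the encoding between the two reads.

```haskell
import System.IO

-- Hypothetical helper: write a Latin-1 "header" line followed by a
-- UTF-8 "body" line, switching the Handle's encoding in between.
writeMixed :: FilePath -> String -> String -> IO ()
writeMixed path header body =
  withFile path WriteMode $ \h -> do
    hSetEncoding h latin1
    hPutStrLn h header
    hSetEncoding h utf8
    hPutStrLn h body

-- Hypothetical helper: read the two lines back. Buffering stays on;
-- the second hSetEncoding is the switch that incurs the re-decoding
-- penalty described in the quoted text.
readMixed :: FilePath -> IO (String, String)
readMixed path =
  withFile path ReadMode $ \h -> do
    hSetBuffering h (BlockBuffering Nothing)
    hSetEncoding h latin1
    header <- hGetLine h
    hSetEncoding h utf8
    body <- hGetLine h
    return (header, body)

main :: IO ()
main = do
  let path   = "mixed-encoding-demo.txt"  -- illustrative file name
      header = "Subject: caf\233"         -- U+00E9, one byte in Latin-1
      body   = "na\239ve \955"            -- U+00EF and U+03BB, multi-byte in UTF-8
  writeMixed path header body
  result <- readMixed path
  -- Report whether both lines round-tripped unchanged.
  print (result == (header, body))
```

A Handle whose encoding is never changed after opening does not go through the switch at all, which is the "standard case" referred to above.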
I'm confused.  I thought the standard case was conversion to the
system's locale encoding?  How is that different from selecting the
same encoding manually?

There always has to be *some* conversion from a 32-bit Char to the
system's selection, right?

What exactly do we have to do to avoid the penalty?
...
> No, I think that's 30% for latin1. The cost is not really the character
> conversion but the copying from a byte buffer via iconv to a char
> buffer.
Don't we already have to copy between a byte buffer and a char buffer,
since read() and write() use a byte buffer?
...
...
> > 30% slower is a big deal, especially since we're not all that speedy now.
> Bear in mind that's talking about the [Char] interface, and nobody using
> that is expecting great performance. We already have an API for getting
Yes, I know, but it's still the most convenient interface, and making
it suck more isn't cool -- though there are certainly big wins here.

-- John

John Goerzen