Re: [Haskell-cafe] Ready for testing: Unicode support for Handle I/O

3 Feb 2009


      Duncan Coutts wrote:
...
Sorry, I think we've been talking at cross purposes.
I think so.
...
...
There always has to be *some* conversion from a 32-bit Char to the
system's selected encoding, right?
Yes. In text mode there is always some conversion going on. Internally
there is a byte buffer and a char buffer (i.e. UTF-32).
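A small illustration of that byte-buffer-to-char-buffer conversion, using Python's incremental decoders as a stand-in for the Handle's decoder. This is an assumption for illustration only; GHC's own decoders are not shown. The raw bytes play the role of the byte buffer and the decoded string plays the role of the char buffer. The chunk size of 4 is arbitrary, picked so that the two-byte UTF-8 encoding of "é" straddles a chunk boundary.

```python
import codecs

# Hypothetical input: the bytes sitting in the byte buffer (5 bytes of UTF-8).
raw = "café".encode("utf-8")
CHUNK = 4  # arbitrary size, chosen so the two bytes of "é" are split

decoder = codecs.getincrementaldecoder("utf-8")()
char_buf = []
for i in range(0, len(raw), CHUNK):
    piece = raw[i:i + CHUNK]
    final = i + CHUNK >= len(raw)
    # An incomplete multi-byte sequence at the end of a chunk is held by the
    # decoder until the remaining bytes arrive with the next chunk.
    char_buf.extend(decoder.decode(piece, final))

text = "".join(char_buf)
# As fixed-width UTF-32, every character occupies 4 bytes.
utf32_size = len(text.encode("utf-32-le"))
print(text, len(raw), utf32_size)
```

The same four characters take 5 bytes in the byte buffer but 16 bytes once widened to UTF-32, which is why the conversion step cannot be skipped in text mode.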
...
What exactly do we have to do to avoid the penalty?
The penalty we're talking about here is not the cost of converting bytes
to characters, it's in switching which encoding the Handle is using. For
example you might read some HTTP headers in ASCII and then switch the
Handle encoding to UTF-8 to read some XML.
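To make the HTTP example concrete: the header bytes decode fine as ASCII, but the body does not, so a single encoding cannot serve the whole stream. This sketch splits a made-up message on raw bytes rather than reading it through a Handle, so it only shows why a switch is wanted, not how a Handle performs one.

```python
# Hypothetical HTTP-style message: ASCII headers, a blank line, then an XML
# body that contains one non-ASCII character.
message = (b"Content-Type: application/xml\r\n"
           b"\r\n"
           + "<p>café</p>".encode("utf-8"))

head, _, body = message.partition(b"\r\n\r\n")

headers = head.decode("ascii").split("\r\n")  # fine: headers are pure ASCII
xml = body.decode("utf-8")                     # the body needs UTF-8
mojibake = body.decode("latin-1")              # wrong encoding, no error raised

try:
    body.decode("ascii")
    body_is_ascii = True
except UnicodeDecodeError:
    body_is_ascii = False

print(headers, xml, mojibake, body_is_ascii)
```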
Simon referenced a 30% penalty. Are you saying that if we read from a
Handle using the same encoding we used when we first opened it, we
won't see any slowdown vs. the system in GHC 6.10?
...
Switching the Handle encoding has a penalty. We have to discard the
characters that we pre-decoded and re-decode the byte buffer in the new
encoding. It's actually slightly more complicated because we do not
...
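A toy model of why switching costs something, assuming a handle that pre-decodes a fixed-size chunk of bytes ahead of the reader. `SwitchableHandle`, `set_encoding`, `read_line`, and the chunk size are hypothetical names for this sketch; they are not GHC's API, and the bookkeeping is one possible approach rather than GHC's actual algorithm. On a switch, the model throws away the unread pre-decoded characters, replays the old decoder over the current chunk to find the byte just past the last character actually consumed, and decodes from there with the new encoding. Latin-1 stands in for ASCII on the header side because it accepts every byte, so pre-decoding past the end of the headers cannot fail.

```python
import codecs


class SwitchableHandle:
    """Toy text-mode handle (hypothetical, not GHC's implementation).

    raw stands in for the file; char_buf holds characters decoded ahead
    of the reader, one chunk of bytes at a time.
    """

    def __init__(self, raw: bytes, encoding: str, chunk: int = 16):
        self.raw, self.chunk, self.encoding = raw, chunk, encoding
        self.decoder = codecs.getincrementaldecoder(encoding)()
        self.pos = 0          # next byte not yet fed to the decoder
        self.chunk_start = 0  # byte offset where the current char buffer began
        self.consumed = 0     # chars handed out from the current char buffer
        self.char_buf = ""    # pre-decoded chars not yet read
        self.wasted = 0       # pre-decoded chars discarded by switches
        self.rewound = 0      # bytes that had to be decoded again

    def read_char(self) -> str:
        """Return the next character, or '' at end of input."""
        while not self.char_buf:
            if self.pos >= len(self.raw):
                return ""
            # New char buffer: it starts at the first byte the decoder is
            # still holding (a partial sequence), or at pos if none.
            self.chunk_start = self.pos - len(self.decoder.getstate()[0])
            self.consumed = 0
            piece = self.raw[self.pos:self.pos + self.chunk]
            self.pos += len(piece)
            self.char_buf = self.decoder.decode(piece, self.pos >= len(self.raw))
        c, self.char_buf = self.char_buf[0], self.char_buf[1:]
        self.consumed += 1
        return c

    def set_encoding(self, encoding: str) -> None:
        """Switch encodings, discarding any chars decoded ahead of the reader."""
        self.wasted += len(self.char_buf)
        # Replay the old encoding byte by byte from the start of the chunk
        # until it has produced the chars already consumed; p then points
        # just past the last consumed char.
        replay = codecs.getincrementaldecoder(self.encoding)()
        p, produced = self.chunk_start, 0
        while produced < self.consumed:
            produced += len(replay.decode(self.raw[p:p + 1]))
            p += 1
        self.rewound += self.pos - p
        self.pos = self.chunk_start = p
        self.consumed = 0
        self.char_buf = ""
        self.encoding = encoding
        self.decoder = codecs.getincrementaldecoder(encoding)()


def read_line(h: SwitchableHandle) -> str:
    """Read up to (not including) the next newline."""
    out = []
    while (c := h.read_char()) not in ("", "\n"):
        out.append(c)
    return "".join(out)


# Hypothetical message: header, blank line, UTF-8 body.
msg = b"Content-Type: text/xml\n\n" + "<p>café</p>".encode("utf-8")
h = SwitchableHandle(msg, "latin-1", chunk=16)
header = read_line(h)
blank = read_line(h)
h.set_encoding("utf-8")
body = read_line(h)
print(header, repr(blank), body, h.wasted, h.rewound)
```

With the 16-byte chunk, reading the header and blank line leaves eight pre-decoded (and wrongly decoded) body characters in the char buffer. The switch discards them and rewinds eight bytes, after which the body reads back as `<p>café</p>`. A handle that never switches never pays for the replay.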
Got it.  That makes sense, as does the decision to optimize for the more
common (not switching the encoding) case.

-- John


John Goerzen