
John Goerzen wrote:
Char in Haskell represents a Unicode character. I don't know exactly what its size is, but it must be at least 16 bits and maybe more. String would then share those properties.
However, usually I'm accustomed to dealing with data in 8-bit words. So I have some questions:
Char and String handling in Haskell is deeply broken. There's a discussion ongoing on this very list about fixing it (in the context of pathnames). But for now, Haskell's Char behaves like C's char with respect to I/O. This is unlikely ever to change (in the existing I/O interface) because it would break too much code. So the answers to your questions are:
* If I use hPutStr on a string, is it guaranteed that the number of 8-bit bytes written equals (length stringWritten)?
Yes, if the handle is open in binary mode. No if it is in text mode, where newline translation can expand the output (e.g. '\n' written as the two bytes "\r\n" on Windows).
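A small check of that claim (the file name here is my own; this is a sketch, not part of the original post): write a String through a binary-mode handle and compare the size on disk with its length.

```haskell
import System.IO

main :: IO ()
main = do
  let s = "hello\nworld"
  -- Binary mode: each Char should become exactly one byte, with no
  -- newline translation.
  withBinaryFile "out.bin" WriteMode $ \h -> hPutStr h s
  n <- withBinaryFile "out.bin" ReadMode hFileSize
  print (fromIntegral n == length s)
```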
+ If yes, what happens to the upper 8 bits? Are they simply stripped off?
Yes.
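The stripping described above can be sketched in pure code (the helper name strip8 is mine): keep only the low 8 bits of the code point, which is what binary-mode output does to a Char above '\xff' according to this answer.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- Keep only the low 8 bits of a character's code point.
strip8 :: Char -> Char
strip8 = chr . (.&. 0xff) . ord

main :: IO ()
main = print (strip8 '\x2603')  -- U+2603 has low byte 0x03, i.e. '\ETX'
```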
* If I run hGetChar, is it possible that it would consume more than one byte of input?
No in binary mode; in text mode it can, e.g. when a two-byte "\r\n" sequence is read as a single '\n' on Windows.
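A sketch of the multi-byte case (file name mine; I force CRLF translation with universalNewlineMode so the behaviour shows on any platform, not just Windows text mode): the two bytes "\r\n" arrive as the single Char '\n'.

```haskell
import System.IO

main :: IO ()
main = do
  -- Write three characters plus a literal CR LF pair, bytewise.
  withBinaryFile "crlf.txt" WriteMode $ \h -> hPutStr h "a\r\nb"
  withFile "crlf.txt" ReadMode $ \h -> do
    hSetNewlineMode h universalNewlineMode  -- translate CRLF -> LF on input
    s <- hGetContents h
    print s  -- four bytes on disk, but only three Chars read
```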
* Does Haskell treat the "this is a Unicode file" marker specially in any way?
No. A byte-order mark at the start of a file is just ordinary bytes to the I/O system.
* Same questions on withCString and related String<->CString conversions.
They all behave as if reading/writing a file in binary mode.

-- Ben
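For completeness, a sketch of that last point: Foreign.C.String's withCString marshals each Char to one C byte (truncating to 8 bits, per the FFI spec), so an 8-bit string round-trips unchanged through peekCString.

```haskell
import Foreign.C.String (peekCString, withCString)

main :: IO ()
main = do
  -- Marshal a String out to a CString and read it straight back.
  s <- withCString "hello" peekCString
  print (s == "hello")
```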