
John Goerzen wrote:
Char in Haskell represents a Unicode character. I don't know exactly what its size is, but it must be at least 16 bits and maybe more. String would then share those properties.
However, usually I'm accustomed to dealing with data in 8-bit words. So I have some questions:
Char and String handling in Haskell is deeply broken. There's a discussion ongoing on this very list about fixing it (in the context of pathnames). But for now, Haskell's Char behaves like C's char with respect to I/O. This is unlikely ever to change (in the existing I/O interface) because it would break too much code. So the answers to your questions are:
* If I use hPutStr on a string, is it guaranteed that the number of 8-bit bytes written equals (length stringWritten)?
Yes, if the handle is open in binary mode. No if it is in text mode, where newline translation can expand the output (e.g. '\n' written as the two bytes "\r\n" on Windows).
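A small check of that claim (the file name here is my own; this is a sketch, not part of the original post): write a String through a binary-mode handle and compare the size on disk with its length.

```haskell
import System.IO

main :: IO ()
main = do
  let s = "hello\nworld"
  -- Binary mode: each Char should become exactly one byte, with no
  -- newline translation.
  withBinaryFile "out.bin" WriteMode $ \h -> hPutStr h s
  n <- withBinaryFile "out.bin" ReadMode hFileSize
  print (fromIntegral n == length s)
```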
+ If yes, what happens to the upper 8 bits? Are they simply stripped off?
Yes.
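The stripping described above can be sketched in pure code (the helper name strip8 is mine): keep only the low 8 bits of the code point, which is what binary-mode output does to a Char above '\xff' according to this answer.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- Keep only the low 8 bits of a character's code point.
strip8 :: Char -> Char
strip8 = chr . (.&. 0xff) . ord

main :: IO ()
main = print (strip8 '\x2603')  -- U+2603 has low byte 0x03, i.e. '\ETX'
```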
* If I run hGetChar, is it possible that it would consume more than one byte of input?
No in binary mode; in text mode it can, e.g. when a two-byte "\r\n" sequence is read as a single '\n' on Windows.
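A sketch of the multi-byte case (file name mine; I force CRLF translation with universalNewlineMode so the behaviour shows on any platform, not just Windows text mode): the two bytes "\r\n" arrive as the single Char '\n'.

```haskell
import System.IO

main :: IO ()
main = do
  -- Write three characters plus a literal CR LF pair, bytewise.
  withBinaryFile "crlf.txt" WriteMode $ \h -> hPutStr h "a\r\nb"
  withFile "crlf.txt" ReadMode $ \h -> do
    hSetNewlineMode h universalNewlineMode  -- translate CRLF -> LF on input
    s <- hGetContents h
    print s  -- four bytes on disk, but only three Chars read
```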
* Does Haskell treat the "this is a Unicode file" marker specially in any way?
No. A byte-order mark at the start of a file is just ordinary bytes to the I/O system.
* Same questions on withCString and related String<->CString conversions.
They all behave as if reading/writing a file in binary mode.

-- Ben
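For completeness, a sketch of that last point: Foreign.C.String's withCString marshals each Char to one C byte (truncating to 8 bits, per the FFI spec), so an 8-bit string round-trips unchanged through peekCString.

```haskell
import Foreign.C.String (peekCString, withCString)

main :: IO ()
main = do
  -- Marshal a String out to a CString and read it straight back.
  s <- withCString "hello" peekCString
  print (s == "hello")
```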