
8 Jul 2007, 11:10 a.m.
Stefan O'Rear:
> Char is just a code point. It's a 32 bit integer (64 on 64-bit platforms due to infelicities in the GHC backend) with a code point. [...] The GHC IO functions truncate down to 8 bits. There is no way in GHC to read or write full UTF-8, short of doing the encoding yourself (google for UTF8.lhs).
Thanks, this makes things clear to me.
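For anyone who wants to try "doing the encoding yourself", here is a minimal sketch of a UTF-8 encoder over `Char` code points. It is not the UTF8.lhs module mentioned above. The names `encodeCharUtf8` and `encodeUtf8` are hypothetical. The sketch follows the standard one- to four-byte UTF-8 scheme and assumes every input `Char` is a valid Unicode scalar value, so it does not reject surrogates. The bytes are written with `Data.ByteString.hPut` from the bytestring package, which outputs them unchanged instead of passing them through the Char-based IO functions.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import System.IO (stdout)

-- Encode a single code point as one to four UTF-8 bytes.
-- Assumption: the Char is a valid Unicode scalar value; this sketch
-- does not reject surrogates (U+D800..U+DFFF).
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [byte n]                               -- 0xxxxxxx
  | n < 0x800   = [0xC0 .|. byte (n `shiftR` 6), cont n] -- 110xxxxx 10xxxxxx
  | n < 0x10000 = [ 0xE0 .|. byte (n `shiftR` 12)       -- 1110xxxx
                  , cont (n `shiftR` 6), cont n ]
  | otherwise   = [ 0xF0 .|. byte (n `shiftR` 18)       -- 11110xxx
                  , cont (n `shiftR` 12), cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    byte :: Int -> Word8
    byte = fromIntegral
    -- Continuation byte: 10xxxxxx holding the low six bits of x.
    cont x = 0x80 .|. byte (x .&. 0x3F)

-- Encode a whole String by concatenating the per-character encodings.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8

main :: IO ()
main = do
  print (encodeUtf8 "A\x20AC")                 -- [65,226,130,172]
  -- Write the raw bytes; ByteString output does not re-encode them.
  BS.hPut stdout (BS.pack (encodeUtf8 "A\x20AC\n"))
```

The euro sign U+20AC comes out as the three bytes E2 82 AC, as UTF-8 requires.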
> [Char] is a linked list of pointers to heap-allocated fullword integers, 20 (40) bytes per character (assuming non-latin1).
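The 20 (40) byte figure can be checked by hand under an assumed GHC heap layout (no profiling). Every heap object carries one header word. A `(:)` cell adds two pointer fields (head and tail), and a boxed `C#` character adds one word for the code point. This is a back-of-the-envelope sketch, not measured data:

```latex
\underbrace{(1 + 2)}_{\texttt{(:)}\ \text{cell}}
  + \underbrace{(1 + 1)}_{\texttt{C\#}\ \text{box}} = 5\ \text{words}, \qquad
5 \times 4\ \text{bytes} = 20\ \text{bytes (32-bit)}, \qquad
5 \times 8\ \text{bytes} = 40\ \text{bytes (64-bit)}
```

For Latin-1 characters GHC can point at preallocated static `C#` closures, which is presumably why the figure assumes non-Latin-1 text; in that case only the 3-word cell is allocated per character. A strict ByteString, by contrast, stores one byte per element in a packed buffer plus a fixed per-string overhead.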
Hey, I love ByteStrings! ;-)

Malte