
Peter Verswyvelen
No, I just used the wrong terminology. When I said Unicode, I actually meant UCS-x,
You might as well say UCS-4; nobody uses UCS-2 anymore. It has been replaced by UTF-16, which gives you the complexity of UTF-8 without being compact (for 99% of existing data), endianness-indifferent, or backwards compatible with ASCII.
and with multi-byte-string-thing I meant VARIABLE-length, sorry about that. I find variable-length chars so much harder to use and reason about than fixed-length characters. UTF-x is a form of compression, which is understandable, but it is IMHO a burden (since it does not allow random access to the n-th character).
Do you really need that, though? Most formats I know with enough structure that you can pick up records by offset either encode the offsets somewhere, or are restricted to ASCII, or both.
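To make the random-access point concrete, here is a minimal sketch of a hand-rolled UTF-8 encoder using only base. The names encodeChar and byteOffsets are made up for this illustration, and the encoder does not reject surrogate code points. It shows that the byte offset of the n-th character is only known after scanning everything before it, which is why formats that need random access tend to store offsets explicitly.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one code point as 1 to 4 UTF-8 bytes.
-- (Hypothetical helper; no check for surrogates.)
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- leading bits that go into the first byte
    hi s   = fromIntegral (n `shiftR` s)
    -- one continuation byte: 10xxxxxx
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Byte offset at which each character starts, plus the total length.
-- Finding the offset of character n means summing the widths before it.
byteOffsets :: String -> [Int]
byteOffsets = scanl (+) 0 . map (length . encodeChar)

main :: IO ()
main = do
  print (encodeChar '\955')            -- U+03BB: [206,187]
  print (encodeChar '\8364')           -- U+20AC: [226,130,172]
  print (byteOffsets "a\955\8364x")    -- [0,1,3,6,7]
```

With a fixed-width encoding the offsets would simply be multiples of the code unit size, so no such table or scan would be needed.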
Now I'm getting a bit confused here. To summarize, what encoding does GHC 6.8.2 use for [Char]? UCS-32?
Internally, Haskell Chars are Unicode: a Char stores one code point as a 32-bit value (well, actually only 21 bits are needed). One Char, one code point. ByteString stores 8-bit "char"s, and the Char8 interface chops off the top bits, essentially projecting code points down to the ISO-8859-1 (latin1) subset.

Externally, it depends on what IO library you use. As for the command line, Ian's post links to:

http://www.haskell.org/ghc/docs/6.8.2/html/users_guide/release-6-8-2.html

-k
--
If I haven't seen further, it is by standing in the footprints of giants
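
A minimal sketch of the Char versus Char8 behaviour described above, assuming the bytestring package that ships with GHC. The sample string is made up for illustration; it is written with numeric escapes so the source file encoding does not matter.

```haskell
import qualified Data.ByteString.Char8 as C
import Data.Char (ord)

main :: IO ()
main = do
  let s = "\955x"                    -- U+03BB followed by 'x'
  -- A String is a list of code points, one Char each.
  print (map ord s)                  -- [955,120]
  -- The largest Char is U+10FFFF, which fits in 21 bits.
  print (ord (maxBound :: Char))     -- 1114111
  -- Char8.pack keeps only the low 8 bits of each code point:
  -- 955 = 0x3BB becomes 0xBB = 187.
  let b = C.pack s
  print (map ord (C.unpack b))       -- [187,120]
```

So round-tripping through Char8 is only lossless for code points below 256; anything else needs an explicit encoding step before it goes into a ByteString.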