Re: [Haskell-cafe] Has character changed in GHC 6.8?

23 Jan 2008

Peter Verswyvelen writes:
...
> No I just used wrong terminology. When I said unicode, I actually meant UCS-x,
You might as well say UCS-4; nobody uses UCS-2 anymore.  It's been
replaced by UTF-16, which gives you the complexity of UTF-8 without
being compact (for 99% of existing data), endianness-indifferent, or backwards
compatible with ASCII.
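
To make the UTF-8 side of that comparison concrete, here is a minimal sketch of a
UTF-8 encoder for a single code point, using only base. The name encodeUtf8Char
is made up for this illustration and is not a library function; it also does not
reject surrogates, which a real encoder should.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one code point as a list of UTF-8 bytes.
-- Assumption: no validation (surrogates are encoded like any other value).
encodeUtf8Char :: Char -> [Word8]
encodeUtf8Char c
  | n < 0x80    = [fromIntegral n]                        -- ASCII: one byte, unchanged
  | n < 0x800   = [0xC0 .|. hi 6,  cont 0]                -- two bytes
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]        -- three bytes
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0] -- four bytes
  where
    n = ord c
    -- Leading payload bits, shifted down into the lead byte.
    hi :: Int -> Word8
    hi s = fromIntegral (n `shiftR` s)
    -- A continuation byte: 10xxxxxx carrying six payload bits.
    cont :: Int -> Word8
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

main :: IO ()
main = do
  print (encodeUtf8Char 'A')      -- [65]
  print (encodeUtf8Char '\233')   -- U+00E9: [195,169]
  print (encodeUtf8Char '\8364')  -- U+20AC: [226,130,172]
```

ASCII input comes out byte-for-byte identical, and since the output is a plain
byte sequence there is no byte order to get wrong.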
...
> and with multi-byte-string-thing I meant VARIABLE-length, sorry about that. I
> find variable length chars so much harder to use and reason about than the
> fixed length characters. UTF-x is a form of compression, which is
> understandable, but it is IMHO a burden (since it does not allow random access
> to the n-th character)
Do you really need that, though?  Most formats I know with enough structure
that you can pick up records by offset either encode the offsets
somewhere, or are restricted to ASCII, or both.
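
For what it's worth, when you do need the n-th character of UTF-8 data, it is a
linear scan rather than an index computation. A rough sketch, assuming the input
is well-formed UTF-8 held as a list of bytes and n is non-negative; isLeadByte
and nthCharOffset are names invented here, not library functions:

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- A byte starts a character unless it is a continuation byte (10xxxxxx).
isLeadByte :: Word8 -> Bool
isLeadByte b = b .&. 0xC0 /= 0x80

-- Byte offset of the n-th character (0-based), or Nothing if there are
-- fewer than n+1 characters. Assumes well-formed UTF-8 and n >= 0.
nthCharOffset :: Int -> [Word8] -> Maybe Int
nthCharOffset n bytes =
  case drop n [i | (i, b) <- zip [0 ..] bytes, isLeadByte b] of
    (off : _) -> Just off
    []        -> Nothing

main :: IO ()
main = do
  -- UTF-8 bytes for 'a', U+00E9 and U+20AC (1 + 2 + 3 bytes).
  let bytes = [97, 195, 169, 226, 130, 172]
  print (nthCharOffset 2 bytes)  -- Just 3
  print (nthCharOffset 3 bytes)  -- Nothing
```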
...
> Now I'm getting a bit confused here. To summarize, what encoding does GHC 6.8.2
> use for [Char]? UCS-32?
Internally, a Haskell Char is a Unicode code point, stored as a
32-bit value (well, actually only 21 bits are needed, since code points
stop at U+10FFFF).  One Char, one code point.
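
A quick way to check this yourself, using only Data.Char from base (the sample
string is just an illustration):

```haskell
import Data.Char (ord)

main :: IO ()
main = do
  -- The largest Char is the largest Unicode code point, U+10FFFF.
  print (ord (maxBound :: Char))   -- 1114111
  -- 'a', U+03BB (Greek lambda) and U+20AC (euro sign): three Chars,
  -- three code points, regardless of how many bytes any encoding needs.
  let s = "a\955\8364"
  print (map ord s)                -- [97,955,8364]
  print (length s)                 -- 3
```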

ByteString stores 8-bit "char"s, and the Char8 interface chops off the
top bits, keeping only the low 8 bits of each code point, so only the
ISO-8859-1 (latin1) subset survives a round trip intact.
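
A small sketch of that truncation, assuming the bytestring package that ships
with GHC; roundTrip is a name made up for this example:

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.Char (ord)

-- Hypothetical helper: pack a String through Char8 and read back the
-- code points that come out the other side.
roundTrip :: String -> [Int]
roundTrip = map ord . BC.unpack . BC.pack

main :: IO ()
main = do
  -- U+00E9 fits in 8 bits and survives unchanged.
  print (roundTrip "\233")    -- [233]
  -- U+03BB does not: only its low byte 0xBB remains, next to 'x'.
  print (roundTrip "\955x")   -- [187,120]
```

Anything above U+00FF comes back as a different character, so Char8 is only
safe when the data is known to be latin1 (or plain ASCII).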

Externally, it depends on what IO library you use.

As for the command line, Ian's post links to:
  http://www.haskell.org/ghc/docs/6.8.2/html/users_guide/release-6-8-2.html

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants

Ketil Malde