On 03/26/2012 02:39 AM, Gabriel Dos Reis wrote:
> True, but should the language definition default to a string type
> that is one of the most unsuited for text processing in the 21st
> century where global multilingualism abounds? Even C has qualms
> about that.
...
> I have no doubt believing that if all texts my students have to
> process are US ASCII, [Char] is more than sufficient. So, I have
> sympathy for your position. However, I doubt [Char] would be
> adequate if I ask them to share texts from their diverse cultures.
Uh, while a C char is (usually) just a byte (8 bits of information, i.e. 2^8
values, like Word8 in Haskell), a Haskell Char is a Unicode character (21 bits
of information). A single C char cannot contain an arbitrary Unicode character,
while a Haskell Char can, and does. Hence [Char] is (efficiency issues
aside) perfectly adequate for dealing with texts written in arbitrary languages.
...as long as you ignore combining characters and the like. I claim that
ignoring them is just a continuation of the same "good enough for my
language" attitude that has plagued text handling ever since someone first
suggested that text processing should consider more than just ISO 8859-1
and got roundly pooh-poohed by the community.
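
To make the combining-character point concrete, here is a minimal sketch
using only the Prelude and Data.Char. The names `precomposed` and
`decomposed` are mine, purely for illustration. Both strings render as the
same "é", but [Char] sees code points, not what a reader would call a
character:

```haskell
import Data.Char (ord)

-- "é" as a single precomposed code point, U+00E9.
precomposed :: String
precomposed = "\233"

-- "é" as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed :: String
decomposed = "e\769"

main :: IO ()
main = do
  print (map ord precomposed)                    -- [233]
  print (map ord decomposed)                     -- [101,769]
  -- length counts code points, so the same visible text has two lengths.
  print (length precomposed, length decomposed)  -- (1,2)
  -- (==) on [Char] compares code points; no normalization is applied.
  print (precomposed == decomposed)              -- False
  -- reverse moves the accent in front of the 'e', detaching it from its base.
  print (map ord (reverse decomposed))           -- [769,101]
```

So length, (==) and reverse all give answers that are right for code points
and wrong for the text a reader sees. Getting the latter right needs
normalization and grapheme-cluster handling, which the list functions in
base do not attempt.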