
On Mon, Mar 26, 2012 at 5:08 AM, Christian Siefkes wrote:
> On 03/26/2012 02:39 AM, Gabriel Dos Reis wrote:
>> True, but should the language definition default to a string type that is one of the most unsuited for text processing in the 21st century, where global multilingualism abounds? Even C has qualms about that. ... I have no trouble believing that if all the texts my students have to process are US ASCII, [Char] is more than sufficient. So, I have sympathy for your position. However, I doubt [Char] would be adequate if I asked them to share texts from their diverse cultures.
> Uh, while a C char is (usually) just a byte (8 bits of information, like Word8 in Haskell), a Haskell Char is a Unicode character (21 bits of information).
It is not the precision of Char or char that is at issue here. It has been clarified at several points in this thread that a Char is not a Unicode character, but a Unicode code point. Not every Unicode code point represents a Unicode character (a combining mark such as U+0301 is not a character by itself), and not every sequence of Unicode code points represents a character or a sequence of Unicode characters.
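To make that concrete, here is a small self-contained sketch using only the Prelude (the rendering remarks in the comments assume a terminal that composes combining marks):

    -- The user-perceived character "é" can be written as one code point
    -- (U+00E9) or as two (U+0065 'e' followed by the combining acute
    -- U+0301). Both render identically, yet [Char] reports different
    -- lengths and the two strings compare unequal.
    main :: IO ()
    main = do
      let precomposed = "\x00E9"          -- é as the single code point U+00E9
          decomposed  = "e\x0301"         -- 'e' plus combining acute U+0301
      print (length precomposed)          -- 1
      print (length decomposed)           -- 2
      print (precomposed == decomposed)   -- False, despite equal rendering

So a length count, a reverse, or an equality test over [Char] operates on code points, which is not the same thing as operating on characters.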
> A single C char cannot contain an arbitrary Unicode character, while a Haskell Char can, and does. Hence [Char] is (efficiency issues aside) perfectly adequate for dealing with texts written in arbitrary languages.
See above.

-- Gaby