Re: String != [Char]

26 Mar 2012

Hi all,

Johan Tibell wrote:
> ...
> Normalization isn't quite enough, unfortunately, as it doesn't solve e.g.
> upcase = map toUpper
> You need all-at-once functions on strings (which we could add). I'm
> just pointing out that most (all?) list functions do the wrong thing
> when used on Strings.
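
To make the quoted point concrete, here is a small sketch, assuming GHC's Data.Char. The names naiveUpcase and upcaseString and the two-entry table are hypothetical, for illustration only; a real implementation would need the full Unicode special-casing data. Since toUpper maps one Char to one Char, "ß", whose uppercase form is the two letters "SS", is left unchanged by map toUpper, whereas a String -> String function is free to change the length of its result.

```haskell
import Data.Char (toUpper)

-- Per-Char upcasing: every Char is mapped on its own, so a character
-- whose uppercase form needs two Chars cannot be handled.
naiveUpcase :: String -> String
naiveUpcase = map toUpper

-- Hypothetical string-level upcase: consults a tiny, illustrative table of
-- one-to-many mappings before falling back to toUpper.
upcaseString :: String -> String
upcaseString = concatMap upcaseChar
  where
    upcaseChar '\x00DF' = "SS"   -- LATIN SMALL LETTER SHARP S
    upcaseChar '\xFB00' = "FF"   -- LATIN SMALL LIGATURE FF
    upcaseChar c        = [toUpper c]
```

In GHCi, naiveUpcase "straße" == "STRAßE" holds, while upcaseString "straße" == "STRASSE".
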
So, is the argument to deprecate Char, then? As long as Haskell
allows Chars to be handled in isolation, it would seem impossible
to prevent naive users from accidentally stumbling over the
complexities of Unicode?
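
A short illustration of Chars handled in isolation, assuming a decomposed spelling of "café" (the binding names are hypothetical): the final "é" is written as a plain 'e' followed by U+0301 COMBINING ACUTE ACCENT, so ordinary list functions see five Chars and can separate the accent from its base letter.

```haskell
-- "café" written with a decomposed é: plain 'e' followed by
-- U+0301 COMBINING ACUTE ACCENT. One user-perceived character, two Chars.
cafe :: String
cafe = "cafe\x301"

-- Ordinary list functions treat each Char independently:
cafeLength :: Int
cafeLength = length cafe        -- 5, not 4

cafeTruncated :: String
cafeTruncated = take 4 cafe     -- "cafe": the accent is silently dropped

cafeReversed :: String
cafeReversed = reverse cafe     -- starts with the bare combining accent
```
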

And, to be honest, even with a well thought-out Text API, I don't
think it is going to be possible to hide the complexities of
Unicode. For example, just take a quick look at

    http://en.wikipedia.org/wiki/Unicode_equivalence

There are two notions of equivalence, canonical equivalence and
compatibility, each with two normal forms (fully composed and fully
decomposed: NFC/NFD and NFKC/NFKD respectively), and "each of these
four normal forms can be used in text processing".
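
A toy sketch of what canonical decomposition buys. The table is hypothetical and covers only three characters; real NFD is driven by the full Unicode Character Database and also puts combining marks into canonical order, which this sketch omits. Precomposed "é" (U+00E9) and "e" followed by U+0301 are different Strings under (==), but compare equal after decomposition; the same holds for U+212B ANGSTROM SIGN and U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE.

```haskell
-- Hypothetical canonical decomposition table for a handful of characters.
canonicalDecompose :: Char -> String
canonicalDecompose '\x00E9' = "e\x0301"  -- é -> e + COMBINING ACUTE ACCENT
canonicalDecompose '\x00C5' = "A\x030A"  -- Å -> A + COMBINING RING ABOVE
canonicalDecompose '\x212B' = "A\x030A"  -- ANGSTROM SIGN -> A + ring above
canonicalDecompose c        = [c]

-- Toy "NFD": decompose every Char (no reordering of combining marks).
toyNFD :: String -> String
toyNFD = concatMap canonicalDecompose

-- Two strings are treated as canonically equivalent if their
-- decompositions are identical.
canonicallyEquivalent :: String -> String -> Bool
canonicallyEquivalent a b = toyNFD a == toyNFD b
```
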

As an example of the difference between "equivalent" and "compatible",
the ligature "ﬀ" (U+FB00) is "compatible - but not canonically equivalent"
to a sequence of two Latin "f" characters, meaning they "may be treated
the same way in some applications (such as sorting and indexing), but
not in others; and may be substituted for each other in some situations,
but not in others".
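
A sketch of the compatibility case, with a hypothetical two-entry table standing in for the real compatibility decompositions: a search-index key might fold the ligatures into plain letters, so "oﬀer" and "offer" index the same, while plain (==) still tells them apart. U+FB00 has only a compatibility decomposition, which is why canonical normalization leaves it unchanged.

```haskell
-- Hypothetical compatibility folding for two ligatures only.
compatDecompose :: Char -> String
compatDecompose '\xFB00' = "ff"   -- LATIN SMALL LIGATURE FF
compatDecompose '\xFB01' = "fi"   -- LATIN SMALL LIGATURE FI
compatDecompose c        = [c]

-- Key for, say, a search index: ligatures fold to their letters.
indexKey :: String -> String
indexKey = concatMap compatDecompose

-- Equal for indexing purposes, even when (==) says the strings differ.
matchesForIndexing :: String -> String -> Bool
matchesForIndexing a b = indexKey a == indexKey b
```
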

Is it realistic to think that if only Haskell used Text and not
String = [Char], a naive user/beginner would be able to write
correct code for all manner of text processing tasks without
needing to understand a great deal about Unicode?

I'm sorry, but I'm rather sceptical.

So I reiterate that I see little if any gain, be it in terms of making
life simpler for beginners, making Haskell more "multi cultural", or
giving Haskell applications in general a performance boost, in
deprecating String = [Char] and mandating the use of Text.
But the costs would be massive.

Best,

/Henrik

-- 
Henrik Nilsson
School of Computer Science
The University of Nottingham
nhn@cs.nott.ac.uk