
Hi all,

Johan Tibell wrote:

> Normalization isn't quite enough unfortunately, as it doesn't solve e.g.
>
>   upcase = map toUpper
>
> You need all-at-once functions on strings (which we could add.) I'm just pointing out that most (all?) list functions do the wrong thing when used on Strings.

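To make the quoted point concrete, here is a minimal sketch comparing per-character upcasing on String with an all-at-once function on Text. The names upcaseList and upcaseText are made up for illustration, and the sketch assumes the text package is available; it is not code from the thread.

```haskell
import Data.Char (toUpper)
import qualified Data.Text as T

-- Per-character upcasing, as in the quoted definition.
-- Each Char maps to exactly one Char, so the length never changes.
upcaseList :: String -> String
upcaseList = map toUpper

-- All-at-once upcasing: Data.Text.toUpper sees the whole string
-- and may expand one character into several.
upcaseText :: T.Text -> T.Text
upcaseText = T.toUpper

main :: IO ()
main = do
  let s = "stra\223e"  -- "strasse" spelled with U+00DF (sharp s)
  print (length (upcaseList s))              -- 6: the sharp s is left unchanged
  print (T.length (upcaseText (T.pack s)))   -- 7: the sharp s became "SS"
  print (upcaseText (T.pack s) == T.pack "STRASSE")
```

The list version cannot produce "STRASSE" because map preserves length, which is exactly why an all-at-once function is needed here.
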
So, is the argument to deprecate Char, then? As long as Haskell allows Chars to be handled in isolation, it would seem impossible to prevent naive users from accidentally stumbling over the complexities of Unicode.

And, to be honest, even with a well thought-out Text API, I don't think it is going to be possible to hide the complexities of Unicode. For example, just take a quick look at

    http://en.wikipedia.org/wiki/Unicode_equivalence

There are two notions of equivalence, canonical and compatibility, and each has two normal forms (fully composed and fully decomposed), and "each of these four normal forms can be used in text processing".

As an example of the difference between "equivalent" and "compatible", the ligature "ﬀ" (U+FB00) is "compatible - but not canonically equivalent" to a sequence of two Latin characters "f", meaning they "may be treated the same way in some applications (such as sorting and indexing), but not in others; and may be substituted for each other in some situations, but not in others".

Is it realistic to think that if only Haskell used Text and not String = [Char], a naive user/beginner would be able to write correct code for all manner of text processing tasks without needing to understand a great deal about Unicode? I'm sorry, but I'm rather sceptical.

So I reiterate that I see little if any gain in deprecating String = [Char] and mandating the use of Text, be it in terms of making life simpler for beginners, making Haskell more "multi cultural", or giving Haskell applications in general a performance boost. But the costs would be massive.

Best,

/Henrik

--
Henrik Nilsson
School of Computer Science
The University of Nottingham
nhn@cs.nott.ac.uk