
On Mon, Mar 26, 2012 at 9:42 AM, Christian Siefkes wrote:
On 03/26/2012 05:50 PM, Johan Tibell wrote:
Normalization isn't quite enough unfortunately, as it doesn't solve e.g.
upcase = map toUpper
You need all-at-once functions on strings (which we could add). I'm just pointing out that most (all?) list functions do the wrong thing when used on Strings.
Hm, do you have any other examples besides toUpper/toLower?
length, cons, head, tail, filter, folds, anything that works on an element-by-element basis.
Also, that example is not really an argument against using list functions on strings (which, by any reasonable definition, seem to be "sequences of characters" -- whether that sequence is represented as a list, an array, or something else, seems more like an implementation detail to me).
I agree on the second part. As someone pointed out earlier, we should be careful in using the word "character", as a Unicode code point doesn't correspond well to the commonly used concept of a character. What we have today is really: type String = [CodePoint]. What you would normally think of as a character might consist of several code points.
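To make the code point point concrete, here is a minimal sketch (the names decomposed and precomposed are made up for illustration). Both strings render as "é", but element-by-element list functions see different things:

```haskell
-- 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character,
-- two code points.
decomposed :: String
decomposed = "e\x0301"

-- U+00E9 LATIN SMALL LETTER E WITH ACUTE: the same visible character
-- as a single code point.
precomposed :: String
precomposed = "\x00E9"

-- length counts code points, not user-perceived characters.
lengths :: (Int, Int)
lengths = (length decomposed, length precomposed)  -- (2, 1)

-- reverse moves the combining accent in front of its base letter.
reversed :: String
reversed = reverse decomposed  -- "\x0301" ++ "e"
```

Normalizing both strings to the same form would make the lengths agree here, but it would not help with reverse or with case mapping.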
Rather, it indicates that Char.toUpper may have the wrong type. If its type was Char -> String instead of Char -> Char, it could handle things like toUpper 'ß' == "SS" correctly. Then stuff like
upcase = concatMap toUpper
would work fine.
Yes.
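A rough sketch of that idea, assuming a hypothetical toUpperFull :: Char -> String (not part of Data.Char) that only special-cases 'ß' and otherwise falls back to the existing toUpper. A real version would need the full Unicode special-casing table:

```haskell
import Data.Char (toUpper)

-- Hypothetical Char -> String upper-casing. Only 'ß' is special-cased
-- here; everything else defers to Data.Char.toUpper.
toUpperFull :: Char -> String
toUpperFull 'ß' = "SS"
toUpperFull c   = [toUpper c]

-- Upper-casing via concatMap, so one code point may expand to several.
upcase :: String -> String
upcase = concatMap toUpperFull

-- The Char -> Char version for comparison: it cannot change the length.
naiveUpcase :: String -> String
naiveUpcase = map toUpper
```

With GHC, upcase "straße" gives "STRASSE", while naiveUpcase "straße" leaves the 'ß' in place and gives "STRAßE".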
As it is, the problem seems to be with Char, not with [Char].
[Char] is a semantically OK representation of a Unicode string; using an array like text does is simply an optimization. However, using the list functions defined by the Prelude is not a good idea if you want to process a Unicode string correctly.

-- Johan