
On 25.04 13:46, John Meacham wrote:
I think all we really need are
Data.ByteString Data.PackedString
(Though, I suppose Latin1 could be useful)
Using the Word8 API is not very pleasant, because character constants and the like are not Word8.

As for Latin1: what semantics do we use for toUpper/toLower and Ord? Using the Unicode ones or the locale seems the sensible thing if the data really is Latin1.

Thus a simple wrapper over the Word8 API is desirable. Make it follow a few simple rules:

* c2w . w2c = id (conversion is a bijection)
* ASCII characters are translated correctly
* toLower/toUpper for ASCII
* Ord by byte values

This is very useful for many purposes and does not mean that there should not be a fancy UTF8 module. Rather than arguing about killing this, wouldn't it be more productive to create the UTF8 module?
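To make the rules above concrete, here is a minimal sketch of such a wrapper using only base. The names toLowerW8 and toUpperW8 are hypothetical, and c2w/w2c are written out here purely for illustration rather than taken from any particular library.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Byte to Char: every Word8 maps to the code point with the same value.
w2c :: Word8 -> Char
w2c = chr . fromIntegral

-- Char to byte: keeps only the low 8 bits, so Chars above 255 are truncated.
c2w :: Char -> Word8
c2w = fromIntegral . ord

-- ASCII-only lower-casing (hypothetical name); bytes outside A-Z pass through.
toLowerW8 :: Word8 -> Word8
toLowerW8 w
  | w >= 0x41 && w <= 0x5A = w + 0x20
  | otherwise              = w

-- ASCII-only upper-casing (hypothetical name); bytes outside a-z pass through.
toUpperW8 :: Word8 -> Word8
toUpperW8 w
  | w >= 0x61 && w <= 0x7A = w - 0x20
  | otherwise              = w
```

Ord by byte values then comes for free from the existing Ord instance on Word8, and bytes at or above 0x80 are never touched by the case functions.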
But note: do the people who want Latin1 just need ASCII? If we have a UTF8 PackedString, then we can make ASCII-specific access routines that are just as fast as the ones in the Latin1 variety, without giving up the ability to store full Unicode values in the string.
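The point about ASCII-specific routines rests on a property of UTF-8: every byte below 0x80 is a complete ASCII character, and all bytes of a multi-byte sequence are 0x80 or above. A rough sketch of what such routines could look like follows, using a plain list of bytes as a stand-in for a packed representation. The names asciiUpperUtf8 and asciiIndexUtf8 are hypothetical.

```haskell
import Data.Char (ord)
import Data.List (elemIndex)
import Data.Word (Word8)

-- Upper-case only the ASCII letters of UTF-8 encoded data, byte by byte.
-- Multi-byte sequences consist solely of bytes >= 0x80, so they are untouched.
asciiUpperUtf8 :: [Word8] -> [Word8]
asciiUpperUtf8 = map up
  where
    up w
      | w >= 0x61 && w <= 0x7A = w - 0x20
      | otherwise              = w

-- Byte offset of the first occurrence of an ASCII character, without decoding.
-- Returns Nothing for non-ASCII arguments, which would need a real decoder.
asciiIndexUtf8 :: Char -> [Word8] -> Maybe Int
asciiIndexUtf8 c bytes
  | ord c < 0x80 = elemIndex (fromIntegral (ord c)) bytes
  | otherwise    = Nothing
```

Neither function decodes anything, so the per-byte cost is the same as it would be on Latin1 data.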
Case conversions and ordering need to be different. Thus we need to newtype things to avoid having two conflicting Ord instances. The UTF8 layer should provide:

* Unicode toUpper/toLower
* Unicode collation (UCA) for Ord
* Graphemes (see Perl6 for good ways to do this)
* Normalisation

- Einar Karttunen
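As an illustration of the newtype point, the sketch below gives the same underlying string two different orderings by wrapping it in two types. Both type names are hypothetical, and the case-insensitive comparison is only a simple stand-in for real UCA collation, which base does not provide.

```haskell
import Data.Char (toLower)
import Data.List (sort)
import Data.Ord (comparing)

-- Ordered by code point, which coincides with byte order for Latin1 data.
newtype ByteOrdered = ByteOrdered String deriving (Eq, Ord)

-- Ordered case-insensitively; a placeholder for proper Unicode collation.
newtype CaseFolded = CaseFolded String

instance Eq CaseFolded where
  CaseFolded a == CaseFolded b = map toLower a == map toLower b

instance Ord CaseFolded where
  compare = comparing (\(CaseFolded s) -> map toLower s)

-- The same input sorted under each instance.
byteSorted :: [String]
byteSorted = [s | ByteOrdered s <- sort (map ByteOrdered ["b", "B", "a"])]

caseSorted :: [String]
caseSorted = [s | CaseFolded s <- sort (map CaseFolded ["b", "B", "a"])]
```

Because each ordering lives on its own type, the two Ord instances can never clash, and code has to state which semantics it wants.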