Re: Data.ByteString candidate 3

25 Apr 2006

On 25.04 13:46, John Meacham wrote:
...
I think all we really need are
Data.ByteString
Data.PackedString
(Though, I suppose Latin1 could be useful)
Using the Word8 API is not very pleasant, because character
constants and the like are not Word8 values.

As for Latin1 - what semantics do we use for toUpper/toLower and Ord?
Using the Unicode ones or locale seems the sensible thing if the data
really is Latin1.

Thus a simple wrapper around the Word8 API is desirable. It should
follow a few simple rules:
* c2w . w2c = id  (conversion is a bijection)
* ASCII characters are translated correctly
* toLower/toUpper work for ASCII only
* Ord by byte values.
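
A minimal sketch of what such a wrapper could look like, using only
base. The names c2w and w2c come from the rule above; toLowerAscii and
toUpperAscii are hypothetical names chosen for illustration. Note that
c2w . w2c = id holds for every Word8, while the other direction only
round-trips for Chars up to code point 255. Ord by byte values comes
for free from Word8.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Byte to Char: every byte maps to the code point 0..255 of the same value.
w2c :: Word8 -> Char
w2c = chr . fromIntegral

-- Char to byte: code points above 255 are truncated modulo 256.
c2w :: Char -> Word8
c2w = fromIntegral . ord

-- Hypothetical helper: lower-case 'A'..'Z' only, leave every other byte alone.
toLowerAscii :: Word8 -> Word8
toLowerAscii w
  | w >= 0x41 && w <= 0x5A = w + 0x20
  | otherwise              = w

-- Hypothetical helper: upper-case 'a'..'z' only, leave every other byte alone.
toUpperAscii :: Word8 -> Word8
toUpperAscii w
  | w >= 0x61 && w <= 0x7A = w - 0x20
  | otherwise              = w
```

With this, a byte such as 0xE4 passes through both case functions
unchanged, since no Latin1 or Unicode semantics are applied to it.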

This is very useful for many purposes and does not mean that there
should not be a fancy UTF8 module. Rather than arguing about killing
this, wouldn't it be more productive to create the UTF8 module?
...
But note: do the people who want Latin1 just need ASCII? It should be
noted that if we have a UTF8 PackedString, then we can make
ASCII-specific access routines that are just as fast as the ones in the
Latin1 variety without giving up the ability to store full Unicode
values in the string.
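
One reason this can work: in UTF-8, bytes below 0x80 only ever occur as
single-byte ASCII characters, so looking for an ASCII character is a
plain byte scan with no decoding. A rough sketch, assuming a list of
Word8 as a stand-in for the packed representation; findAscii is a
hypothetical name, not part of any proposed API.

```haskell
import Data.Char (ord)
import Data.List (elemIndex)
import Data.Word (Word8)

-- Byte offset of the first occurrence of an ASCII character in UTF-8 data.
-- Non-ASCII characters are rejected here, since they would need a decoder.
findAscii :: Char -> [Word8] -> Maybe Int
findAscii c bytes
  | ord c < 0x80 = elemIndex (fromIntegral (ord c)) bytes
  | otherwise    = Nothing
```

The result is a byte offset, not a character index, so callers that
need character positions would still have to decode.
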
Case conversion and ordering need to differ between the two. Thus we
need to newtype things to avoid having two conflicting Ord instances.
The UTF8 layer should provide:

* Unicode toUpper/toLower
* Unicode collation (UCA) for Ord
* Graphemes (see Perl6 for good ways to do this)
* Normalisation
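
To illustrate the newtype point, here is a sketch with two wrappers
over the same bytes, each with its own Ord. Bytes, Utf8 and foldAscii
are hypothetical names, and a list of Word8 stands in for the packed
representation. The Utf8 instance is only a placeholder that folds
ASCII case before comparing; it is not UCA.

```haskell
import Data.Word (Word8)

-- Raw bytes: the derived Ord compares byte values.
newtype Bytes = Bytes [Word8] deriving (Eq, Ord, Show)

-- Same representation, different ordering.
newtype Utf8 = Utf8 [Word8] deriving (Eq, Show)

-- Placeholder key function: folds ASCII upper case to lower case.
-- A real implementation would compute Unicode collation keys instead.
foldAscii :: Word8 -> Word8
foldAscii w
  | w >= 0x41 && w <= 0x5A = w + 0x20
  | otherwise              = w

-- Compare by folded key first, then by raw bytes so that Ord agrees with Eq.
instance Ord Utf8 where
  compare (Utf8 a) (Utf8 b) =
    compare (map foldAscii a, a) (map foldAscii b, b)
```

Here Bytes puts "B" before "a" (0x42 < 0x61), while Utf8 puts "a"
before "B", and the type checker keeps the two orderings apart.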

- Einar Karttunen