
On Wed, 2006-04-26 at 02:16 +0300, Einar Karttunen wrote:
> This is very useful for many purposes and does not mean that there should not be a fancy UTF8 module. Rather than arguing about killing this, wouldn't it be more productive to create the UTF8 module?

I've been following this thread with some frowning. I can see that some people want to dish out text over the network *really fast* and would therefore like to emit pure ASCII without the overhead of 4 bytes per character. Still, I don't see the need for a .Latin1 module next to a .Word8 module.

When it comes to UTF8, I cringe. Dealing with UTF8 is a nightmare to get right, and the bugs won't show up until you test it with some Chinese text (or are there other common 4-byte characters?). Hence, UTF8 should not be a common interface for application developers.

Haskell has the advantage that changing Char from 8 bits to 32 bits doesn't add to the space consumption of lists. With packed strings the situation is different, but still, I propose to

 - have a library that deals with packed strings of 32-bit Haskell Char
 - have a library that deals with packed Word8 sequences

This way, it will hurt if you touch the bare-metal Word8 representation, but then, using Word8 sequences is quite an optimisation, one that you don't reach for when you start developing an application. A simplistic solution like this avoids the whole discussion on whether there should be an Ord or toUpper for Latin1, or how to coerce a packed Latin1 string to a packed Word8 representation.

Axel.
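
To make the variable-width concern concrete, here is a minimal sketch of a UTF-8 encoder next to the fixed-width alternative. The names encodeChar, encodeString and fixedWidthBytes are hypothetical helpers written for this illustration, not part of any packed-string library discussed in the thread, and the sketch makes no attempt to reject surrogate code points. It shows that a character occupies anywhere from 1 to 4 bytes in UTF-8, so code that is only tested on ASCII never exercises the multi-byte branches. Note that most Chinese characters fall in the 3-byte range; 4-byte sequences only appear for code points above U+FFFF.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode a single Char as UTF-8 (1 to 4 bytes).
-- Hypothetical helper for illustration; surrogates are not rejected.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- Leading payload bits, shifted down into the first byte.
    hi s   = fromIntegral (n `shiftR` s)
    -- Continuation byte: marker bits 10 followed by six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Encode a whole String; the byte count depends on the content.
encodeString :: String -> [Word8]
encodeString = concatMap encodeChar

-- Size of the same text in a fixed-width 32-bit packing:
-- always 4 bytes per character, so indexing is plain arithmetic.
fixedWidthBytes :: String -> Int
fixedWidthBytes s = 4 * length s
```

With the fixed-width packing, the n-th character sits at byte offset 4 * n. With UTF-8, finding the n-th character means scanning from the start, which is one of the places where an interface built on it gets awkward for application code.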