
On Tue, 2006-04-25 at 13:08 +0100, Simon Marlow wrote:
Donald Bruce Stewart wrote:
The code has been partitioned into:

  Data.ByteString
      a Word8-only layer. All functions are in terms of Word8.

  Data.ByteString.Char
      provides an ASCII/byte-Char layer over the Word8 layer.
Ok, but where would we put a UTF8 version of the Char layer? I'm thinking that "Latin1" would be more correct than "Char", and leaves room for adding UTF8 and other encodings later.
As others have pointed out, it's not strictly Latin1. Don and I reckon it's probably safe to say that the current Data.ByteString.Char layer is OK for any 8-bit fixed-width encoding with ASCII as a subset, which means it's probably OK for many of the Latin* encodings.

How would we distinguish a full fixed-width 4-byte Unicode version? A purist might say that this should be Data.ByteString.Char, since a Char really is a 4-byte Unicode value, and that the current Data.ByteString.Char should then become Data.ByteString.Char8 or something like that.

Duncan
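
To make the distinction between the two layers concrete, here is a minimal sketch. It assumes the module names of the bytestring package as it exists today, where the byte-per-Char layer is exposed as Data.ByteString.Char8; those may not match the names in the code under discussion. It shows that the Char view is only a reinterpretation of each byte, and that a Char above '\255' does not survive the trip through the 8-bit layer.

```haskell
import qualified Data.ByteString as B        -- Word8 layer
import qualified Data.ByteString.Char8 as C  -- byte-per-Char layer (today's name)
import Data.Char (ord)
import Data.Word (Word8)

-- Word8 layer: the elements are raw bytes.
rawBytes :: B.ByteString
rawBytes = B.pack [72, 105, 233]  -- 'H', 'i', and the byte 0xE9

-- Char view of the same bytes: each byte becomes the Char with that
-- code point, which matches Latin1 only by convention.
asChars :: String
asChars = C.unpack rawBytes

-- Packing a Char above '\255' keeps only its low 8 bits.
truncated :: [Word8]
truncated = B.unpack (C.pack "\955")  -- U+03BB has low byte 0xBB

main :: IO ()
main = do
  print (B.unpack rawBytes)  -- [72,105,233]
  print (map ord asChars)    -- [72,105,233]
  print truncated            -- [187]
```

The last line is the case a full-width Unicode layer would have to handle differently: U+03BB needs more than one byte, so an 8-bit Char layer can only truncate it.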