
On Tue, 2006-04-25 at 22:34 +1000, Donald Bruce Stewart wrote:
ross:
On Tue, Apr 25, 2006 at 12:08:45PM +0300, Einar Karttunen wrote:
The name Latin1 is particularly bad since there are many other single-byte encodings around.
The name is quite appropriate, since that is the particular encoding of Char that is exposed by the interface. What's bad is that there's no choice. Calling it Latin1 is just being honest about that, and leaving room for modules with other encodings or an interface parameterized by encoding.
Ok. Duncan, Ketil, Ross and Simon make good points here. I'll move Data.ByteString.Char -> Data.ByteString.Latin1
If you want to justify that and provide a concrete spec, you could add something like the following to the Data.ByteString.Latin1 docs:

  Manipulate ByteStrings using Char operations. All Chars will be
  truncated to 8 bits. More specifically, these byte strings are taken
  to be in the subset of Unicode covered by code points 0-255. This
  covers Unicode Basic Latin, Latin-1 Supplement and C0+C1 Controls.

  See:
  http://www.unicode.org/charts/
  http://www.unicode.org/charts/PDF/U0000.pdf
  http://www.unicode.org/charts/PDF/U0080.pdf

One reason to be so specific is that other definitions of character sets commonly called "Latin-1" omit the control characters and so do not cover all bytes 0-255. I think this allows us to justify reinterpreting Word8s as Chars and getting valid Unicode code points.

Duncan
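To make the proposed spec concrete, here is a minimal sketch of the two conversions it implies: every Word8 is read as the Unicode code point with the same value (0-255), and every Char is truncated to its low 8 bits on the way back. The helper names `w2c` and `c2w` are defined locally for illustration; they are assumptions, not a quote of the module's actual API.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Illustrative helper: reinterpret a byte as the code point with the
-- same numeric value. Values 0-255 are always valid code points.
w2c :: Word8 -> Char
w2c = chr . fromIntegral

-- Illustrative helper: truncate a Char to 8 bits. fromIntegral into
-- Word8 wraps modulo 256, which discards the high bits.
c2w :: Char -> Word8
c2w = fromIntegral . ord

main :: IO ()
main = do
  -- Every byte survives a round trip through Char unchanged.
  print (all (\w -> c2w (w2c w) == w) [minBound .. maxBound])
  -- Byte 0xE9 is U+00E9 LATIN SMALL LETTER E WITH ACUTE.
  print (w2c 0xE9 == '\xE9')
  -- A Char outside 0-255 is truncated: U+20AC (euro sign) becomes 0xAC.
  print (c2w '\x20AC')
```

The round trip only holds in the Word8 -> Char -> Word8 direction; going Char -> Word8 -> Char is lossy for any code point above 255, which is exactly the truncation the proposed docs warn about.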