Re: FPS/Data.ByteString candidate

25 Apr 2006

On Tue, 2006-04-25 at 22:34 +1000, Donald Bruce Stewart wrote:
> ...
> ross:
> > ...
> > On Tue, Apr 25, 2006 at 12:08:45PM +0300, Einar Karttunen wrote:
> > > ...
> > > The name Latin1 is particularly bad since there are many other
> > > single byte encodings around.
> >
> > The name is quite appropriate, since that is the particular encoding of
> > Char that is exposed by the interface.  What's bad is that there's no
> > choice.  Calling it Latin1 is just being honest about that, and leaving
> > room for modules with other encodings or an interface parameterized
> > by encoding.
>
> Ok. Duncan, Ketil, Ross and Simon make good points here.
> I'll move Data.ByteString.Char -> Data.ByteString.Latin1

If you want to justify that and provide some concrete spec you can add
something like the following to the Data.ByteString.Latin1 docs:

        Manipulate ByteStrings using Char operations. All Chars will be
        truncated to 8 bits.

        More specifically these byte strings are taken to be in the
        subset of Unicode covered by code points 0-255. This covers
        Unicode Basic Latin, Latin-1 Supplement and C0+C1 Controls.

        See: http://www.unicode.org/charts/
        http://www.unicode.org/charts/PDF/U0000.pdf
        http://www.unicode.org/charts/PDF/U0080.pdf

One reason to be so specific is that other definitions of character sets
commonly called "Latin-1" omit the control characters and so do not
cover all bytes 0-255.

I think this allows us to justify reinterpreting Word8s as Chars and
getting valid Unicode code points.
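
As a rough illustration of that reinterpretation, here is a minimal sketch
using only base. The helper names w2c and c2w are chosen for this example
and are not claimed to be the module's actual interface. It assumes a byte
maps to the code point with the same number, and that a Char is narrowed by
keeping its low 8 bits.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Reinterpret a byte as the Unicode code point with the same number (0-255).
-- Always a valid Char, since every value in 0-255 is a valid code point.
w2c :: Word8 -> Char
w2c = chr . fromIntegral

-- Narrow a Char to its low 8 bits. Lossless only for code points 0-255;
-- anything above that is silently truncated.
c2w :: Char -> Word8
c2w = fromIntegral . ord

-- Every byte survives the round trip through Char.
roundTripsAllBytes :: Bool
roundTripsAllBytes = all (\w -> c2w (w2c w) == w) [minBound .. maxBound]
```

The round trip holds for all 256 bytes, controls included, which is the
property the proposed doc text relies on. The reverse direction is lossy:
c2w applied to a Char above code point 255 keeps only the low byte.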

Duncan

Duncan Coutts