Re: Haskell Platform proposal: Add case-insensitive and Haskell 98/2010 compliance

On Mon, Jan 14, 2013 at 11:29 AM, Ben Millwood wrote:
This is easily fixed, although may be controversial: remove the ByteString instances.
The bytestring instances are what it's most often used for! The haddocks
warn about this issue. And: yes, we do want an ASCII case fold. If we
didn't, we'd be using Text. I've seen an Ascii newtype proposed (so that
you have to work extra hard to have the obvious semantics), but IMO nerfing
this particular library to make it harder for people to make beginner
Unicode mistakes is silly. They'll find a way to make them anyways.
One thing the haddocks could maybe do better is to have a bigger, louder,
and more informative disclaimer about the issue that points users towards
the "modern default": use text, and encode/decode as UTF-8.
G
--
Gregory Collins

On Mon, Jan 14, 2013 at 9:43 AM, Gregory Collins wrote:
The bytestring instances are what it's most often used for! The haddocks warn about this issue. And: yes, we do want an ASCII case fold. If we didn't, we'd be using Text.
Note: case-insensitive provides ASCII case folding when used on ByteString. However, HTTP headers are in ISO-8859-1. Hence, using this for case folding HTTP headers isn't technically correct: it will fail for headers and other case-insensitive tokens with ISO-8859-1 accented characters, though admittedly in practice one doesn't see them.
That bit of pedantry out of the way... let's do this! I'd like to see the haddock for the ByteString cases better expanded. I'll suggest some alternate language later this weekend.
- Mark

On Sun, Jan 20, 2013 at 08:21:32AM -0800, Mark Lentczner wrote:
On Mon, Jan 14, 2013 at 9:43 AM, Gregory Collins wrote:
The bytestring instances are what it's most often used for! The haddocks warn about this issue. And: yes, we do want an ASCII case fold. If we didn't, we'd be using Text.
Note: case-insensitive provides ASCII case folding when used on ByteString. However, HTTP headers are in ISO-8859-1. Hence, using this for case folding HTTP headers isn't technically correct: it will fail for headers and other case-insensitive tokens with ISO-8859-1 accented characters, though admittedly in practice one doesn't see them.
That bit of pedantry out of the way....
Interestingly, case-insensitive may well support this case by accident! The case-folding it uses is just that from Data.ByteString.Char8, which works on Unicode code points 0-255, a range that includes (if I recall correctly) Latin-1 as a subset. Testing this may be misleading because Data.ByteString.Char8.putStrLn doesn't do anything clever, but you should find that packing, foldCase, unpacking does the right thing.
Of course, the documentation should be amended to state this if we are interested in making it a guarantee.
regards,
Ben Millwood
PS. Can I ask that people remove bm380 at srcf dot net from their CC lists in this thread? It's an artifact of when I accidentally replied from the wrong address and it's meaning I'm getting some stuff twice.
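To make the pack / fold / unpack point concrete, here is a minimal sketch using only base. It assumes that reading each byte as the Unicode code point of the same value is effectively what Data.ByteString.Char8 does; byteToChar, charToByte and foldByte are hypothetical helper names for illustration, not functions from case-insensitive or bytestring.

```haskell
import Data.Char (chr, ord, toLower)
import Data.Word (Word8)

-- Read a byte as the Unicode code point with the same value.
-- For ISO-8859-1 data this yields the intended character.
byteToChar :: Word8 -> Char
byteToChar = chr . fromIntegral

-- Convert back to a byte, keeping only the low 8 bits
-- (fromIntegral on Word8 truncates).
charToByte :: Char -> Word8
charToByte = fromIntegral . ord

-- Lower-case one byte by going through Data.Char's toLower.
foldByte :: Word8 -> Word8
foldByte = charToByte . toLower . byteToChar

-- Lower-case a whole byte sequence (a stand-in for a ByteString).
foldBytes :: [Word8] -> [Word8]
foldBytes = map foldByte
```

With this sketch, foldByte 0xC9 (É in ISO-8859-1) yields 0xE9 (é), while 0xD7 (the multiplication sign) is left unchanged.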

Okay, my bad - case-insensitive *IS* doing the right thing for ByteString. (At least if you believe that treating ByteString as ISO-8859-1 encoded is the right thing!) However, note that it is folding two different ways:
In 0.4.0.2 and earlier, it was mapping Data.Char's toLower for all types except Text (where it was using Text's toCaseFold).
From 0.4.0.3 and on, it is using Text's toCaseFold for all types, except ByteString, where it uses its own toLower function optimized for just ISO-8859-1.
- Mark
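The difference between an ASCII-only fold and an ISO-8859-1 fold on raw bytes can be sketched as below. This is an illustration under the assumption that a Latin-1 lower-casing only needs to shift the three upper-case byte ranges by 0x20; it is not presented as the library's source, and toLowerAscii, toLowerLatin1 and eqCI are hypothetical names.

```haskell
import Data.Word (Word8)

-- ASCII-only fold: touches just 'A'..'Z' (0x41-0x5A).
toLowerAscii :: Word8 -> Word8
toLowerAscii w
  | 0x41 <= w && w <= 0x5A = w + 0x20
  | otherwise              = w

-- ISO-8859-1 fold: additionally maps 0xC0-0xD6 and 0xD8-0xDE.
-- 0xD7 (multiplication sign) sits between those ranges and is not a letter.
toLowerLatin1 :: Word8 -> Word8
toLowerLatin1 w
  | 0x41 <= w && w <= 0x5A = w + 0x20
  | 0xC0 <= w && w <= 0xD6 = w + 0x20
  | 0xD8 <= w && w <= 0xDE = w + 0x20
  | otherwise              = w

-- Compare two byte sequences after applying the given fold to each byte.
eqCI :: (Word8 -> Word8) -> [Word8] -> [Word8] -> Bool
eqCI fold a b = map fold a == map fold b
```

Here eqCI toLowerLatin1 [0xC9] [0xE9] holds (É and é compare equal), whereas eqCI toLowerAscii [0xC9] [0xE9] does not, which is the HTTP header concern raised earlier in the thread.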

On 21 January 2013 00:29, Mark Lentczner wrote:
In 0.4.0.2 and earlier, it was mapping Data.Char's toLower for all types except Text (where it was using Text's toCaseFold).
From 0.4.0.3 and on, it is using Text's toCaseFold for all types, except ByteString, where it uses its own toLower function optimized for just ISO-8859-1.
The output should be identical between 0.4.0.2 and 0.4.0.3. In the hp branch I changed the documentation to mention ISO-8859-1 instead of ASCII:
https://github.com/basvandijk/case-insensitive/tree/hp
Feel free to send me pull requests for better phrasing.
Bas

participants (4)
- Bas van Dijk
- Ben Millwood
- Gregory Collins
- Mark Lentczner