Behavior change of Data.Char

Hi,

It seems to me that some characters of GHC 7.10.1RC2 behave differently
from those of GHC 7.8.4:

                          7.8.4   7.10.1RC2
  isLower (chr 170)       True    False
  isSymbol (chr 182)      True    False
  isPunctuation (chr 182) False   True

Is this intentional?

I noticed this because I received a bug report:

  https://github.com/kazu-yamamoto/word8/issues/3

As you can see, 167 also behaves differently.

--Kazu
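To reproduce the comparison on a given compiler, something like the following sketch may help. It only uses functions exported by Data.Char; the names Report, classify and reports are made up for this illustration, and the values it yields depend on the Unicode tables shipped with whichever GHC compiles it.

```haskell
import Data.Char
  ( GeneralCategory, chr, generalCategory
  , isLower, isPunctuation, isSymbol )

-- | How Data.Char classifies one code point under the GHC that
-- compiles this module. (Hypothetical record, for illustration only.)
data Report = Report
  { codePoint   :: Int
  , category    :: GeneralCategory
  , lower       :: Bool
  , symbol      :: Bool
  , punctuation :: Bool
  } deriving (Show, Eq)

-- | Classify a single code point with the predicates from the report.
classify :: Int -> Report
classify n =
  Report n (generalCategory c) (isLower c) (isSymbol c) (isPunctuation c)
  where
    c = chr n

-- | The three code points mentioned in this thread.
reports :: [Report]
reports = map classify [167, 170, 182]
```

Loading this in GHCi under both compilers and running `mapM_ print reports` shows the general category next to each predicate, which makes it easier to see whether a difference comes from a category change.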

7.10 uses a newer version of Unicode, which could explain differences.
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

David,

Thank you for the information.

I would like to know whether or not these behavior changes are intentional. If they are bugs, we need to fix them before releasing GHC 7.10.1.

--Kazu

These are not bugs; these are changes in the Unicode standard.

See
  http://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt (old)
  http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt (new)
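For anyone who wants to compare the two files directly, here is a rough sketch. It assumes the reader has downloaded both files and read them into Strings, and it relies on the UnicodeData.txt layout of semicolon-separated fields with the hexadecimal code point first and the general category third. The function names are invented for this example, and range entries (the "First>"/"Last>" pairs) are not handled.

```haskell
import Data.Maybe (mapMaybe)
import Numeric (readHex)

-- | Split a line into its semicolon-separated fields.
splitSemi :: String -> [String]
splitSemi s = case break (== ';') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitSemi rest

-- | Extract (code point, general category) from one line, if the line
-- has at least three fields and the first one is a hexadecimal number.
parseLine :: String -> Maybe (Int, String)
parseLine l = case splitSemi l of
  (hex : _name : cat : _) -> case readHex hex of
    [(n, "")] -> Just (n, cat)
    _         -> Nothing
  _ -> Nothing

-- | General category of a code point according to the given file
-- contents, or Nothing if the code point has no line of its own.
categoryOf :: Int -> String -> Maybe String
categoryOf n contents = lookup n (mapMaybe parseLine (lines contents))

-- | Category of one code point in the old and the new file contents.
compareCategory :: Int -> String -> String -> (Maybe String, Maybe String)
compareCategory n old new = (categoryOf n old, categoryOf n new)
```

With the two files bound to `old` and `new`, `map (\n -> compareCategory n old new) [167, 170, 182]` gives the before/after categories for the code points discussed here.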

On 2015-02-19 at 06:19:18 +0100, Kazu Yamamoto (山本和彦) wrote:
> It seems to me that some characters of GHC 7.10.1RC2 behave
> differently from those of GHC 7.8.4:
>
>                    7.8.4   7.10.1RC2
> isLower (chr 170)  True    False

Fwiw, the motivation for that particular change may be (I'm just guessing here) to have the following condition hold:

    \c -> isLower c `implies` (not . isLower . toUpper) c

i.e. if something is 'lower-case', then applying 'toUpper' should result in a character that is not 'lower-case' anymore. This didn't hold with 7.8.4's Unicode 6, but now holds with 7.10.1's Unicode 7 definitions.

Cheers,
  hvr
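Since the condition above is offered as a guess, it may be worth checking it rather than taking it on trust. The sketch below spells out `implies` (which is not a Data.Char function and is assumed here to be plain Boolean implication) and collects every Char for which the condition fails. The names lowerInvariantHolds and counterexamples are made up for this example, and no particular result is claimed: whether the list is empty depends on the Unicode tables of the GHC in use.

```haskell
import Data.Char (isLower, toUpper)

-- | Boolean implication; an assumed definition for the `implies`
-- used in the condition above.
implies :: Bool -> Bool -> Bool
implies a b = not a || b

-- | The proposed condition for a single character: if it is
-- lower-case, its toUpper image must not be lower-case.
lowerInvariantHolds :: Char -> Bool
lowerInvariantHolds c = isLower c `implies` (not . isLower . toUpper) c

-- | Every Char that violates the condition under the current tables.
counterexamples :: [Char]
counterexamples = filter (not . lowerInvariantHolds) [minBound .. maxBound]
```

Evaluating `counterexamples` in GHCi under each compiler version shows how far the condition actually holds there; characters that toUpper leaves unchanged are the natural place to look.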

It'd be good to document this condition/invariant in the Haddocks, wouldn't it?!

Simon
participants (5)
- David Feuer
- Herbert Valerio Riedel
- Kazu Yamamoto
- Roman Cheplyaka
- Simon Peyton Jones