Unicode, strings, and Show

Why are we doing this?

GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
Prelude> "文字"
"\25991\23383"
Prelude>

After all, we don’t print 'a' as '\97'.

Manuel
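For readers who want to reproduce the behaviour outside of a GHCi session: GHCi displays the value of an expression with `print`, which goes through `show`, and `show` on a `String` escapes every character above `'\DEL'`. The following is only a minimal sketch of that difference; the names `sample`, `viaShow` and `direct` are made up for illustration.

```haskell
-- GHCi displays results with 'print', i.e. putStrLn . show.
-- 'show' escapes non-ASCII characters; the raw string is left untouched.

sample :: String
sample = "文字"

-- What GHCi displays for the expression: the escaped literal.
viaShow :: String
viaShow = show sample

-- What 'putStrLn sample' would write (subject to the handle's encoding).
direct :: String
direct = sample
```

Here `viaShow` is the ten-plus-character escaped literal from the session above, while `direct` is still just the two characters; whether a terminal can display them is a separate question.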

On Tue, Mar 29, 2016 at 9:56 PM, Manuel M T Chakravarty wrote:
> Why are we doing this?
>
> GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
> Prelude> "文字"
> "\25991\23383"
> Prelude>
>
> After all, we don’t print 'a' as '\97'.
>
> Manuel

Indeed:

• 2016: https://mail.haskell.org/pipermail/haskell-cafe/2016-February/122874.html
• 2012: http://stackoverflow.com/questions/14039726/how-to-make-haskell-or-ghci-able...
• 2012 again: https://mail.haskell.org/pipermail/haskell-cafe/2012-July/102569.html
• 2011: http://stackoverflow.com/questions/5535512/how-to-hack-ghci-or-hugs-so-that-...
• 2010: https://mail.haskell.org/pipermail/haskell-cafe/2010-August/082823.html

This is a constant source of pain and should be relatively easy to fix.

Manuel

There was recently a discussion about it; search for the subject "Can we
improve Show instance for non-ascii charcters?"

You can read it for yourself, but my impression was that people were
generally favorable but had some backward-compatibility worries. They
came up with some workarounds, but no one committed to following up on
a ghci patch.
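
One workaround of the kind discussed there is to hand GHCi a custom printing function through its `-interactive-print` option. The sketch below is an assumption about what such a function could look like, not code taken from the thread: `unescape` and `uprint` are hypothetical names, and the approach (post-processing the output of `show`) is deliberately crude.

```haskell
import Data.Char (chr, isDigit, isPrint)

-- Rewrite the decimal escapes produced by 'show' back into the characters
-- they denote, when those characters are printable. Everything else
-- (an escaped backslash, "\n", an escaped quote, ...) is copied unchanged.
unescape :: String -> String
unescape ('\\' : '\\' : rest) = '\\' : '\\' : unescape rest
unescape ('\\' : rest@(d : _))
  | isDigit d
  , (digits, rest') <- span isDigit rest
  , let n = read digits :: Integer
  , n <= 0x10FFFF
  , let c = chr (fromInteger n)
  , isPrint c
  = c : unescape rest'
unescape (c : rest) = c : unescape rest
unescape [] = []

-- Hypothetical replacement for 'print', intended for interactive use only.
uprint :: Show a => a -> IO ()
uprint = putStrLn . unescape . show
```

With the module loaded, something like `:set -interactive-print=uprint` would make GHCi use it. Because the output is no longer guaranteed to be ASCII, this only makes sense where the terminal encoding can represent the characters.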

Evan Laforge writes:
> There was recently a discussion about it, search for subject "Can we improve Show instance for non-ascii charcters?"
>
> You can read for yourself but my impression was that people were generally favorable, but had some backward compatibility worries, and came up with some workarounds, but no one committed to following up on a ghci patch.

It would be great if someone could create a Trac ticket so we had someplace persistent to track this discussion. Manuel, perhaps you could handle this?

Cheers,

- Ben

> It would be great if someone could create a Trac ticket

Existing ticket: https://ghc.haskell.org/trac/ghc/ticket/11529 ("Show instance of Char should print literals for non-ascii printable charcters")

Thomas Miedema writes:
>> It would be great if someone could create a Trac ticket
>
> Existing ticket: https://ghc.haskell.org/trac/ghc/ticket/11529 ("Show instance of Char should print literals for non-ascii printable charcters")

Thanks Thomas! I've added a reference to this thread on the ticket.

Cheers,

- Ben

Thank you for all the replies, and especially for pointing to this ticket.

I think the discussion on this ticket is actually misleading and there is a simple solution, which I added as a comment.

Manuel

On Wed, Mar 30, 2016 at 9:16 PM, Manuel M T Chakravarty <chak@justtesting.org> wrote:
> Thank you for all the replies and especially pointing to this ticket.
>
> I think, the discussion on this ticket is actually misleading and there is a simple solution, which I added as a comment.

That is in fact not simple at all: with that, the ostensibly pure `show` now depends on the user's locale and therefore should be in IO (and you cannot reliably feed it to `read` in a program running in a different locale)! This is why the ticket was concentrating on ghci, where it's at least somewhat reasonable to assume a UTF8 environment.

--
brandon s allbery kf8nh                               sine nomine associates
allbery.b@gmail.com                                  ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

Firstly, we have

  isPrint :: Char -> Bool

Are you saying that this type is wrong?

Secondly, how often do you feed the output of 'show' to 'read' in another locale versus how often is everybody whose whole life is outside of ASCII (i.e., not anglo-centric people) bothered by this shortcoming? (*)

Moreover, the argument on the ticket was that changing the current implementation would go against the standard. Now that I am saying the current implementation is not conforming to the standard, the standard suddenly doesn’t seem to matter. Personally, I would say, when we wrote that standard, we knew what we were doing.

Manuel

(*) BTW, (read . show) is a pretty bad serialisation story anyway.
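
To make the `isPrint`-based idea concrete, here is one possible reading of it as a sketch. It is not the patch from the ticket: `showsStrChar` and `showStringPrintable` are hypothetical names, and the only library functions assumed are `isPrint` and `showLitChar` from `Data.Char`. Characters that `isPrint` accepts are emitted as they are; everything else falls back to the escapes that `show` uses today.

```haskell
import Data.Char (isPrint, showLitChar)

-- Hypothetical per-character renderer for string literals: keep a
-- character as-is when 'isPrint' accepts it, otherwise fall back to the
-- escape that the standard 'show' machinery produces.
showsStrChar :: Char -> ShowS
showsStrChar '"'  = showString "\\\""
showsStrChar '\\' = showString "\\\\"
showsStrChar c
  | isPrint c = showChar c
  | otherwise = showLitChar c

-- Render a String as a literal, escaping only what 'isPrint' rejects.
-- 'foldr' passes the already-rendered tail to 'showLitChar', so it can
-- still insert "\&" where an escape would otherwise run into a digit.
showStringPrintable :: String -> String
showStringPrintable s = '"' : foldr showsStrChar "\"" s
```

The result is still a valid Haskell string literal, so `read` accepts it; what this sketch does not address is whether the output device can actually encode the characters, which is the objection raised in the thread.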

On Wed, Mar 30, 2016 at 9:50 PM, Manuel M T Chakravarty <chak@justtesting.org> wrote:
> Firstly, we have
>
>   isPrint :: Char -> Bool
>
> Are you saying that this type is wrong?
>
> Secondly, how often do you feed the output of 'show' to 'read' in another locale versus how often is everybody whose whole life is outside of ASCII (i.e., not anglo-centric people) bothered by this shortcoming? (*)
>
> Moreover, the argument on the ticket was that changing the current implementation would go against the standard. Now that I am saying, the current implementation is not conforming to the standard, the standard suddenly doesn’t seem to matter. Personally, I would say, when we wrote that standard, we knew what we were doing.

The standard I am aware of is the Report, which deliberately limited the output to the subset which is guaranteed to be usable in all locales. show conforms to this; apparently people want it to *not* conform, and in a way which requires some locale to become the One True Locale.

isPrint is, as per the language Report, based on what Char is --- which is Unicode codepoints. Using it for output --- or for input, for that matter --- gets you into locale issues because nobody anywhere guarantees that Unicode codepoints that pass isPrint are representable in every locale. isPrint is not the place to verify that a character can actually be displayed in the current locale.

Or have you decided that ghc should require Unicode locales and nothing but Unicode locales from now on? If so, what do you do when the next issue comes up, where Unix is UTF8 and Windows is UTF16?

--
brandon s allbery kf8nh                               sine nomine associates
allbery.b@gmail.com                                  ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net
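
The gap between "passes isPrint" and "representable in the output encoding" can be illustrated with a small sketch. ISO-8859-1 (Latin-1) is picked here purely as an example of a non-Unicode encoding, and `fitsLatin1` and `printableButNotLatin1` are hypothetical helper names, not library functions.

```haskell
import Data.Char (isPrint, ord)

-- Crude stand-in for "can this character be written in an ISO-8859-1
-- (Latin-1) locale?": Latin-1 covers exactly the code points 0..255.
fitsLatin1 :: Char -> Bool
fitsLatin1 c = ord c <= 0xFF

-- Characters that 'isPrint' accepts but that a Latin-1 terminal cannot show.
printableButNotLatin1 :: String -> String
printableButNotLatin1 = filter (\c -> isPrint c && not (fitsLatin1 c))
```

For a string such as "aé文字", both CJK characters pass `isPrint` yet fall outside Latin-1, which is the situation described above: a pure predicate on code points says nothing about the encoding of the handle that will eventually receive the text.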

One point in the design space that the Swift language does, which seems
interesting at least to me, is to have the notion of a character be backed
by a Unicode grapheme cluster, which is a character-like sequence of
Unicode code points. Would library support for this at all help this
discussion or problem?
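
For context, a Haskell `Char` is a single code point, so one user-perceived character can span several `Char`s. The sketch below is only a rough approximation of grapheme clustering, assuming nothing beyond `isMark` from `Data.Char`; `roughClusters` is a hypothetical name, and real segmentation follows the rules of Unicode UAX #29, which cover far more cases.

```haskell
import Data.Char (isMark)

-- Very rough approximation of grapheme clusters: attach combining marks
-- to the preceding character. This ignores Hangul syllables, emoji
-- sequences, and everything else that proper segmentation handles.
roughClusters :: String -> [String]
roughClusters [] = []
roughClusters (c : cs) = (c : marks) : roughClusters rest
  where
    (marks, rest) = span isMark cs
```

For example, the string made of 'e' followed by U+0301 (combining acute accent) has length 2 as a `String` but forms a single cluster under this grouping.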

On Wed, Mar 30, 2016 at 11:03 PM, Carter Schonwald <carter.schonwald@gmail.com> wrote:
> One point in the design space that the Swift language does, which seems interesting at least to me, is to have the notion of a character be backed by a Unicode grapheme cluster, which is a character-like sequence of Unicode code points. Would library support for this at all help this discussion or problem?

That's also Perl 6's solution. But in this case it would not help because it's still living in Unicode space and not the I/O locale that is the destination for the character.

--
brandon s allbery kf8nh                               sine nomine associates
allbery.b@gmail.com                                  ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

Brandon Allbery:
> On Wed, Mar 30, 2016 at 9:50 PM, Manuel M T Chakravarty <chak@justtesting.org> wrote:
>> Firstly, we have
>>
>>   isPrint :: Char -> Bool
>>
>> Are you saying that this type is wrong?
>>
>> Secondly, how often do you feed the output of 'show' to 'read' in another locale versus how often is everybody whose whole life is outside of ASCII (i.e., not anglo-centric people) bothered by this shortcoming? (*)
>>
>> Moreover, the argument on the ticket was that changing the current implementation would go against the standard. Now that I am saying, the current implementation is not conforming to the standard, the standard suddenly doesn’t seem to matter. Personally, I would say, when we wrote that standard, we knew what we were doing.
>
> The standard I am aware of is the Report, which deliberately limited the output to the subset which is guaranteed to be usable in all locales. show conforms to this; apparently people want it to *not* conform, and in a way which requires some locale to become the One True Locale.

Where does it say that in the Report?

> isPrint is, as per the language Report, based on what Char is --- which is Unicode codepoints. Using it for output --- or for input, for that matter --- gets you into locale issues because nobody anywhere guarantees that Unicode codepoints that pass isPrint are representable in every locale. isPrint is not the place to verify that a character can actually be displayed in the current locale.

Yet, this is apparently what the Report requires. IMHO, it also makes sense. We have seen that either setup (the current one or using 'isPrint') has imperfections. However, getting \<number> is rarely helpful, whereas using 'isPrint' is going to be helpful most of the time.

Manuel

participants (7)
- Ben Gamari
- Brandon Allbery
- Carter Schonwald
- Evan Laforge
- Manuel Gómez
- Manuel M T Chakravarty
- Thomas Miedema