GHC rendering of non-ASCII characters configurable?

I'm working on an application that involves processing a lot of Unicode data, and I'm finding the built-in Show implementation for Char to be really inconvenient. Specifically, it renders all characters at U+0080 and above with decimal escapes:

    Prelude> '\x80'
    '\128'

This is annoying because all of the Unicode charts give the code points in hex, and indeed the charts are split into different PDFs at numbers that are nice and round in hex but not in decimal. So in order to figure out which character I'm looking at, I have to convert back to hex and then look it up in the charts.

Is there any way to ask GHC to render super-ASCII characters with their hexadecimal escapes, instead? I'm perfectly happy to write my own custom Show instance, but I don't know how to hook that into ghci's REPL (or, for that matter, the routines that HUnit uses to generate the messages on failed tests, etc.).

I'm using GHC 7.4.1 on MacOS 10.7.4.

Thanks,
Richard

On Sun, Jul 29, 2012 at 7:04 PM, Richard Cobbe wrote:
> I'm working on an application that involves processing a lot of Unicode data, and I'm finding the built-in Show implementation for Char to be really inconvenient. Specifically, it renders all characters at U+0080 and above with decimal escapes:
>
>     Prelude> '\x80'
>     '\128'
>
> This is annoying because all of the Unicode charts give the code points in hex, and indeed the charts are split into different PDFs at numbers that are nice and round in hex but not in decimal. So in order to figure out which character I'm looking at, I have to convert back to hex and then look it up in the charts.
>
> Is there any way to ask GHC to render super-ASCII characters with their hexadecimal escapes, instead? I'm perfectly happy to write my own custom Show instance, but I don't know how to hook that into ghci's REPL (or, for that matter, the routines that HUnit uses to generate the messages on failed tests, etc.).
>
> I'm using GHC 7.4.1 on MacOS 10.7.4.
In GHC HEAD there is a new flag, -interactive-print, that lets you change the function used for printing values in GHCi. It will be in 7.6.1. That won't help with HUnit output, though.

BR,
Paolo
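As a rough illustration of the -interactive-print approach (GHC 7.6.1 and later), here is a minimal sketch of a printer that post-processes the output of show, rewriting decimal escapes as hex escapes. The names hexEscapes and hexPrint are made up for this example, and the escape handling is an assumption about what show emits for characters and strings, not a full parser of show output.

```haskell
import Data.Char (isDigit, isHexDigit)
import Numeric (showHex)

-- | Rewrite the decimal escapes that 'show' emits (e.g. \7936) as hex
-- escapes (\x1f00). Hypothetical helper, for illustration only.
hexEscapes :: String -> String
-- An escaped backslash is copied as a pair, so a literal backslash that
-- happens to be followed by digits is left alone.
hexEscapes ('\\' : '\\' : rest) = '\\' : '\\' : hexEscapes rest
hexEscapes ('\\' : rest@(d : _))
  | isDigit d = "\\x" ++ showHex code "" ++ guardHex rest' ++ hexEscapes rest'
  where
    (digits, rest') = span isDigit rest
    code = read digits :: Int
    -- If the next character could be read as a hex digit, insert the empty
    -- escape \& so the rewritten literal still denotes the same string.
    guardHex (c : _) | isHexDigit c = "\\&"
    guardHex _ = ""
hexEscapes (c : rest) = c : hexEscapes rest
hexEscapes [] = []

-- | A printer with the shape that -interactive-print expects:
-- a function of type C a => a -> IO ().
hexPrint :: Show a => a -> IO ()
hexPrint = putStrLn . hexEscapes . show
```

Assuming these definitions live in a module named HexPrint (add a `module HexPrint where` line at the top of HexPrint.hs), the invocation form described in the GHC user guide would be `ghci -interactive-print=HexPrint.hexPrint HexPrint`. This only changes what the REPL prints; anything that calls show directly, such as HUnit, is unaffected.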

On Sun, Jul 29, 2012 at 8:04 PM, Richard Cobbe wrote:
> This is annoying because all of the Unicode charts give the code points in hex, and indeed the charts are split into different PDFs at numbers that are nice and round in hex but not in decimal. So in order to figure out which character I'm looking at, I have to convert back to hex and then look it up in the charts.

My reading of the Haskell 98 report is that the Show instance for Char *could* use hex escapes, so this is a compiler choice. If there isn't a good reason for this choice, perhaps GHC could change?

--Max

On 30 July 2012 04:04, Richard Cobbe wrote:
> I'm working on an application that involves processing a lot of Unicode data, and I'm finding the built-in Show implementation for Char to be really inconvenient. Specifically, it renders all characters at U+0080 and above with decimal escapes:
>
>     Prelude> '\x80'
>     '\128'
>
> This is annoying because all of the Unicode charts give the code points in hex, and indeed the charts are split into different PDFs at numbers that are nice and round in hex but not in decimal. So in order to figure out which character I'm looking at, I have to convert back to hex and then look it up in the charts.
Can I ask what you're doing here? Are you printing individual characters or entire chunks of text? putStrLn and similar IO-based functions (at least for me) will un-escape characters if that helps. Otherwise, are you using Text or String?
> Is there any way to ask GHC to render super-ASCII characters with their hexadecimal escapes, instead? I'm perfectly happy to write my own custom Show instance, but I don't know how to hook that into ghci's REPL (or, for that matter, the routines that HUnit uses to generate the messages on failed tests, etc.).
>
> I'm using GHC 7.4.1 on MacOS 10.7.4.
>
> Thanks,
> Richard
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com

On Mon, Jul 30, 2012 at 11:45:38PM +1000, Ivan Lazar Miljenovic wrote:
> On 30 July 2012 04:04, Richard Cobbe wrote:
>> I'm working on an application that involves processing a lot of Unicode data, and I'm finding the built-in Show implementation for Char to be really inconvenient. Specifically, it renders all characters at U+0080 and above with decimal escapes:
>>
>>     Prelude> '\x80'
>>     '\128'
>>
>> This is annoying because all of the Unicode charts give the code points in hex, and indeed the charts are split into different PDFs at numbers that are nice and round in hex but not in decimal. So in order to figure out which character I'm looking at, I have to convert back to hex and then look it up in the charts.
>
> Can I ask what you're doing here? Are you printing individual characters or entire chunks of text?
Mostly, I'm working with expressions of type String, rather than Text; the Char above was merely an example to demonstrate the problem. The two I/O cases that most concern me are evaluating a String expression at the GHCi REPL, and working with HUnit test cases built around String expressions.

I suppose I could wrap putStrLn around all string exprs at the repl, but a) that's a pain; b) it's important for this app that I be able to distinguish between precomposed characters and combining characters; and c) some of the characters I'm dealing with are very similar in my terminal fonts, such as U+1F00 and U+1F01. It's much nicer to be able to just see the code points.

The other problem is with HUnit tests. When a test fails (under runTestTT, anyway) you get a diagnostic printed to stdout. I'm not sure exactly what logic HUnit uses to produce these error messages, but it's almost certainly calling 'show' on the underlying strings. So there's no place, as far as I know, where I can insert a call to putStrLn.

Richard
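One possible way around the HUnit limitation described above is to wrap the strings under test in a newtype whose Show instance uses hex escapes, so that whatever calls show on the values gets the hex form. This is only a sketch: HexString is a hypothetical name, not part of HUnit or base, and it assumes the test can compare wrapped values instead of bare Strings.

```haskell
import Data.Char (isHexDigit, ord)
import Numeric (showHex)

-- | Hypothetical wrapper: compares like a String, but shows every character
-- outside printable ASCII as a hex escape.
newtype HexString = HexString String
  deriving (Eq)

instance Show HexString where
  show (HexString s) = '"' : go s
    where
      go [] = "\""
      go (c : cs)
        | c == '"' = "\\\"" ++ go cs
        | c == '\\' = "\\\\" ++ go cs
        | c >= ' ' && c <= '~' = c : go cs
        | otherwise = "\\x" ++ showHex (ord c) (sep cs ++ go cs)
      -- Insert the empty escape \& when the next character would otherwise
      -- be read as part of the hex escape.
      sep (d : _) | isHexDigit d = "\\&"
      sep _ = ""
```

With HUnit, a call such as `assertEqual "label" (HexString expected) (HexString actual)` should then report failures with hex escapes, since assertEqual only needs Eq and Show instances for the compared type.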

On 31 July 2012 21:01, Richard Cobbe wrote:
> On Mon, Jul 30, 2012 at 11:45:38PM +1000, Ivan Lazar Miljenovic wrote:
>> On 30 July 2012 04:04, Richard Cobbe wrote:
>>> I'm working on an application that involves processing a lot of Unicode data, and I'm finding the built-in Show implementation for Char to be really inconvenient. Specifically, it renders all characters at U+0080 and above with decimal escapes:
>>>
>>>     Prelude> '\x80'
>>>     '\128'
>>>
>>> This is annoying because all of the Unicode charts give the code points in hex, and indeed the charts are split into different PDFs at numbers that are nice and round in hex but not in decimal. So in order to figure out which character I'm looking at, I have to convert back to hex and then look it up in the charts.
>>
>> Can I ask what you're doing here? Are you printing individual characters or entire chunks of text?
>
> Mostly, I'm working with expressions of type String, rather than Text;
Any particular reason why? Using Text will probably solve your problem and give you a performance improvement at the same time.

-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com

On Tue, Jul 31, 2012 at 09:17:34PM +1000, Ivan Lazar Miljenovic wrote:
> On 31 July 2012 21:01, Richard Cobbe wrote:
>> On Mon, Jul 30, 2012 at 11:45:38PM +1000, Ivan Lazar Miljenovic wrote:
>>> Can I ask what you're doing here? Are you printing individual characters or entire chunks of text?
>>
>> Mostly, I'm working with expressions of type String, rather than Text;
>
> Any particular reason why? Using Text will probably solve your problem and give you a performance improvement at the same time.
Well, I initially went with String because I didn't want to clutter up my code with all of the calls to 'pack', especially around string literals. I'm open to being convinced that it's worth it to switch, though.

In any case, while Text is undoubtedly faster than String, it unfortunately doesn't solve my problem with output rendering:

    [vimes:~]$ ghci
    GHCi, version 7.4.1: http://www.haskell.org/ghc/  :? for help
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    Prelude> :m +Data.Text
    Prelude Data.Text> pack "\x1f00"
    Loading package array-0.4.0.0 ... linking ... done.
    Loading package bytestring-0.9.2.1 ... linking ... done.
    Loading package deepseq-1.3.0.0 ... linking ... done.
    Loading package text-0.11.2.0 ... linking ... done.
    "\7936"
    Prelude Data.Text> pack "\x1f01"
    "\7937"

Richard
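Since the session above shows Text values printed with the same decimal escapes as String, one workaround that sidesteps show entirely is to print the code points themselves in the notation the Unicode charts use. A minimal sketch using only base; codePoints is a hypothetical helper name, not something from this thread or any library:

```haskell
import Data.Char (ord)
import Text.Printf (printf)

-- | Render each character as a U+XXXX code point, zero-padded to at least
-- four uppercase hex digits. Hypothetical helper, for illustration only.
codePoints :: String -> [String]
codePoints = map (printf "U+%04X" . ord)
```

For a Text value t, `codePoints (Data.Text.unpack t)` would give the same listing. This also makes a precomposed character easy to tell apart from a base character followed by a combining mark, for example U+00E9 versus U+0065 U+0301.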

On Wed, Aug 1, 2012 at 2:35 AM, Richard Cobbe wrote:
> Well, I initially went with String because I didn't want to clutter up my code with all of the calls to 'pack', especially around string literals. I'm open to being convinced that it's worth it to switch, though.

For string literals, you can turn on OverloadedStrings to get rid of the calls to 'pack'.

Erik
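A minimal sketch of what the OverloadedStrings suggestion looks like in a source file, assuming the text package is installed; the binding name sample is made up for illustration:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- With OverloadedStrings enabled, the literal is converted through the
-- IsString instance for Text, so no explicit call to T.pack is needed.
sample :: Text
sample = "\x1f00\x1f01"

-- Equivalent definition without the extension, for comparison.
samplePacked :: Text
samplePacked = T.pack ['\x1f00', '\x1f01']
```

In GHCi the extension can be enabled with `:set -XOverloadedStrings`. It only changes how literals are typed; it has no effect on how show renders the resulting values.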
Participants (5): Erik Hesselink, Ivan Lazar Miljenovic, Max Rabkin, Paolo Capriotti, Richard Cobbe