Re: To show or not to show french accents

Good afternoon,
Well, I think there should probably be some internationalisation mechanism that
tells the "show" function (to name one), according to some configuration, how
to interpret a byte as a character.
Frankly, I see no good reason why we should be satisfied with the dinosaur 7
bits, except perhaps that 7 bits are sufficient for English.
I am talking about respect for non-English-speaking people.
But if nobody cares ...
Cheers,
Francis Girard
LE CONQUET
France
Quoting Max Kirillov:
Good morning,
The following haskell program :
--<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
module Main where

accentLetters :: String
accentLetters = "éàô"

main :: IO ()
main = do putStr (show accentLetters)
-->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
after being compiled will give the result :
"\233\224\244"
But, exactly the same program, without the "show" function will give the result:
éàô
On Tue, Dec 16, 2003 at 07:49:26AM +0100, francis.girard@free.fr wrote:
Is there some way to have "show" show all the printable characters, even those
represented by a value greater than the US-ASCII 7 bits (127) ?
The specific octet may or may not be a printable character, depending on your charset. For instance, your letters are printable in koi8-r (showing uppercase Russian I, YU, T), but not in cp866 (at least recode cp866..koi8-r fails on them).
The "show" function represents your over-127 bytes in a portable and readable (by read) way and, I think, it does the right thing.
-- Max
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Hallo!
On Thu, Dec 18, 2003 at 01:55:27PM +0100, francis.girard@free.fr wrote:
Well, I think there should probably be some internationalisation mechanism that tells the "show" function (to name one), according to some configuration, how to interpret a byte as a character.
My understanding is that `show' should work with `read' and possibly produce output that can be parsed by the Haskell parser. It is not a pretty printing function.
Frankly, I see no good reason why we should be satisfied with the dinosaur 7 bits, except perhaps that 7 bits are sufficient for English.
I am talking about respect for non-English-speaking people.
But if nobody cares ...
I, too, speak a language that can't be fully expressed in ASCII, but I do not think that the behaviour of `show' should be changed in this respect.
Greetings,
Carsten
-- Carsten Schultz (2:38, 33:47), FB Mathematik, FU Berlin http://carsten.fu-mathe-team.de/ PGP/GPG key on the pgp.net key servers, fingerprint on my home page.
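
The point that `show' is meant to work with `read' can be checked directly. The following is only a minimal sketch: the names shown, roundTrips and asciiOnly are made up for illustration, and the string is written with escapes so that the source file itself stays pure ASCII.

```haskell
import Data.Char (isAscii)

-- "éàô" from the original message, written with decimal escapes.
accentLetters :: String
accentLetters = "\233\224\244"

-- The text that show produces: quotes plus one escape per accented letter.
shown :: String
shown = show accentLetters

-- read undoes show, whatever charset the terminal happens to use.
roundTrips :: Bool
roundTrips = read shown == accentLetters

-- Every character of the shown text is 7-bit ASCII.
asciiOnly :: Bool
asciiOnly = all isAscii shown
```

Loaded into GHCi, both roundTrips and asciiOnly should evaluate to True, and putStrLn shown prints the "\233\224\244" form reported in the original message.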

Good evening,
OK. I don't know Haskell well enough to argue.
But I can't resist pointing out that reading a single byte having the value 233
(that is 'é') is certainly simpler than reading the four characters "\233",
parsing them, and translating them into a single byte having the value 233,
whatever character that value represents in your character table.
But, I don't care that much and I'm sorry for this.
Best regards,
Francis Girard
LE CONQUET
France

On 2003-12-18 at 16:40+0100 francis.girard@free.fr wrote:
Good evening,
OK. I don't know Haskell enough to argue.
But I can't resist pointing out that reading a single byte having the value 233 (that is 'é')
The problem is that if you are reading single bytes, 233 is not necessarily é. It might be 'shch' if you are in Russia, or iota if you are in Greece. While it's (almost) completely reasonable to expect 233 to display as é in Western Europe, it's completely unreasonable to hold that expectation across borders.
is certainly simpler than reading the four characters "\233", parse it, and translate it into a single byte
but it isn't a single byte internally. Indeed, if you are in Russia you could reasonably expect reading a single byte 233 to be converted to the internal code 1097 (U+0449 under ISO 8859-5, if I got the arithmetic right). Since Haskell specifies unicode, if you are operating in a Russian locale that's what ought to happen.
What I don't understand is why you want show for this. As I mentioned earlier, to output strings and get accented characters, all you have to do is to output the string with putStr, and voilà, les signes diacritiques.
Jón
-- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk
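
To make the "233 is not necessarily é" argument concrete, here is a rough sketch of decoding one byte under three charsets. It assumes that "Russia" means ISO 8859-5 and "Greece" means ISO 8859-7 (the thread does not name them), covers only the letter ranges of those two tables, and uses the invented names Charset and decodeByte.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical selection of charsets, for illustration only.
data Charset = Latin1 | Iso8859_5 | Iso8859_7
  deriving (Show, Eq)

-- Map a single byte to a Unicode character, or Nothing when this
-- sketch does not cover the byte for the given charset.
decodeByte :: Charset -> Word8 -> Maybe Char
decodeByte cs b
  | n < 0x80  = Just (chr n)              -- ASCII is shared by all three
  | otherwise = case cs of
      Latin1 -> Just (chr n)              -- ISO 8859-1 is the identity mapping
      Iso8859_5
        | n >= 0xB0 && n <= 0xEF -> Just (chr (n + 0x360))   -- U+0410..U+044F
      Iso8859_7
        | (n >= 0xC1 && n <= 0xD1) || (n >= 0xD3 && n <= 0xFE)
                                 -> Just (chr (n + 0x2D0))   -- Greek letters
      _ -> Nothing
  where
    n = fromIntegral b :: Int
```

Under these assumptions byte 233 decodes to code point 233 (é) as Latin-1, 1097 (shcha) as ISO 8859-5, and 953 (iota) as ISO 8859-7, so the same octet yields three different internal characters.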

I would support the point of view that show should output escapes when showing characters outside ASCII. This is sort of a "transport" format (together with read), therefore it must be a GCD for all possible input encodings. UTF-8 might be an alternative, but it would need to be supported equally by all Haskell implementations.
-- Dmitry M. Golubovsky South Lyon, MI
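
For comparison, this is roughly what the UTF-8 alternative would put on the wire for a single character. It is only a sketch of the standard encoding rule, with an invented function name utf8Encode, and it makes no attempt to reject surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one character as its UTF-8 byte sequence (1 to 4 bytes).
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading byte: length tag plus the topmost payload bits.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))
```

With this rule 'é' (code point 233) becomes the two bytes 195 and 169, whereas the escape form "\233" takes four ASCII characters but needs no agreement on an encoding.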

Hello,
What I don't understand is why you want show for this. As I mentioned earlier, to output strings and get accented characters, all you have to do is to output the string with putStr, and voilà, les signes diacritiques.
Sometimes, I want to write cheap and dirty test programs that "show" data structures involving some strings. Again, the cheap and dirty way to do this is to derive a "Show" instance for the data structure using the "deriving" keyword. But then you are sometimes barely able to read the output string. Therefore I have to redefine "show" myself ... which is a lot less cheap and a lot more dirty.
The problem is that if you are reading single bytes, 233 is not necessarily é. It might be 'shch' if you are in Russia,
What the byte should represent is not relevant here. All I wanted to point out is that it is easier to read a character as one byte (or two) than to read 4 characters and translate them into a numerical value, NO MATTER WHAT THE BYTE IS SUPPOSED TO REPRESENT. This was to answer the objection that "show" is meant to work with "read". I simply answered that that was not a valid argument.
arithmetic right). Since Haskell specifies unicode, if you are operating in a Russian locale that's what ought to happen.
I naively thought that unicode would solve this kind of problem. But yet we're
stuck with these pesky 7 bits ...
Regards,
Francis Girard
LE CONQUET
France
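
One possible middle ground for the "cheap and dirty" use case is to keep deriving Show for the data structure and wrap only the strings. The sketch below is an illustration, not an established library API: showReadable, Readable and Sample are invented names, and the output is meant for human eyes only, since it is not guaranteed to be parsed back by read.

```haskell
import Data.Char (isAscii, isPrint, showLitChar)

-- Quote a string like show does, but leave printable non-ASCII
-- characters as they are instead of turning them into escapes.
showReadable :: String -> String
showReadable s = "\"" ++ concatMap escape s ++ "\""
  where
    escape '"' = "\\\""
    escape c
      | isAscii c || not (isPrint c) = showLitChar c ""
      | otherwise                    = [c]

-- Wrapper whose Show instance uses showReadable.
newtype Readable = Readable String

instance Show Readable where
  showsPrec _ (Readable s) = showString (showReadable s)

-- A made-up record that still gets its Show instance by deriving.
data Sample = Sample { label :: Readable, count :: Int }
  deriving Show

sample :: Sample
sample = Sample { label = Readable "\233\224\244", count = 3 }
```

Here show sample yields Sample {label = "éàô", count = 3} as a Haskell String. Whether the accents then appear correctly on screen still depends on how the output handle and the terminal encode characters.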