
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Thu, Jan 15, 2009 at 12:21 PM, Eric Mertens wrote:
> On Thu, 2009-01-15 at 12:04 -0500, Gwern Branwen wrote:
> > Perhaps we're over-thinking all this. Is it a problem in any way to run encodeString over a String that is just normal ASCII (that is, no funky Unicode)?
> > Eric: could we just mindlessly call encodeString on everything going into spawn/safeSpawn?
>
> ASCII is valid UTF-8 encoded Unicode, however Latin1 is not. So as long as you stick to ASCII (values less than 128) you can treat them as UTF-8.
> ISO 8859-1 and ASCII Extended are not valid UTF-8, however (due to their use of the values 128-255).
>
> Does this answer your question?
If I'm understanding you, the answer is 'you can safely call encodeString on ASCII text and UTF-8 text, but you cannot on ISO 8859-1 & ASCII Extended'.

So we can either default to calling encodeString, checking whether it's ISO/Extended (and not calling encodeString if True); or we can default to not calling encodeString, and enable it if a check for UTF-8 returns True. I guess since Alexey has already provided a check for UTF-8, we should probably use the latter strategy.

- -- gwern
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEAREKAAYFAklvv9QACgkQvpDo5Pfl1oIYWACcCJclUot9NbxFmQLjdckDwc4H
fN0AoJfpM3bD44z7rKHsbEYF8H/7Y9xY
=Niw2
-----END PGP SIGNATURE-----
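For illustration only, here is a rough sketch of the "default to not encoding, enable on a UTF-8 check" strategy discussed above. It uses only base, so utf8Encode is a hand-written stand-in for encodeString rather than the library function itself, and encodeIfUtf8 with its Bool argument is a hypothetical name standing for the result of whatever UTF-8 check is used (Alexey's check is not reproduced here). The encoder assumes valid code points and does not treat surrogates specially.

```haskell
module Main where

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Stand-in for an encodeString-style function (an assumption, not the
-- library code): each input Char is a code point, each output Char
-- holds a single UTF-8 byte (0-255).
utf8Encode :: String -> String
utf8Encode = concatMap (map chr . encodeChar . ord)
  where
    encodeChar :: Int -> [Int]
    encodeChar c
      | c < 0x80    = [c]  -- ASCII passes through as one identical byte
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]
    -- Continuation byte: 10xxxxxx carrying the low six bits.
    cont :: Int -> Int
    cont x = 0x80 .|. (x .&. 0x3F)

-- Hypothetical wrapper: leave the string alone by default and encode
-- only when the caller's UTF-8 check returned True.
encodeIfUtf8 :: Bool -> String -> String
encodeIfUtf8 isUtf8 s
  | isUtf8    = utf8Encode s
  | otherwise = s

main :: IO ()
main = do
  -- Pure ASCII is unchanged by encoding.
  print (utf8Encode "xterm -e mutt" == "xterm -e mutt")
  -- U+00E9 (value 233, outside ASCII) becomes two bytes.
  print (map ord (utf8Encode "\233"))
  -- With the check False, the string is passed through untouched.
  print (map ord (encodeIfUtf8 False "\233"))
```

Running main should print True, then [195,169], then [233]. The first line is the point of the original question: encoding a plain ASCII command string is a no-op, so the only strings whose treatment depends on the check are those with values of 128 and above.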