
Duncan Coutts wrote:
On Wed, 2007-11-28 at 17:38 -0200, Maurício wrote:
(...) When it's phrased as "truncates to 8 bits" it sounds so simple, surely all we need to do is not truncate to 8 bits right?
The problem is, what encoding should it pick? UTF-8, UTF-16, UTF-32, EBCDIC? (...)
One sensible suggestion many people have made is that H98 file IO should use the locale encoding and do Unicode/String <-> locale conversion. (...)
I'm really afraid of solutions where the behavior of your program changes with an environment variable that not everybody has configured properly, or even knows exists.
Be afraid of all your standard Unix utils in that case. They are all locale dependent, not just for encoding but also for sorting order and the language of messages.
Language of messages is quite different from the language of a file you read. Suppose I am English, and I have a Russian friend, Vlad. My default locale is, say, Latin-1, and his is something Cyrillic. I might well open files including my own files and his files. The locale of the current user is simply no guide to the correct encoding to read a file in, and not a particularly reliable guide to writing a file out. Locale makes perfect sense for messages (you are communicating with the user, and his locale tells you what language he speaks). It makes much less sense for file IO.

Jules
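
To make the point about per-file encodings concrete, here is a minimal sketch in which the caller names the encoding for each file instead of inheriting it from the locale. It assumes the `hSetEncoding`, `utf8` and `latin1` names from `System.IO` in later GHC releases (they postdate this thread); `readFileWith`, `writeFileWith` and the file name `vlad.txt` are hypothetical names used only for illustration.

```haskell
import System.IO

-- Hypothetical helper: write a String using an explicitly chosen encoding.
writeFileWith :: TextEncoding -> FilePath -> String -> IO ()
writeFileWith enc path s = withFile path WriteMode $ \h -> do
  hSetEncoding h enc
  hPutStr h s

-- Hypothetical helper: read a whole file, decoding with an explicit encoding.
readFileWith :: TextEncoding -> FilePath -> IO String
readFileWith enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  s <- hGetContents h
  -- Force the lazy contents before withFile closes the handle.
  length s `seq` return s

main :: IO ()
main = do
  let path = "vlad.txt"               -- assumed scratch file name
  writeFileWith utf8 path "caf\233"   -- "café", stored as UTF-8 bytes
  right <- readFileWith utf8 path     -- decoded with the file's real encoding
  wrong <- readFileWith latin1 path   -- decoded as if a Latin-1 locale applied
  print (right == "caf\233")          -- True
  print (wrong == "caf\195\169")      -- True: the text came back as "cafÃ©"
```

The same bytes come back as different strings depending only on the decoder chosen, which is why a per-user locale cannot be a reliable guide to the encoding of any individual file.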