
On Wed, 2007-11-28 at 17:38 -0200, Maurício wrote:
>> (...) When it's phrased as "truncates to 8 bits" it sounds so simple, surely all we need to do is not truncate to 8 bits, right?
>> The problem is, what encoding should it pick? UTF8, 16, 32, EBCDIC? (...)
>> One sensible suggestion many people have made is that H98 file IO should use the locale encoding and do Unicode/String <-> locale conversion. (...)
> I'm really afraid of solutions where the behavior of your program changes with an environment variable that not everybody has configured properly, or even knows exists.
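To make "truncates to 8 bits" concrete, here is a minimal Haskell sketch. The names `truncate8`, `encodeUtf8` and `encodeCodePoint` are hypothetical and written for illustration. This is not the Haskell 98 library code. Truncation keeps only the low byte of each code point, so distinct characters such as λ (U+03BB) and » (U+00BB) collapse to the same byte. An encoding such as UTF-8 keeps them distinct by spending more than one byte on a character.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical name: the "truncate to 8 bits" behaviour, keeping only the
-- low byte of each code point (i.e. the code point modulo 256).
truncate8 :: String -> [Word8]
truncate8 = map (fromIntegral . (.&. 0xFF) . ord)

-- Hypothetical name: a hand-written UTF-8 encoder, for illustration only.
-- It does not reject surrogate code points (U+D800..U+DFFF).
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (encodeCodePoint . ord)

encodeCodePoint :: Int -> [Word8]
encodeCodePoint c
  | c < 0x80    = [byte c]                                  -- 1 byte (ASCII)
  | c < 0x800   = [byte (0xC0 .|. (c `shiftR` 6)), cont 0]  -- 2 bytes
  | c < 0x10000 = [byte (0xE0 .|. (c `shiftR` 12)), cont 6, cont 0]
  | otherwise   = [byte (0xF0 .|. (c `shiftR` 18)), cont 12, cont 6, cont 0]
  where
    byte :: Int -> Word8
    byte = fromIntegral
    -- continuation byte 10xxxxxx carrying 6 bits of c, starting at bit s
    cont s = byte (0x80 .|. ((c `shiftR` s) .&. 0x3F))

main :: IO ()
main = do
  -- U+03BB (lambda) and U+00BB (right guillemet) collide after truncation
  print (truncate8 "\x3BB\xBB")   -- [187,187]
  print (encodeUtf8 "\x3BB\xBB")  -- [206,187,194,187]
```

Truncation is lossless only for code points below 256. That is why simply "not truncating" is not enough: some multi-byte encoding has to be chosen, which is exactly the question raised above.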
Be afraid of all your standard Unix utils in that case. They are all locale-dependent, not just for encoding but also for sorting order and the language of messages. Using the locale is standard Unix behaviour (and these days the locale usually specifies UTF8 encoding).

On OSX the default should be UTF8. On Windows it's a bit less clear: supposedly text files should use UTF16, but nobody actually does that as far as I can see.

Duncan
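For comparison, GHC 6.12 and later took the locale-based approach for Handle IO. They also let a program override the encoding per handle with `hSetEncoding` from `System.IO`. The following is a minimal sketch, assuming a GHC with that API. It reports the encoding the runtime derived from the locale, then writes files with an explicitly pinned encoding so the bytes on disk do not depend on the environment. The helpers `writeWith` and `readBytes` and the file names are hypothetical. `char8` is GHC's encoding that maps each code point modulo 256. It is used here to read raw bytes back and is essentially the old truncating behaviour.

```haskell
import System.IO

-- Hypothetical helper: write a string using an explicitly chosen encoding,
-- ignoring whatever the locale would have picked.
writeWith :: TextEncoding -> FilePath -> String -> IO ()
writeWith enc path s = withFile path WriteMode $ \h -> do
  hSetEncoding h enc
  hPutStr h s

-- Hypothetical helper: read a file back as raw byte values. With char8,
-- each byte decodes to the Char with the same code (0-255).
readBytes :: FilePath -> IO [Int]
readBytes path = withFile path ReadMode $ \h -> do
  hSetEncoding h char8
  s <- hGetContents h
  let bytes = map fromEnum s
  length bytes `seq` return bytes  -- force the lazy read before the handle closes

main :: IO ()
main = do
  -- What the runtime derived from the environment (e.g. LANG / LC_ALL on Unix)
  putStrLn ("locale encoding: " ++ show localeEncoding)
  writeWith utf8 "lambda-utf8.txt" "\x3BB"
  writeWith latin1 "eacute-latin1.txt" "\xE9"
  readBytes "lambda-utf8.txt" >>= print    -- [206,187] regardless of locale
  readBytes "eacute-latin1.txt" >>= print  -- [233]
```

The design choice this shows is that the locale supplies the default, while a program that must produce a specific byte format can still pin the encoding and stop depending on how the user's environment is configured.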