
On Fri, Aug 28, 2009 at 3:50 PM, Duncan Coutts wrote:
On Wed, 2009-08-26 at 16:14 +0300, Yitzchak Gale wrote:
I am now beginning to lean towards Ketil's suggestion that on POSIX platforms we should always use UTF-8. We then need a prominent warning in the documentation that if you need something else, like the current locale, decode it yourself.
That's nice in that it makes the function pure, or equivalently so that it does not need a locale parameter.
Note that it is becoming increasingly rare for people to use non-UTF-8 locales anywhere in the world, and even then it's likely ignored by many UIs. So I'm inclined against cluttering the API with convenience functions for other encodings, as Johan is suggesting.
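To make the purity point concrete, here is a minimal sketch of what an "always UTF-8" conversion could look like. The names decodeUtf8 and safeChr are hypothetical and are not part of any existing filepath API; the sketch assumes the raw file name is available as a list of bytes, and it does not reject overlong encodings. The thing to notice is the type: with the encoding fixed, no IO and no locale argument are needed, whereas a locale-aware version would need one or the other.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: decode raw file name bytes as UTF-8.
-- Malformed input is replaced with U+FFFD instead of raising an error.
-- Simplification: overlong encodings are accepted rather than rejected.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80           = chr (fromIntegral b) : decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F)   -- 2-byte sequence
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F)   -- 3-byte sequence
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07)   -- 4-byte sequence
  | otherwise          = '\xFFFD' : decodeUtf8 bs
  where
    -- Consume n continuation bytes and fold their payload bits
    -- into the code point started by the lead byte.
    multi :: Int -> Word8 -> String
    multi n lead =
      let (cont, rest) = splitAt n bs
      in if length cont == n && all isCont cont
           then safeChr (foldl step (fromIntegral lead) cont) : decodeUtf8 rest
           else '\xFFFD' : decodeUtf8 bs   -- skip only the bad lead byte
    step :: Int -> Word8 -> Int
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
    isCont :: Word8 -> Bool
    isCont c = c .&. 0xC0 == 0x80

-- Map surrogates and out-of-range values to U+FFFD, since chr would
-- either fail or produce an invalid scalar value for them.
safeChr :: Int -> Char
safeChr cp
  | cp > 0x10FFFF                = '\xFFFD'
  | cp >= 0xD800 && cp <= 0xDFFF = '\xFFFD'
  | otherwise                    = chr cp
```

For example, decodeUtf8 [0xC3, 0xA9] evaluates to "\233" (the single character é), on every platform and under every locale setting.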
I agree that this would make the API much simpler, but I'm wary of broad statements like the above. My (very vague) impression was that many Japanese users, for example, still use non-Unicode encodings.

I think that glib is an interesting example. Its developers advocate pretty strongly for everyone to use UTF-8 filenames, but even they provide a simple way for the user of any glib program to override that behavior by setting G_FILENAME_ENCODING=@locale.

As another example, Python 3, which recently redesigned its Unicode interface, also still uses the locale for filenames rather than solely UTF-8. The following interview with Guido from January has a good take on why they did that (about halfway through the article):
http://broadcast.oreilly.com/2009/01/the-evolution-of-python-3.html

If we really want a pure FilePath->String conversion, then perhaps we could make the RTS check the locale once at the start of the program, and have every subsequent conversion use that locale. This would be safe from order-of-operation changes, though it would be possible for the same pure code to behave differently in two different program runs, so I'm unsure about that solution.

Best,
-Judah