
Glynn Clements
It should be possible to specify the encoding explicitly.
Conversely, it shouldn't be possible to avoid specifying the encoding explicitly.
Marcin Kowalczyk

What encoding should a binding to readline or curses use? Curses in C comes in two flavors: the traditional byte version and a wide character version. The second is easy if we can assume that wchar_t is Unicode, but it's not always available, and until recently it was buggy in ncurses. So let's assume we are using the byte version. How should strings be encoded? A terminal uses an ASCII-compatible encoding. The wide character version of curses converts characters to the locale encoding, and the byte version passes bytes through unchanged. This means that if a Haskell binding to the wide character version does the obvious thing and passes Unicode directly, then equivalent behavior can be obtained from the byte version (only limited to 256-character encodings) by using the locale encoding.

The locale encoding is also the right encoding for converting the result of strerror or gai_strerror, the msg member of the gzip compressor state, etc. When an I/O error occurs and the error code is translated to a Haskell exception and then shown to the user, why would the application need to specify the encoding, and how?
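To make that last point concrete, here is a minimal sketch of such a binding, assuming a modern GHC whose base library provides GHC.Foreign.peekCString and GHC.IO.Encoding.getLocaleEncoding; the wrapper name strerrorString is invented for illustration:

    {-# LANGUAGE ForeignFunctionInterface #-}

    import Foreign.C                        -- CInt, CString, Errno, getErrno
    import qualified GHC.Foreign as GHC     (peekCString)
    import GHC.IO.Encoding                  (getLocaleEncoding)

    -- strerror(3) hands back a message encoded in the locale encoding.
    foreign import ccall unsafe "string.h strerror"
      c_strerror :: CInt -> IO CString

    -- Decode that message with the locale encoding; an explicit encoding
    -- parameter would add nothing useful here.
    strerrorString :: Errno -> IO String
    strerrorString (Errno e) = do
      cstr <- c_strerror e
      enc  <- getLocaleEncoding
      GHC.peekCString enc cstr

    main :: IO ()
    main = getErrno >>= strerrorString >>= putStrLn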
Glynn Clements

If application code doesn't want to use the locale's encoding, it shouldn't be shoe-horned into doing so because a library developer decided to duck the encoding issues by grabbing whatever encoding was readily to hand (i.e. the locale's encoding).
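For contrast, a sketch of the API shape being argued for here, under the same assumption of a modern GHC base library; the names decodeWith, decodeInLocale and decodeUtf8 are hypothetical. The encoding is an explicit parameter, and the locale encoding is merely one value a caller may choose to pass:

    module EncodingChoice where

    import Foreign.C.String (CString)
    import GHC.Foreign      (peekCString)
    import GHC.IO.Encoding  (TextEncoding, getLocaleEncoding, mkTextEncoding)

    -- The binding never picks an encoding on its own; the caller supplies one.
    decodeWith :: TextEncoding -> CString -> IO String
    decodeWith = peekCString

    -- A caller who does want the locale encoding asks for it explicitly...
    decodeInLocale :: CString -> IO String
    decodeInLocale cstr = do
      enc <- getLocaleEncoding
      decodeWith enc cstr

    -- ...and a caller who doesn't can name a specific encoding instead.
    decodeUtf8 :: CString -> IO String
    decodeUtf8 cstr = do
      enc <- mkTextEncoding "UTF-8"
      decodeWith enc cstr

Nothing stops an application from passing the locale encoding here; the difference is that it does so deliberately rather than by default.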
Marcin Kowalczyk

If a C library is written with the assumption that texts are in the locale encoding, a Haskell binding to such a library should respect that assumption. Only some libraries allow working with a different, explicitly specified encoding. Many don't, especially if the texts are not the core of the library's functionality but, say, error messages.

--
Marcin Kowalczyk <qrczak@knm.org.pl>
http://qrnik.knm.org.pl/~qrczak/