
Glynn Clements
It should be possible to specify the encoding explicitly.
Conversely, it shouldn't be possible to avoid specifying the encoding explicitly.
Marcin Kowalczyk

What encoding should a binding to readline or curses use? Curses in C comes in two flavors: the traditional byte version and a wide character version. The second is easy if we can assume that wchar_t is Unicode, but it's not always available, and until recently it was buggy in ncurses. So let's assume we are using the byte version. How should strings be encoded? A terminal uses an ASCII-compatible encoding. The wide character version of curses converts characters to the locale encoding, and the byte version passes bytes through unchanged. This means that if a Haskell binding to the wide character version does the obvious thing and passes Unicode directly, then equivalent behavior can be obtained from the byte version (only limited to 256-character encodings) by using the locale encoding.

The locale encoding is also the right encoding for converting the result of strerror or gai_strerror, the msg member of the gzip compressor state, etc. When an I/O error occurs and the error code is translated to a Haskell exception and then shown to the user, why would the application need to specify the encoding, and how?
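To make that last point concrete, here is a minimal sketch of such a binding, assuming a modern GHC whose base library provides GHC.Foreign.peekCString and GHC.IO.Encoding.getLocaleEncoding; the wrapper name strerrorString is invented for illustration:

    {-# LANGUAGE ForeignFunctionInterface #-}

    import Foreign.C                        -- CInt, CString, Errno, getErrno
    import qualified GHC.Foreign as GHC     (peekCString)
    import GHC.IO.Encoding                  (getLocaleEncoding)

    -- strerror(3) hands back a message encoded in the locale encoding.
    foreign import ccall unsafe "string.h strerror"
      c_strerror :: CInt -> IO CString

    -- Decode that message with the locale encoding; an explicit encoding
    -- parameter would add nothing useful here.
    strerrorString :: Errno -> IO String
    strerrorString (Errno e) = do
      cstr <- c_strerror e
      enc  <- getLocaleEncoding
      GHC.peekCString enc cstr

    main :: IO ()
    main = getErrno >>= strerrorString >>= putStrLn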
Glynn Clements

If application code doesn't want to use the locale's encoding, it shouldn't be shoe-horned into doing so because a library developer decided to duck the encoding issues by grabbing whatever encoding was readily to hand (i.e. the locale's encoding).
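For contrast, a sketch of the API shape being argued for here, under the same assumption of a modern GHC base library; the names decodeWith, decodeInLocale and decodeUtf8 are hypothetical. The encoding is an explicit parameter, and the locale encoding is merely one value a caller may choose to pass:

    module EncodingChoice where

    import Foreign.C.String (CString)
    import GHC.Foreign      (peekCString)
    import GHC.IO.Encoding  (TextEncoding, getLocaleEncoding, mkTextEncoding)

    -- The binding never picks an encoding on its own; the caller supplies one.
    decodeWith :: TextEncoding -> CString -> IO String
    decodeWith = peekCString

    -- A caller who does want the locale encoding asks for it explicitly...
    decodeInLocale :: CString -> IO String
    decodeInLocale cstr = do
      enc <- getLocaleEncoding
      decodeWith enc cstr

    -- ...and a caller who doesn't can name a specific encoding instead.
    decodeUtf8 :: CString -> IO String
    decodeUtf8 cstr = do
      enc <- mkTextEncoding "UTF-8"
      decodeWith enc cstr

Nothing stops an application from passing the locale encoding here; the difference is that it does so deliberately rather than by default.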
Marcin Kowalczyk

If a C library is written with the assumption that texts are in the locale encoding, a Haskell binding to such a library should respect that assumption. Only some libraries allow working with a different, explicitly specified encoding. Many don't, especially if the texts are not the core of the library's functionality but, say, error messages.

--
Marcin Kowalczyk <qrczak@knm.org.pl>
http://qrnik.knm.org.pl/~qrczak/