New subject: Proposal #3455: Add a setting to change how Unicode encoding errors are handled

23 Aug 2009

I propose that we augment ghc-6.12.1's support for Unicode Handles
by adding the following functions to System.IO:

hSetOnEncodingError :: Handle -> OnEncodingError -> IO ()
hGetOnEncodingError :: Handle -> IO OnEncodingError

as well as the enumeration `OnEncodingError` with three constructors:

 - `ThrowEncodingError`: Throw an exception at the first encoding or
   decoding error.
 - `SkipEncodingError`: Skip all invalid bytes or characters.
 - `TranslitEncodingError`: Replace undecodable bytes with U+FFFD, and
   unencodable characters with '?'.
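
To make the three behaviours concrete, here is a minimal pure-Haskell sketch. It is only a model, not the code from the patch: `decodeAscii` and `encodeAscii` are hypothetical helpers, ASCII stands in for a real encoding (any byte >= 0x80 is treated as undecodable, any character above U+007F as unencodable), and errors are returned as `Left` instead of being thrown as exceptions.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Model of the proposed enumeration; constructor names follow the proposal.
data OnEncodingError
  = ThrowEncodingError
  | SkipEncodingError
  | TranslitEncodingError
  deriving (Eq, Show)

-- Hypothetical helper: decode bytes as ASCII.
-- Assumption: a byte >= 0x80 is the only kind of decoding error.
decodeAscii :: OnEncodingError -> [Word8] -> Either String String
decodeAscii mode = go
  where
    go [] = Right []
    go (b:bs)
      | b < 0x80  = (chr (fromIntegral b) :) <$> go bs
      | otherwise = case mode of
          -- Stop at the first bad byte (Left models the exception).
          ThrowEncodingError    -> Left ("invalid byte " ++ show b)
          -- Drop the bad byte and carry on.
          SkipEncodingError     -> go bs
          -- Substitute U+FFFD REPLACEMENT CHARACTER.
          TranslitEncodingError -> ('\xFFFD' :) <$> go bs

-- Hypothetical helper: encode characters as ASCII.
-- Assumption: a character above U+007F is the only kind of encoding error.
encodeAscii :: OnEncodingError -> String -> Either String [Word8]
encodeAscii mode = go
  where
    go [] = Right []
    go (c:cs)
      | ord c < 0x80 = (fromIntegral (ord c) :) <$> go cs
      | otherwise    = case mode of
          ThrowEncodingError    -> Left ("unencodable character " ++ show c)
          SkipEncodingError     -> go cs
          -- Substitute '?' for the unencodable character.
          TranslitEncodingError -> (fromIntegral (ord '?') :) <$> go cs

main :: IO ()
main = do
  let modes = [ThrowEncodingError, SkipEncodingError, TranslitEncodingError]
      bytes = [72, 105, 0xFF, 33] :: [Word8]  -- "Hi", one invalid byte, "!"
  mapM_ (\m -> print (m, decodeAscii m bytes)) modes
  mapM_ (\m -> print (m, encodeAscii m "caf\233")) modes
```

The same `OnEncodingError` value drives both directions here, which mirrors the single per-Handle setting in the proposal.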

I have implemented this functionality in a patch attached to the
ticket.  Haddock docs are here:
http://code.haskell.org/~judah/new-io-docs/System-IO.html#23

The choice of error handler is orthogonal to the choice of encoder.
Additionally, the same setting is used for both read and write modes.  For
portability, the handlers are written in pure Haskell rather than using
GNU iconv's //TRANSLIT feature.

Note that the text package, for example, provides more sophisticated
error-handling options.  However, I think the above choices are useful
enough without making the API too complicated.

Discussion deadline: September 9
Ticket: http://hackage.haskell.org/trac/ghc/ticket/3455

Best,
-Judah
