
Ganesh Sittampalam
I need to convert directly between different string encodings, rather than just using a particular encoding when reading from/writing to a Handle.
I'm aware of the following options, but they have a few problems:
- text-icu: not easily usable on Windows as it requires libicu
- text: just handles utf8/16/32
- iconv: POSIX only
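For reference, the UTF-only route through 'text' mentioned in the list above can be sketched as follows. This is only an illustration: the helper name utf8ToUtf16LE is made up for this example, and it assumes the 'text' and 'bytestring' packages are installed. It decodes strict UTF-8 bytes to Text and re-encodes them as UTF-16LE, reporting invalid input as a Left instead of throwing.

```haskell
import qualified Data.ByteString as B
import Data.Text.Encoding (decodeUtf8', encodeUtf16LE)
import Data.Text.Encoding.Error (UnicodeException)

-- | Hypothetical helper (not part of 'text'): transcode strict UTF-8
-- bytes to UTF-16LE bytes by going through Text.
-- Invalid UTF-8 input yields a Left rather than an exception.
utf8ToUtf16LE :: B.ByteString -> Either UnicodeException B.ByteString
utf8ToUtf16LE = fmap encodeUtf16LE . decodeUtf8'

main :: IO ()
main =
  case utf8ToUtf16LE (B.pack [0x68, 0x69]) of  -- the UTF-8 bytes of "hi"
    Left err  -> putStrLn ("invalid UTF-8 input: " ++ show err)
    Right out -> print (B.unpack out)           -- show the UTF-16LE bytes
```

This covers only the UTF family, which is exactly the limitation noted above; legacy encodings such as code pages are out of reach this way.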
It seems like GHC's TextEncoding has the necessary low-level functionality (http://hackage.haskell.org/packages/archive/base/latest/doc/html/GHC-IO-Enco...), but I can't find any high-level interface for directly transcoding between String/ByteString/Text.
Am I missing something, or would this be a useful addition as a separate library?
btw, looking at the GHC.IO.Encoding.* modules, it seems to me that 'mkTextEncoding'[1] only supports utf8/16/32 in a system-independent fashion:

,----
| The set of known encodings is system-dependent, but includes at least:
|
|  - UTF-8
|  - UTF-16, UTF-16BE, UTF-16LE
|  - UTF-32, UTF-32BE, UTF-32LE
|
| On systems using GNU iconv (e.g. Linux), there is additional notation
| for specifying how illegal characters are handled:
|
|  - a suffix of //IGNORE, e.g. UTF-8//IGNORE, will cause all illegal
|    sequences on input to be ignored, and on output will drop all code
|    points that have no representation in the target encoding.
|
|  - a suffix of //TRANSLIT will choose a replacement character for
|    illegal sequences or code points.
|
| On Windows, you can access supported code pages with the prefix CP; for
| example, "CP1250".
`----

...so does using GHC.IO.Encoding.* actually provide you with more encodings than the other options ('text' et al.) you mentioned?

Which text encodings beyond the UTF family do you need, btw?

[1]: http://hackage.haskell.org/packages/archive/base/4.6.0.0/doc/html/GHC-IO-Enc...

cheers,
  hvr
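
One way to check which encoding names a given system actually accepts is to probe 'mkTextEncoding' directly. The sketch below is only an illustration: probeEncoding is a made-up helper, and it assumes a reasonably recent 'base' in which an unknown name makes 'mkTextEncoding' raise an IOException right away (older versions may only fail once the encoding is used). The non-UTF names in the list are arbitrary samples, and their results depend on the platform.

```haskell
import Control.Exception (IOException, try)
import System.IO (TextEncoding, mkTextEncoding)

-- | Hypothetical helper: report whether mkTextEncoding accepts a name.
-- Assumes that an unsupported name raises an IOException immediately.
probeEncoding :: String -> IO Bool
probeEncoding name = do
  result <- try (mkTextEncoding name) :: IO (Either IOException TextEncoding)
  return (either (const False) (const True) result)

main :: IO ()
main = mapM_ report ["UTF-8", "UTF-16LE", "UTF-32BE", "CP1250", "ISO-8859-2"]
  where
    -- Print each name together with whether it was accepted.
    report name = do
      ok <- probeEncoding name
      putStrLn (name ++ ": " ++ (if ok then "accepted" else "rejected"))
```

Only the UTF names are documented as portable, so the remaining lines of output are the interesting part when comparing Linux and Windows.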