
If the input bytes are mapped one-to-one to Char values without conversion,
you can just use Data.ByteString.Char8.pack to recover the original bytes
as a ByteString, which you can then decode to Unicode however you like.
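A minimal sketch of that suggestion, assuming the bytestring and text packages are available and, purely for illustration, that the original bytes were ISO-8859-1. The names rawBytes and latin1ArgToUtf8 are made up for this example. Note that pack truncates any Char above 255 to its low 8 bits, so this only makes sense when every Char really stands for a single byte.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text.Encoding as TE

-- Recover the raw bytes from a String in which each Char holds one byte.
-- Char8.pack keeps only the low 8 bits of every Char.
rawBytes :: String -> B.ByteString
rawBytes = BC.pack

-- Hypothetical helper: treat the bytes as ISO-8859-1 (an assumption made
-- for this example) and re-encode them as UTF-8.
latin1ArgToUtf8 :: String -> B.ByteString
latin1ArgToUtf8 = TE.encodeUtf8 . TE.decodeLatin1 . rawBytes
```

For another encoding you would swap the decodeLatin1 step for whatever decoder matches the user's locale.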
On Sun, Nov 16, 2014 at 5:42 AM, Ben Franksen wrote:
I have a question about how to reverse the text decoding that GHC and the base library apply to data coming from the command line or the environment.
Assume the user's environment specifies a non-Unicode locale, e.g. some Latin encoding. In this case, the String we get from e.g. System.Environment.getArgs does *not* contain the Unicode code points of the characters the user has entered. Instead, the input bytes are mapped one-to-one to Char. This has probably been done for compatibility reasons, and I do not want to discuss this choice here. Rather, I want to find out how I can convert such a string back to some proper Unicode representation, for instance in order to store the value in a file with a defined encoding such as UTF-8.
This should be done in a generic way, i.e. without making ad-hoc assumptions about what the user's encoding might be.
There is the iconv package. However, it takes ByteString as input and output, and it also requires that I give it the encoding as input. How do I find out which encoding this is? On the command line I could simply run
ben@sarun[1]: ~ > locale charmap
ISO-8859-1
Is there a Haskell function that does the equivalent or do I have to use getEnv "LC_CTYPE", then parse the result?
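One possible answer, as a sketch: base exposes the encoding GHC derived from the locale through GHC.IO.Encoding.getLocaleEncoding, and the Show instance of TextEncoding gives its name. The name localeCharmap is made up here, and the exact spelling of the result (e.g. "ISO-8859-1" versus some platform variant) depends on the system, so treat it as a hint and not as a guaranteed iconv-compatible name.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)

-- Hypothetical helper: the name of the encoding GHC picked up from the
-- user's locale, roughly what `locale charmap` prints.
localeCharmap :: IO String
localeCharmap = fmap show getLocaleEncoding
```

In GHCi, `localeCharmap >>= putStrLn` prints the name for the current session.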
Let's assume I get this to work, so now I have a String that represents the user's encoding, such as "ISO-8859-1". Now, in order to use iconv, I have to convert the string I got via getArgs into a ByteString. But to do that properly, I would have to decode it according to the user's current locale, which is exactly what I want to achieve in the first place.
How do I break this cycle?
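One way the cycle might be broken without iconv, sketched with base only: if each Char really stands for one byte, marshal the String out with the char8 encoding (one byte per Char) and read the same buffer back with the encoding you actually want. The names redecode, redecodeUtf8 and redecodeWithLocale are hypothetical. The sketch assumes the string contains no NUL characters, which holds for command-line arguments and environment values.

```haskell
import qualified GHC.Foreign as GF
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (TextEncoding, char8, utf8)

-- Write the String out one byte per Char (char8), then decode those
-- bytes again using the given encoding.
redecode :: TextEncoding -> String -> IO String
redecode enc s = GF.withCString char8 s (GF.peekCString enc)

-- Re-decode using whatever encoding GHC derived from the locale.
redecodeWithLocale :: String -> IO String
redecodeWithLocale s = do
  enc <- getLocaleEncoding
  redecode enc s

-- Fixed-encoding variant, convenient for trying the idea out.
redecodeUtf8 :: String -> IO String
redecodeUtf8 = redecode utf8
```

The resulting String holds real Unicode code points, so writing it to a handle set to utf8 produces a file in a defined encoding.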
Perhaps it is simpler to write our own getArgs/getEnv functions and directly convert the data we get from the system to a proper (Unicode) String?
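A rough sketch of what such a replacement could look like on POSIX systems, assuming the unix and bytestring packages: System.Posix.Env.ByteString.getArgs returns the arguments as raw bytes, which can then be decoded explicitly. The names decodeBytes and getArgsDecoded are invented for this example, and using the locale encoding for the decoding step is an assumption about what "proper" should mean here.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GF
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (TextEncoding)
import qualified System.Posix.Env.ByteString as PB

-- Decode a raw byte string with the given encoding.
decodeBytes :: TextEncoding -> B.ByteString -> IO String
decodeBytes enc bs = B.useAsCStringLen bs (GF.peekCStringLen enc)

-- Hypothetical getArgs replacement: fetch the raw argument bytes and
-- decode them with the locale encoding.
getArgsDecoded :: IO [String]
getArgsDecoded = do
  enc <- getLocaleEncoding
  raw <- PB.getArgs
  mapM (decodeBytes enc) raw
```

The same pattern should carry over to the environment via System.Posix.Env.ByteString.getEnv, which also returns raw bytes.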
Any suggestions would be highly appreciated.
Cheers
Ben
-- 
"Make it so they have to reboot after every typo." -- Scott Adams