
On 17/Oct/10 3:37 PM, Michael Snoyman wrote:
On Sun, Oct 17, 2010 at 2:26 PM, Ionut G. Stan
wrote: Thanks Michael, now it works indeed. But I don't understand, is there any inherent problem with Haskell's built-in String? Should one choose ByteString when dealing with Unicode stuff? Or, is there any resource that describes in one place all the problems Haskell has with Unicode?
There's no problem with String; you just need to remember what it means. A String is a list of Chars, and a Char is a Unicode codepoint. On the other hand, the HTTP protocol deals with *bytes*, not Unicode codepoints. In order to convert between the two, you need some kind of encoding; in the case of JSON, I believe this is always specified as UTF-8.
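The codepoint/byte distinction can be seen directly in GHCi or a small program. This is a sketch, assuming the standard bytestring and text packages (Data.ByteString, Data.Text, Data.Text.Encoding), not anything from Network.HTTP:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

main :: IO ()
main = do
  let s = "h\233llo" :: String        -- "héllo": five Unicode codepoints
  print (length s)                    -- counts codepoints: 5
  let bytes = TE.encodeUtf8 (T.pack s)
  print (B.length bytes)              -- counts bytes: 6, since 'é' is two bytes in UTF-8
```

The same five-character String becomes six bytes once an encoding is chosen, which is exactly the conversion step that has to happen somewhere between HTTP and your JSON parser.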
The problem for you is that the HTTP package does *not* perform UTF-8 decoding of the raw bytes sent over the network. Instead, I believe it does a naive byte-to-codepoint conversion, a.k.a. Latin-1 decoding. By downloading the data as bytes (i.e., a ByteString), you can then explicitly state that you want UTF-8 decoding instead of Latin-1.
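To illustrate the difference (a sketch assuming the bytestring and text packages, not the actual internals of Network.HTTP): given the two bytes that make up "é" in UTF-8, the naive byte-to-codepoint view produces two mangled characters, while explicit UTF-8 decoding recovers the single intended one.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8   -- byte-to-Char conversion, i.e. Latin-1
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

main :: IO ()
main = do
  -- 0xC3 0xA9 is the UTF-8 encoding of the single codepoint 'é' (U+00E9).
  let bytes = B.pack [0xC3, 0xA9]
  putStrLn (C8.unpack bytes)                 -- Latin-1 view: two chars, "Ã©"
  putStrLn (T.unpack (TE.decodeUtf8 bytes))  -- UTF-8 view: one char, "é"
```

The Latin-1 view is what shows up as the familiar "Ã©"-style mojibake when UTF-8 bytes are treated as codepoints one byte at a time.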
It would be entirely possible to write an HTTP library that does this automatically, but it would be inherently limited to a single encoding type. By dealing directly with bytestrings, you can work with any character encoding, as well as with binary data such as images, which has no character encoding at all.
OK, I think I understand now. I was under the assumption that the Network.HTTP package would take a look at the Content-Type header and do a behind-the-scenes conversion before decoding those bytes. Thanks for your help. -- Ionuț G. Stan | http://igstan.ro