
On Sun, Oct 17, 2010 at 2:26 PM, Ionut G. Stan wrote:
On 17/Oct/10 8:02 AM, Michael Snoyman wrote:
In the gist you sent, the problem is that you are reading the HTTP response as a String. The HTTP library doesn't handle non-Latin-1 characters well when making String requests; you should use ByteString and then convert. Using the HTTP library with ByteStrings is a little tedious, which is one of the reasons I wrote http-enumerator. Here's some working code. The main point is to convert the UTF-8 octets to a String.
You could also consider using one of the JSON libraries that support bytestrings directly instead of strings, which will likely result in much better performance. Contenders include JSONb[1] and yajl-enumerator[2].
import Network.HTTP.Enumerator
import qualified Text.JSON as JSON
import qualified Data.ByteString.Lazy.UTF8 as BSLU

data GithubUser = GithubUser
    { name     :: String
    , location :: String
    } deriving (Eq, Show)

instance JSON.JSON GithubUser where
    -- The response nests the fields inside a top-level "user" object.
    readJSON (JSON.JSObject object) = do
        userValue <- lookupM "user" (JSON.fromJSObject object)
        case userValue of
            JSON.JSObject b -> do
                let user = JSON.fromJSObject b
                name     <- lookupM "name" user >>= JSON.readJSON
                location <- lookupM "location" user >>= JSON.readJSON
                return $ GithubUser { name = name, location = location }
            _ -> JSON.Error "\"user\" is not a JSON object"
    readJSON _ = JSON.Error "expected a JSON object at the top level"

    showJSON user = JSON.makeObj
        [ ("name",     JSON.showJSON $ name user)
        , ("location", JSON.showJSON $ location user)
        ]

-- Look up a key in an association list, calling fail (in the
-- surrounding monad) if the key is missing.
lookupM :: (Monad m) => String -> [(String, a)] -> m a
lookupM x xs = maybe (fail $ "No such element: " ++ x) return (lookup x xs)

main :: IO ()
main = do
    -- simpleHttp returns the body as a lazy ByteString of raw octets.
    jsonLbs <- simpleHttp "http://github.com/api/v2/json/user/show/igstan"
    -- Decode the octets as UTF-8 to get a String of code points.
    let jsonText = BSLU.toString jsonLbs
    let result = JSON.decode jsonText :: JSON.Result GithubUser
    showResult result
  where
    showResult (JSON.Ok json) = putStrLn $ name json
    showResult (JSON.Error e) = putStrLn e
Michael
[1] http://hackage.haskell.org/package/JSONb-1.0.2
[2] http://hackage.haskell.org/package/yajl-enumerator
Thanks Michael, now it works indeed. But I don't understand: is there an inherent problem with Haskell's built-in String? Should one choose ByteString when dealing with Unicode stuff? Or is there a resource that describes, in one place, all the problems Haskell has with Unicode?
There's no problem with String; you just need to remember what it means. A String is a list of Chars, and a Char is a Unicode code point. The HTTP protocol, on the other hand, deals with *bytes*, not Unicode code points. To convert between the two you need a character encoding; in the case of JSON, that is UTF-8 by default (RFC 4627 also permits UTF-16 and UTF-32).

The problem for you is that the HTTP package does *not* perform UTF-8 decoding of the raw bytes sent over the network. Instead, I believe it does the naive byte-to-code-point conversion, a.k.a. Latin-1 decoding. By downloading the data as bytes (i.e., a ByteString), you can then state explicitly that you want UTF-8 decoding instead of Latin-1.

It would be entirely possible to write an HTTP library that does this automatically, but it would be inherently limited to a single encoding. By dealing directly with bytestrings, you can work with any character encoding, as well as with binary data such as images, which has no character encoding at all.

Michael
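To make the Latin-1 versus UTF-8 distinction concrete, here is a small sketch that decodes the same raw octets under different encodings. It assumes the `bytestring` and `text` packages (both ship with GHC) rather than the `utf8-string` package used in the code earlier in the thread. The `Encoding` type and the `decodeWith` function are hypothetical names for illustration only, not part of any library.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical enumeration of the encodings this sketch handles.
data Encoding = Latin1 | Utf8 | Utf16LE
  deriving (Eq, Show)

-- Hypothetical helper: turn raw octets into a String of code points,
-- using whichever encoding the caller chooses.
decodeWith :: Encoding -> BS.ByteString -> String
decodeWith Latin1  = BC.unpack                    -- each byte becomes the Char with the same value
decodeWith Utf8    = T.unpack . TE.decodeUtf8     -- multi-byte sequences become one Char
decodeWith Utf16LE = T.unpack . TE.decodeUtf16LE  -- 16-bit little-endian code units

-- "é" (U+00E9) encoded as UTF-8: the two octets 0xC3 0xA9.
eAcuteUtf8 :: BS.ByteString
eAcuteUtf8 = BS.pack [0xC3, 0xA9]

-- The same code point encoded as UTF-16LE: the octets 0xE9 0x00.
eAcuteUtf16LE :: BS.ByteString
eAcuteUtf16LE = BS.pack [0xE9, 0x00]

main :: IO ()
main = do
  -- Latin-1 misreads the two UTF-8 octets as two separate code points.
  print (map fromEnum (decodeWith Latin1 eAcuteUtf8))     -- [195,169]
  -- UTF-8 decoding recovers the single intended code point.
  print (map fromEnum (decodeWith Utf8 eAcuteUtf8))       -- [233]
  -- Different octets, same code point, once the right encoding is chosen.
  print (map fromEnum (decodeWith Utf16LE eAcuteUtf16LE)) -- [233]
```

The decoding step only happens once the bytes are in hand, so the caller picks the encoding. Binary payloads such as images would simply stay as a ByteString and never pass through `decodeWith` at all.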