
hi,

tl;dr: i propose this patch to Text/JSON/String.hs and would like to know
why it is needed:

@@ -375,7 +375,7 @@
   where
   go s1 = case s1 of
-    (x :xs) | x < '\x20' || x > '\x7e' -> '\\' : encControl x (go xs)
+    (x :xs) | x < '\x20' -> '\\' : encControl x (go xs)
     ('"' :xs) -> '\\' : '"' : go xs
     ('\\':xs) -> '\\' : '\\' : go xs
     (x :xs) -> x : go xs

i recently stumbled upon CouchDB telling me i'm sending invalid json.
i basically read lines from a utf8 file with german umlauts and send them
to CouchDB using Text.JSON and Database.CouchDB.

    $ file lines.txt
    lines.txt: UTF-8 Unicode text

let's take 'ö' as an example. i use LANG=de_DE.utf8

ghci tells:
    ghci> 'ö'
    '\246'
    ghci> putChar '\246'
    ö
    ghci> putChar 'ö'
    ö

    ghci> :m + Text.JSON Database.CouchDB
    ghci> runCouchDB' $ newNamedDoc (db "foo") (doc "bar") (showJSON $ toJSObject [("test","ö")])
    *** Exception: HTTP/1.1 400 Bad Request
    Server: CouchDB/1.2.1 (Erlang OTP/R15B03)
    Date: Mon, 11 Feb 2013 13:24:49 GMT
    Content-Type: text/plain; charset=utf-8
    Content-Length: 48
    Cache-Control: must-revalidate

couchdb log says:

    Invalid JSON: {{error,{10,"lexical error: invalid bytes in UTF8 string.\n"}},<<"{\"test\":\"<F6>\"}">>}

this is indeed hex ö:
    ghci> :m + Numeric
    ghci> putChar $ toEnum $ fst $ head $ readHex "f6"
    ö

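my guess (an assumption, i have not checked the HTTP code) is that each Char of the request body goes out as a single byte, so 'ö' is sent as the lone byte F6 instead of its two utf8 bytes. to show what a utf8 encoder would produce for it, here is a minimal sketch using only base. encodeCharUtf8 and encodeUtf8 are names i made up, they are not part of Text.JSON or Database.CouchDB:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper (not a library function): UTF-8 encode one Char.
-- No special handling of surrogate code points; this is only a sketch.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]                            -- 1 byte, plain ASCII
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]                     -- 2 bytes
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]            -- 3 bytes
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]   -- 4 bytes
  where
    n = ord c
    -- payload bits that go into the lead byte
    hi s = fromIntegral (n `shiftR` s)
    -- continuation byte: 10xxxxxx carrying six payload bits
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical helper: UTF-8 encode a whole String.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8
```

encodeUtf8 "\246" gives [195,182], i.e. the bytes C3 B6, which is what a utf8 parser expects in place of the single F6 from the log.
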
if i apply the above patch and reinstall JSON and CouchDB the doc creation works:
    ghci> runCouchDB' $ newNamedDoc (db "db") (doc "foo") (showJSON $ toJSObject [("test", "ö")])
    Right someRev

but i don't get back the ö i expected:
    ghci> Just (_,_,x) <- runCouchDB' $ getDoc (db "foo") (doc "bar") :: IO (Maybe (Doc,Rev,JSObject String))
    ghci> let Ok y = valFromObj "test" =<< readJSON x :: Result String
    ghci> y
    "\195\188"
    ghci> putStrLn y
    ü

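the string i get back looks like the raw utf8 bytes of the response, with every byte stored as one Char. assuming that is really what happens (i have not verified it in the library code), decoding those bytes by hand should give a proper unicode String again. a minimal sketch using only base; decodeUtf8Chars is a name i made up, and malformed input is passed through unchanged rather than reported:

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- Hypothetical helper (not a library function): treat every Char as one
-- byte and decode the byte sequence as UTF-8. Chars above 0xFF and
-- malformed or truncated sequences are passed through unchanged.
decodeUtf8Chars :: String -> String
decodeUtf8Chars [] = []
decodeUtf8Chars (c : cs)
  | b < 0x80 || b > 0xFF = c : decodeUtf8Chars cs      -- ASCII, or not a byte
  | b .&. 0xE0 == 0xC0   = multi 1 (b .&. 0x1F)        -- lead byte of 2
  | b .&. 0xF0 == 0xE0   = multi 2 (b .&. 0x0F)        -- lead byte of 3
  | b .&. 0xF8 == 0xF0   = multi 3 (b .&. 0x07)        -- lead byte of 4
  | otherwise            = c : decodeUtf8Chars cs      -- stray continuation byte
  where
    b = ord c
    -- consume k continuation bytes and fold their six payload bits in
    multi k lead =
      let (conts, rest) = splitAt k cs
          isCont x = ord x <= 0xFF && ord x .&. 0xC0 == 0x80
          code = foldl (\acc x -> (acc `shiftL` 6) .|. (ord x .&. 0x3F)) lead conts
      in if length conts == k && all isCont conts && code <= 0x10FFFF
           then chr code : decodeUtf8Chars rest
           else c : decodeUtf8Chars cs
```

for example decodeUtf8Chars "\195\182" gives "\246", which is the 'ö' from the start of this mail (195 182 are its utf8 bytes C3 B6).
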
apparently with curl everything works fine:

    $ curl localhost:5984/db/foo -XPUT -d '{"test": "ö"}'
    {"ok":true,"id":"foo","rev":"someOtherRev"}
    $ curl localhost:5984/db/foo
    {"_id":"bars","_rev":"someOtherRev","test":"ö"}

so how can i get my precious ö back? what am i doing wrong, or does
Text.JSON need another patch?

another question: why does encControl in Text/JSON/String.hs handle the
cases x < '\x100' and x < '\x1000' even though they can never be reached
with the old predicate in encJSString (x < '\x20')?

finally: is '\x7e' the right literal for the job?

thanks for reading
have fun
martin
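
ps: to make the last two questions concrete. as far as i read the json spec, only '"', '\\' and chars below '\x20' have to be escaped, so escaping everything above '\x7e' is optional, but it keeps the output pure ascii and independent of how the bytes are encoded on the wire. below is a minimal sketch of such an escaper. escapeUnit, escapeChar and escapeAscii are names i made up, this is not the library's encControl or encJSString:

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- Hypothetical helper (not the library's encControl): write one UTF-16
-- code unit as \uXXXX, zero-padded to four hex digits.
escapeUnit :: Int -> String
escapeUnit u = "\\u" ++ replicate (4 - length h) '0' ++ h
  where
    h = showHex u ""

-- Escape one Char as a single \uXXXX, or as a surrogate pair when the
-- code point lies above U+FFFF.
escapeChar :: Char -> String
escapeChar c
  | n < 0x10000 = escapeUnit n
  | otherwise   = escapeUnit (0xD800 + m `div` 0x400)
               ++ escapeUnit (0xDC00 + m `mod` 0x400)
  where
    n = ord c
    m = n - 0x10000

-- Escape '"', '\\' and everything outside printable ASCII
-- ('\x20' .. '\x7e'); all other chars are copied as they are.
escapeAscii :: String -> String
escapeAscii = concatMap step
  where
    step '"'  = "\\\""
    step '\\' = "\\\\"
    step c
      | c < '\x20' || c > '\x7e' = escapeChar c
      | otherwise                = [c]
```

with this, escapeAscii "\246" yields the six characters \u00f6. with '\x7e' as the upper bound, DEL ('\x7f') is escaped as well.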