Don't use the json package; use aeson instead. (It's much faster and handles encoding issues correctly.)
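
For illustration, a minimal sketch of the same document built with aeson, assuming the aeson, bytestring and text packages are installed. The name `doc` is made up for this example. `encode` returns a lazy ByteString that is already UTF-8, so it should be safe to send as-is.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value, decode, encode, object, (.=))
import qualified Data.ByteString.Lazy as BL
import Data.Text (Text)

-- Hypothetical document mirroring {"test":"ö"}; '\246' is 'ö'.
doc :: Value
doc = object ["test" .= ("\246" :: Text)]

main :: IO ()
main = do
  let bytes = encode doc
  -- Show the raw bytes that would go on the wire.
  print (BL.unpack bytes)
  -- Round trip: decoding the bytes gives back the same Value.
  print (decode bytes == Just doc)
```

In the printed byte list, 'ö' should show up as the two bytes 195 and 182 (C3 B6) rather than a lone 246.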

G


On Mon, Feb 11, 2013 at 2:56 PM, Martin Hilbig <lists@mhilbig.de> wrote:
hi,

tl;dr: i propose this patch to Text/JSON/String.hs and would like to
know why it is needed:

@@ -375,7 +375,7 @@
   where
   go s1 =
     case s1 of
-      (x   :xs) | x < '\x20' || x > '\x7e' -> '\\' : encControl x (go xs)
+      (x   :xs) | x < '\x20' -> '\\' : encControl x (go xs)
       ('"' :xs)              -> '\\' : '"'  : go xs
       ('\\':xs)              -> '\\' : '\\' : go xs
       (x   :xs)              -> x    : go xs


i recently stumbled upon CouchDB telling me i'm sending invalid json.

i basically read lines from a utf8 file with german umlauts and send
them to CouchDB using Text.JSON and Database.CouchDB.

  $ file lines.txt
  lines.txt: UTF-8 Unicode text

let's take 'ö' as an example. i use LANG=de_DE.utf8

ghci says:

> 'ö'
'\246'

> putChar '\246'
ö

> putChar 'ö'
ö

> :m + Text.JSON Database.CouchDB
> runCouchDB' $ newNamedDoc (db "foo") (doc "bar") (showJSON $ toJSObject [("test","ö")])
*** Exception: HTTP/1.1 400 Bad Request
Server: CouchDB/1.2.1 (Erlang OTP/R15B03)
Date: Mon, 11 Feb 2013 13:24:49 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 48
Cache-Control: must-revalidate

couchdb log says:

  Invalid JSON: {{error,{10,"lexical error: invalid bytes in UTF8 string.\n"}},<<"{\"test\":\"<F6>\"}">>}

this is indeed hex ö:

> :m + Numeric
> putChar $ toEnum $ fst $ head $ readHex "f6"
ö
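
A possible reading of that log line, sketched with base only: if something on the way to the socket keeps just the low 8 bits of each Char, 'ö' (code point 0xF6) goes out as the single byte F6, while UTF-8 needs the two bytes C3 B6. The truncation is an assumption about the HTTP layer, not something verified here; `truncate8` and `utf8Encode` are made-up names.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Assumed behaviour of a byte-oriented writer: keep the low 8 bits only.
truncate8 :: Char -> Word8
truncate8 = fromIntegral . ord

-- Plain UTF-8 encoding of one code point (1 to 4 bytes).
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading bits of the code point, for the first byte.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx with six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

main :: IO ()
main = do
  print (truncate8 '\246')   -- 246, i.e. F6: not valid UTF-8 on its own
  print (utf8Encode '\246')  -- [195,182], i.e. C3 B6
```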

if i apply the above patch and reinstall JSON and CouchDB the doc
creation works:

> runCouchDB' $ newNamedDoc (db "db") (doc "foo") (showJSON $ toJSObject [("test", "ö")])
Right someRev

but i don't get back the ö i expected:

> Just (_,_,x) <-runCouchDB' $ getDoc (db "foo") (doc "bar") :: IO (Maybe (Doc,Rev,JSObject String))
> let Ok y = valFromObj "test" =<< readJSON x :: Result String
> y
"\195\188"
> putStrLn y
ü
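
The two Chars '\195' and '\188' are the bytes C3 BC, which look like a UTF-8 sequence that was never decoded. A base-only sketch of undoing that by hand, assuming every Char in the string really is a byte below 256; `decodeUtf8Chars` is a made-up name and malformed input is replaced with U+FFFD instead of being reported:

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- Treat each Char as one byte and decode the byte sequence as UTF-8.
decodeUtf8Chars :: String -> String
decodeUtf8Chars = go . map ord
  where
    go [] = []
    go (b : bs)
      | b < 0x80           = chr b : go bs
      | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F) bs
      | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F) bs
      | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07) bs
      | otherwise          = '\xFFFD' : go bs
    -- Consume k continuation bytes, folding their six payload bits in.
    multi k acc bs
      | length conts == k, all isCont conts, cp <= 0x10FFFF = chr cp : go rest
      | otherwise = '\xFFFD' : go bs
      where
        (conts, rest) = splitAt k bs
        cp = foldl (\a c -> (a `shiftL` 6) .|. (c .&. 0x3F)) acc conts
    isCont c = c .&. 0xC0 == 0x80

main :: IO ()
main = print (decodeUtf8Chars "\195\188" == "\252")  -- '\252' is 'ü'
```

This only repairs the symptom; decoding the response body as UTF-8 before it reaches the JSON parser would be the cleaner fix.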

apparently with curl everything works fine:

$ curl localhost:5984/db/foo -XPUT -d '{"test": "ö"}'
{"ok":true,"id":"foo","rev":"someOtherRev"}
$ curl localhost:5984/db/foo
{"_id":"bars","_rev":"someOtherRev","test":"ö"}

so how can i get my precious ö back? what am i doing wrong, or does Text.JSON need another patch?

another question: why does encControl in Text/JSON/String.hs handle the
cases x < '\x100' and x < '\x1000' even though they can never be
reached with the old predicate in encJSString (x < '\x20')?

finally: is '\x7e' the right literal for the job?
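
One way to look at both questions: '\x7e' is '~', the last printable ASCII character, so the `x > '\x7e'` branch sends DEL and everything non-ASCII through the \uXXXX escape. That keeps the encoded string pure ASCII, and the wider cases in encControl are presumably there to zero-pad those larger code points to four hex digits. The sketch below is not the library's code; `u4`, `escapeChar` and `escapeJSON` are made-up names, and unlike encControl it writes control characters as \u00XX rather than short forms such as \n.

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- "\uXXXX" with the hex digits zero-padded to width four.
u4 :: Int -> String
u4 n = "\\u" ++ replicate (4 - length h) '0' ++ h
  where h = showHex n ""

-- Escape one character so that the result is ASCII only.
escapeChar :: Char -> String
escapeChar c
  | c == '"'             = "\\\""
  | c == '\\'            = "\\\\"
  | n > 0xFFFF           = u4 (0xD800 + m `div` 0x400) ++ u4 (0xDC00 + m `mod` 0x400)
  | n < 0x20 || n > 0x7e = u4 n
  | otherwise            = [c]
  where
    n = ord c
    -- Offset used to build the UTF-16 surrogate pair above the BMP.
    m = n - 0x10000

-- Quote and escape a whole string as a JSON string literal.
escapeJSON :: String -> String
escapeJSON s = "\"" ++ concatMap escapeChar s ++ "\""

main :: IO ()
main = putStrLn (escapeJSON "\246")  -- prints "\u00f6"
```

With this kind of escaping the byte encoding used on the wire stops mattering for the string contents, because every byte sent is below 0x80.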

thanks for reading

have fun
martin




--
Gregory Collins <greg@gregorycollins.net>