Issues with aeson encoding performance

I know a lot of you use aeson for JSON encoding, and I get the feeling I'm doing something wrong here, so if you can give me some kind of a hint, I'd appreciate it.

The aeson package description talks about getting up to around 40K messages per second from the encoding, but I'm pegging the CPU at 100% just trying to send 10 messages per second. Profiling (which is a REAL pain to reproduce, since the GHC API is involved... but I locally butchered a copy of the code enough to route around that) shows JSON serialization taking 98% of the CPU and doing 99.8% of allocations (on the order of 5 GB of total allocation if I leave it running for 15 seconds). Heap profiling also shows the vast majority of the heap occupied by aeson's types (Value, Number, etc), and with huge spikes.

The streaming code -- very straightforward; it just repeatedly does a toJSON (getValue (f t)) at most once every 0.1 seconds -- is at
https://github.com/cdsmith/gloss-web/blob/master/src/Main.hs#L188
and the serialization (of Gloss's Picture data type) is at
https://github.com/cdsmith/gloss-web/blob/master/src/Instances.hs#L106

The code to build the Picture data type is the "animation" function in the second example at:
http://cdsmith.wordpress.com/2011/08/20/animations-added-to-web-based-haskel...

Any ideas? I'm about to try rewriting the JSON serialization to avoid aeson, but before I go there, I thought I'd ask if anyone sees something obviously wrong here.

-- Chris Smith

On Mon, Aug 29, 2011 at 8:08 AM, Chris Smith wrote:
> I know a lot of you use aeson for JSON encoding, and I get the feeling I'm doing something wrong here, so if you can give me some kind of a hint, I'd appreciate it.
It's impossible to tell what's going on from the data at hand, alas. Do you have an example of the kind of image you're trying to encode, so that I could try to reproduce your situation locally?

On Mon, 2011-08-29 at 11:15 -0700, Bryan O'Sullivan wrote:
> It's impossible to tell what's going on from the data at hand, alas. Do you have an example of the kind of image you're trying to encode, so that I could try to reproduce your situation locally?
The easiest way I can give you to reproduce it is to try http://dac4.designacourse.com:8000/anim and copy and paste the second example from the blog link above, which I've included below. I'll send an example of the generated JSON off the mailing list if you like, since it's 182K in size and bandwidth isn't free everywhere in the world yet.

    import Graphics.Gloss

    animation :: Float -> Picture
    animation time = Scale 0.8 0.8
                   $ Translate 0 (-300)
                   $ tree 4 time (dim $ dim brown)

    -- Basic stump shape
    stump :: Color -> Picture
    stump color = Color color
                $ Polygon [(30,0), (15,300), (-15,300), (-30,0)]

    -- Make a tree fractal.
    tree :: Int    -- Fractal degree
         -> Float  -- time
         -> Color  -- Color for the stump
         -> Picture
    tree 0 time color = stump color
    tree n time color
      = let smallTree = Rotate (sin time)
                      $ Scale 0.5 0.5
                      $ tree (n-1) (- time) (greener color)
        in  Pictures
              [ stump color
              , Translate 0 300 $ smallTree
              , Translate 0 240 $ Rotate 20 smallTree
              , Translate 0 180 $ Rotate (-20) smallTree
              , Translate 0 120 $ Rotate 40 smallTree
              , Translate 0 60  $ Rotate (-40) smallTree
              ]

    -- A starting colour for the stump
    brown :: Color
    brown = makeColor8 139 100 35 255

    -- Make this color a little greener
    greener :: Color -> Color
    greener c = mixColors 1 10 green c

-- Chris
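To get a feel for why a single frame of this example serializes to so much JSON, it may help to count the primitives the recursion produces. The sketch below mirrors the shape of the tree function above (one stump plus five copies of the next smaller tree); the names stumpCount and vertexFloats are made up for illustration and are not part of gloss or gloss-web.

```haskell
-- Count the stumps in the picture built by `tree n`, following the
-- recursion in the example: one stump plus five copies of `tree (n-1)`.
stumpCount :: Int -> Int
stumpCount 0 = 1
stumpCount n = 1 + 5 * stumpCount (n - 1)

-- Each stump is a Polygon with four (x, y) vertices, i.e. eight Floats.
-- Colours and the Translate/Rotate/Scale wrappers are not counted here.
vertexFloats :: Int -> Int
vertexFloats n = 8 * stumpCount n

main :: IO ()
main = do
  putStrLn ("stumps in tree 4:        " ++ show (stumpCount 4))
  putStrLn ("vertex floats in tree 4: " ++ show (vertexFloats 4))
```

For the degree-4 tree used by animation this gives 781 stumps and 6248 vertex coordinates per frame, before any colours or transforms are written out. smallTree is shared in memory, but a serializer that walks the Picture still visits every copy.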

On Mon, Aug 29, 2011 at 5:08 PM, Chris Smith wrote:
> I know a lot of you use aeson for JSON encoding, and I get the feeling I'm doing something wrong here, so if you can give me some kind of a hint, I'd appreciate it.
> The aeson package description talks about getting up to around 40K messages per second from the encoding, but I'm pegging the CPU at 100% just trying to send 10 messages per second. Profiling (which is a REAL pain to reproduce, since the GHC API is involved... but I locally butchered a copy of the code enough to route around that) shows JSON serialization taking 98% of the CPU and doing 99.8% of allocations (on the order of 5 GB of total allocation if I leave it running for 15 seconds). Heap profiling also shows the vast majority of the heap occupied by aeson's types (Value, Number, etc), and with huge spikes.
Dumb question: are the values you're encoding forced before you call
the JSON serialization function? If you have unevaluated thunks in
there, it could cause the profiler to misattribute the time to aeson.
G
--
Gregory Collins
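
One way to check the point about unevaluated thunks is to force the value completely before handing it to the serializer, so the two costs happen in separate steps. The sketch below assumes the deepseq package that ships with GHC; expensiveFrame and serialize are hypothetical stand-ins for the Picture-building code and the JSON encoder, not the actual gloss-web or aeson functions.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)

-- Hypothetical stand-in for an expensive value such as a Picture.
expensiveFrame :: Double -> [Double]
expensiveFrame t = [sin (t * fromIntegral k) | k <- [1 .. 100000 :: Int]]

-- Hypothetical stand-in for the serializer.
serialize :: [Double] -> String
serialize = unwords . map show

main :: IO ()
main = do
  -- Fully evaluate the value first, so the work of building it is done
  -- here and not inside the serializer that would otherwise force it.
  frame <- evaluate (force (expensiveFrame 0.5))
  -- Then force the serialized output as a separate step.
  out <- evaluate (force (serialize frame))
  print (length out)
```

For a real Picture this would need an NFData instance for the type, which is an extra assumption here.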

On Tue, 2011-08-30 at 14:17 +0200, Gregory Collins wrote:
> Dumb question: are the values you're encoding forced before you call the JSON serialization function? If you have unevaluated thunks in there, it could cause the profiler to misattribute the time to aeson.
They are not... but it turns out the profiling was accurate. I talked to Bryan about this, and he did some checking around, and basically concluded that there is no way to do the JSON serialization with acceptable performance, even by dropping down to an optimized implementation in C. It's a combination, I think, of the size of the messages and the fact that so much of their contents are floating point numbers.

I rewrote this bit of code last night to send (base64-encoded) binary data instead, and the results are dramatic: streaming an animation that was choppy and used 100% CPU on the server before is now smooth and uses about 6% CPU time. It's a good bit easier on the client as well: Chromium goes from itself spinning at 100% CPU to using around 25% to display the animation.

I guess the lesson here is that JSON encoding (and parsing) carries some very substantial overhead in many cases. It's better than XML, but no substitute for a binary format.

-- Chris
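
As a rough illustration of the size side of this trade-off, the sketch below compares a fixed-width binary encoding of Floats with a comma-separated decimal rendering, using only Data.ByteString.Builder from the bytestring package that ships with GHC. The sample data and the function names are invented for the example; this is not the encoding gloss-web actually uses, and it says nothing about CPU cost, which was the larger problem in this thread.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import Data.List (intersperse)

-- Hypothetical sample payload: values in the range of the polygon coordinates.
samples :: [Float]
samples = [sin (fromIntegral k) * 300 | k <- [1 .. 1000 :: Int]]

-- Fixed-width binary: every Float is exactly four little-endian bytes.
binarySize :: [Float] -> Int64
binarySize = BL.length . B.toLazyByteString . foldMap B.floatLE

-- Decimal text, comma-separated, roughly like the body of a JSON array.
textSize :: [Float] -> Int64
textSize =
  BL.length . B.toLazyByteString . mconcat . intersperse (B.char7 ',') . map B.floatDec

-- Base64 turns every 3 input bytes into 4 output characters (with padding).
base64Size :: Int64 -> Int64
base64Size n = 4 * ((n + 2) `div` 3)

main :: IO ()
main = do
  let bin = binarySize samples
  putStrLn ("binary bytes:         " ++ show bin)
  putStrLn ("base64 of the binary: " ++ show (base64Size bin))
  putStrLn ("decimal text bytes:   " ++ show (textSize samples))
```

The binary form is always four bytes per Float (4000 bytes here, 5336 after base64), while the length of the decimal text depends on how many digits each value needs.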