
Hello Cafe,

Recently, while trying to optimize some code, I noticed that aeson requires quite a substantial amount of memory when parsing map-like structures. The quick example I crafted [1] is quite surprising: in order to parse an 11-megabyte file, aeson requires around 300 megabytes of memory (according to the heap profiler), and the total memory footprint (according to htop) jumps to around 700 megabytes.

Is there a way to reduce this number to something less insane? I suspect that I might be doing something wrong here, but I can't exactly see what.

[1] https://github.com/greydot/aeson-example
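[For reference, a minimal sketch of the kind of benchmark being described. The real code is in the linked repository; the file name test.data and the Map Text [Text] shape are taken from later in this thread, the rest is an assumption.]

```haskell
module Main where

import Control.DeepSeq (force)
import Control.Exception (evaluate)
import qualified Data.Aeson as A
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map.Strict as M
import Data.Text (Text)
import System.Mem (performGC)

main :: IO ()
main = do
  raw <- BL.readFile "test.data"  -- the ~11 MB JSON file
  case (A.eitherDecode raw :: Either String (M.Map Text [Text])) of
    Left err -> putStrLn err
    Right m  -> do
      _ <- evaluate (force m)     -- fully evaluate the parsed structure
      performGC                   -- collect the parser's garbage
      print (M.size m)            -- keep the map alive for the profiler
```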

I'm not 100% sure it's applicable in your case, but perhaps trying ThreadScope would be a good idea?

Based on the heap profile, I'm guessing that most of the allocations are due to slicing ByteStrings up. What happens when you try to read a map of Integers?

I'd also note that your example uses evaluate and performGC, which isn't entirely realistic and thus may not correspond to the performance of the application in practice. I gather it *was* a problem in practice, but nonetheless it's always good to benchmark the right thing.

Hi,

The test.data is very repetitive:

{"1":["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"],"10":["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"],...}

Perhaps you could compress it by interning the symbols: use a `Map Text Text` to generate one canonical `Text` object for each unique string. Doing this after parsing might fuse well enough to avoid a memory spike; doing it during parsing would probably require modifications to aeson.

Note that `pack "a" == pack "a"` under `Eq`, but they might be different `Text` objects.

You might also need to `copy` the `Text` objects, which might be slices referencing the input.
Claude -- https://mathr.co.uk
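[A minimal sketch of the interning idea, assuming Data.Map and Data.Text.copy; the helper names here are made up.]

```haskell
import Control.Monad.Trans.State.Strict (State, evalState, state)
import qualified Data.Map.Strict as M
import Data.Text (Text)
import qualified Data.Text as T

-- Return the canonical copy of a string, inserting a fresh,
-- compact copy into the table on first sight.
intern :: Text -> State (M.Map Text Text) Text
intern t = state $ \tbl ->
  case M.lookup t tbl of
    Just t' -> (t', tbl)
    Nothing -> let t' = T.copy t  -- drop any sharing with the input buffer
               in (t', M.insert t' t' tbl)

-- Intern every key and every list element of the parsed structure.
internMap :: M.Map Text [Text] -> M.Map Text [Text]
internMap m = evalState (M.fromList <$> traverse go (M.toList m)) M.empty
  where
    go (k, vs) = (,) <$> intern k <*> traverse intern vs
```

With 26 distinct one-character strings, this would collapse millions of separate `Text` objects into 26 shared ones, at the cost of one Map lookup per string.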

On 08/08/2018 08:59 PM, Claude Heiland-Allen wrote:
> You might also need to `copy` the `Text` objects, which might be slices referencing the input.
I tried using Text.copy, though in the real code, not this example. It didn't seem to help.

The code I'm actually trying to optimize builds a map from IP addresses to collections of short text samples, with potentially hundreds of thousands to millions of records. We use IPRTable from the iproute package rather than Data.Map, but the two occupy approximately the same amount of memory, so I used Data.Map in the example.
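[For what it's worth, copying alone would look something like this sketch. It trims any slice overhang, but it still keeps one object per occurrence, so unlike interning it cannot deduplicate anything, which may be why it didn't help here.]

```haskell
import qualified Data.Map.Strict as M
import Data.Text (Text)
import qualified Data.Text as T

-- Replace every Text with a compact copy; no sharing is introduced.
copyAll :: M.Map Text [Text] -> M.Map Text [Text]
copyAll = M.map (map T.copy) . M.mapKeys T.copy
```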

On 08/08/2018 08:24 PM, Vanessa McHale wrote:
> I'm not 100% sure it's applicable in your case, but perhaps trying ThreadScope would be a good idea?
> Based on the heap profile, I'm guessing that most of the allocations are due to slicing ByteStrings up. What happens when you try to read a map of Integers?

Parsing a 'Map Int Int' gives a similar picture, with slightly lower memory use due to the absence of [Text]. However, it is still in the hundreds of megabytes for a 10-megabyte file. I also noticed that the Map itself contributes significantly to the memory footprint.

> I'd also note that your example uses evaluate and performGC, which isn't entirely realistic and thus may not correspond to the performance of the application in practice.

Removing evaluate and force adds THUNKs to the heap profile, which is probably the wrong thing to benchmark anyway. What I was trying to benchmark was the memory footprint during the parsing stage and after it, when only the resulting data should be resident in memory.
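[One way to separate the two phases is to snapshot the RTS allocation counter around the decode; a sketch, assuming the program is run with +RTS -T.]

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats)

-- Bytes allocated while producing a fully evaluated result.
-- This measures cumulative allocation, not peak residency.
allocatedDuring :: NFData a => IO a -> IO (a, Word64)
allocatedDuring act = do
  before <- allocated_bytes <$> getRTSStats
  x <- act >>= evaluate . force
  after <- allocated_bytes <$> getRTSStats
  pure (x, after - before)
```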

Well, the problem with that is that in a lazy language, the "parsing stage" may not be as clearly distinguished as you suggest. Simply forcing the evaluation of all THUNKs is not what happens in the actual program.

On 08/09/2018 02:38 PM, Vanessa McHale wrote:
> Well, the problem with that is that in a lazy language, the "parsing stage" may not be as clearly distinguished as you suggest. Simply forcing the evaluation of all THUNKs is not what happens in the actual program.

True. Unfortunately, that doesn't mean the memory issue doesn't exist. Data.Map already has quite a significant memory footprint, and aeson makes it even worse.
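[Back of the envelope, assuming 64-bit GHC and containers' representation (Bin !Size !k a !(Map k a) !(Map k a)): each entry costs one node of six words (header, unpacked size, key pointer, value pointer, two child pointers), i.e. 48 bytes, plus a boxed Int key and value at 16 bytes each, so roughly 80 bytes per Map Int Int entry, where the JSON source spends maybe 10 bytes. That is an 8x blowup before aeson allocates anything of its own.]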

This is interesting. Why would something like a Data.Map have a bad memory footprint? In your aeson benchmark, what happens to memory usage if you manually trigger a GC cycle? Is hundreds of megs of memory actively being used? Or is most of it garbage waiting to be collected?
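[One way to answer the live-versus-garbage question directly is to force a collection and then read the RTS's own accounting; a sketch, which requires running the program with +RTS -T.]

```haskell
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)

-- Force a major GC, then report how much data is actually live.
reportLive :: IO ()
reportLive = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "run with +RTS -T to enable RTS stats"
    else do
      performGC  -- after a major GC, everything left is reachable
      s <- getRTSStats
      putStrLn $ "live after GC: " ++ show (gcdetails_live_bytes (gc s)) ++ " bytes"
```

A number close to the 300 MB from the heap profile would mean the data really is that big; a much smaller number would mean most of what htop shows is garbage, or memory the RTS has not returned to the OS.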

You might be interested in John Ky's post on hw-json; it is expected that there's quite a blowup.
--
A UNIX signature isn't a return address, it's the ASCII equivalent of a black velvet clown painting. It's a rectangle of carets surrounding a quote from a literary giant of weeniedom like Heinlein or Dr. Who. -- Chris Maeda

bah, sent too early, sorry:
https://haskell-works.github.io/posts/2018-07-25-problem-of-parsing-large-da...
participants (5):
- Claude Heiland-Allen
- Lana Black
- Mark Wotton
- Saurabh Nanda
- Vanessa McHale