
On Fri, 2017-05-12 at 10:48 +0100, David Turner wrote:
On 12 May 2017 at 09:27, Arjen wrote:
Maybe this is a silly question, and if so please let me know why, but:
Has anyone thought about parallelizing it across multiple messages in order to "produce garbage faster"? While reducing allocation will make each individual validation faster, validating several messages at once might improve the throughput-per-GC ratio. This assumes that the amount of live data in the heap is small, making each GC roughly constant-time, and that multiple cores are available.
Not a silly question at all. Adding the following incantation:
`using` parListChunk 100 rseq
does quite happily spread things across all 4 cores on my development machine, and it's certainly a bit faster. To give some stats, it processes ~24 events between GCs rather than ~6, and collects ~2MB rather than ~500kB per collection. The throughput becomes a lot less consistent, however, at least partly due to some bigger GC pauses along the way. As Ben's statistics showed, our allocation rate on one thread is around 4GB/s, which is already stressing the GC out a bit, and parallelising doesn't make that problem any easier.
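For concreteness, here is a minimal, self-contained sketch of how that incantation hangs off a map over the events. The validate function and its workload are invented placeholders for the real per-event work; using, parListChunk and rseq come from Control.Parallel.Strategies in the parallel package.

    import Control.Parallel.Strategies (parListChunk, rseq, using)

    -- Invented stand-in for the real per-event validation work.
    validate :: Int -> Bool
    validate n = odd (sum [1 .. n `mod` 1000])

    -- Evaluate the mapped list in chunks of 100 elements, sparking
    -- chunks in parallel; rseq forces each element to weak head
    -- normal form.
    validateAll :: [Int] -> [Bool]
    validateAll events = map validate events `using` parListChunk 100 rseq

    main :: IO ()
    main = print (length (filter id (validateAll [1 .. 100000])))

Build with ghc -O2 -threaded -rtsopts and run with +RTS -N4 to spread the sparks across the four cores.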
I know in the OP I said "we have a stream" (an accidental MLK misquote), but in fact there are a number of parallel streams, more than the number of available cores, so we don't anticipate any enormous benefit from spreading the processing of any one stream across multiple cores: single-threaded performance is what we think we should be concentrating on.
Cheers,
David
Apologies for spamming, but if you want more mutator time between garbage collections, try increasing the nursery size. I see a lot less time spent in GC and also far fewer syncs on GC (according to -s). It seems to reduce the execution time by a third using something like -A8M, or even -A64M (quite extreme). In a large program this effect might be smaller; an example invocation is sketched below.

Kind regards,
Arjen
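A hypothetical invocation showing where those flags go (the program name is invented; the flags themselves are standard GHC/RTS options), assuming a build like the sketch earlier in the thread:

    $ ghc -O2 -threaded -rtsopts Validate.hs
    $ ./Validate +RTS -N4 -A64M -s

Here -A sets the allocation area (nursery) size, -N4 uses four capabilities, and -s prints the GC summary referred to above.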