Re: [Haskell-cafe] Fast JSON validation - reducing allocations

David Turner
On 11 May 2017 at 19:40, Ben Gamari
wrote: Ccing Luke Maurer under the assumption that he will appreciate seeing the fruits of his labor.
David Turner
writes: Dear Café,
(snip)
There's a copy of the relevant code for option 4 at https://github.com/DaveCTurner/json-validator. I've hacked around with
it a
bit since producing the numbers above, so it's now apparently a bit slower than Aeson but allocates less too (~65kB).
Could anyone help, e.g. by pointing me at the bit in the Core that is allocating within the main loop?
While turning this over in my head I realized that this is the sort of program which may be helped significantly by GHC 8.2's improved join point handling. Indeed the timings seem to support this hypothesis:
GHC 8.0.2
benchmarking json-validator/Automaton/testEvent time 22.84 μs (22.76 μs .. 22.94 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 22.84 μs (22.76 μs .. 22.94 μs) std dev 297.4 ns (221.8 ns .. 378.8 ns)
Alloc rate 4,015,256,078,038 bytes per MUT second
GHC 8.2
benchmarking json-validator/Automaton/testEvent time 9.221 μs (9.141 μs .. 9.318 μs) 0.998 R² (0.996 R² .. 1.000 R²) mean 9.163 μs (9.084 μs .. 9.356 μs) std dev 399.8 ns (193.0 ns .. 745.4 ns) variance introduced by outliers: 54% (severely inflated)
Alloc rate 123,141,635 bytes per MUT second
Wow! I suspect your allocations have now vanished.
Ooo, that's more like it.
Could you run again using the following to get Criterion's estimate of the allocations-per-call?
json-validator-exe --regress allocated:iters +RTS -T benchmarking json-validator/Automaton/testEvent
Here it is, GHC 8.0.2 benchmarking json-validator/Automaton/testEvent time 34.81 μs (33.48 μs .. 36.10 μs) 0.991 R² (0.988 R² .. 0.994 R²) mean 34.04 μs (33.28 μs .. 34.99 μs) std dev 2.828 μs (2.423 μs .. 3.247 μs) allocated: 1.000 R² (1.000 R² .. 1.000 R²) iters 65924.651 (65895.364 .. 65951.510) y -4175.284 (-46736.295 .. 39814.235) variance introduced by outliers: 78% (severely inflated) GHC 8.2 benchmarking json-validator/Automaton/testEvent time 9.021 μs (8.923 μs .. 9.162 μs) 0.998 R² (0.994 R² .. 1.000 R²) mean 8.970 μs (8.905 μs .. 9.232 μs) std dev 341.4 ns (94.90 ns .. 711.7 ns) allocated: 0.973 R² (0.956 R² .. 0.984 R²) iters 343.033 (332.692 .. 354.606) y -7326.339 (-57269.638 .. 47222.231) variance introduced by outliers: 47% (moderately inflated) Cheers, - Ben

Truly impressive. Amazing. I wonder what style of coding of inner loops leads to such good results in 8.2. Is it easy to describe? Or is the answer "any" or "simplest" or "natural"? If not, can it be captured as some recursion combinators perhaps?
json-validator-exe --regress allocated:iters +RTS -T
Here it is,
GHC 8.0.2
benchmarking json-validator/Automaton/testEvent time 34.81 μs (33.48 μs .. 36.10 μs) 0.991 R² (0.988 R² .. 0.994 R²) mean 34.04 μs (33.28 μs .. 34.99 μs) std dev 2.828 μs (2.423 μs .. 3.247 μs) allocated: 1.000 R² (1.000 R² .. 1.000 R²) iters 65924.651 (65895.364 .. 65951.510) y -4175.284 (-46736.295 .. 39814.235) variance introduced by outliers: 78% (severely inflated)
GHC 8.2
benchmarking json-validator/Automaton/testEvent time 9.021 μs (8.923 μs .. 9.162 μs) 0.998 R² (0.994 R² .. 1.000 R²) mean 8.970 μs (8.905 μs .. 9.232 μs) std dev 341.4 ns (94.90 ns .. 711.7 ns) allocated: 0.973 R² (0.956 R² .. 0.984 R²) iters 343.033 (332.692 .. 354.606) y -7326.339 (-57269.638 .. 47222.231) variance introduced by outliers: 47% (moderately inflated)

Mikolaj Konarski
Truly impressive. Amazing.
I wonder what style of coding of inner loops leads to such good results in 8.2. Is it easy to describe? Or is the answer "any" or "simplest" or "natural"? If not, can it be captured as some recursion combinators perhaps?
My fairly unhelpful description would be that carefully-written programs that look like they shouldn't allocate will be helped most. Previously these program (e.g. probably the one here) would allocate closures due to GHC's failure to identify some types of join points. In 8.2 our treatment of join points is more robust and consequently this unnecessary allocation will be more reliably eliminated. Cheers, - Ben

Just thought I'd follow up now that we've upgraded from GHC 7.10 to GHC
8.0.2 (not 8.2 yet) and are now seeing this code validate the 900-byte
sample document in <8us with <220 bytes of allocation. This is awesome. I'm
not sure what we did to get 8.0.2 to realise it could make the loop this
tight, and I'll look into it some more if I have the time and the
inclination.
There's been various changes since my OP, although nothing fundamental: a
bug fix, support for insignificant whitespace, and removal of bounds checks
on the arrays. We now pass the test suite from the appendix of
http://seriot.ch/parsing_json.php (except for the ones for bare values - we
only want arrays or objects).
Thanks again for everyone's help.
Cheers,
David
On 12 May 2017 at 16:47, Ben Gamari
Mikolaj Konarski
writes: Truly impressive. Amazing.
I wonder what style of coding of inner loops leads to such good results in 8.2. Is it easy to describe? Or is the answer "any" or "simplest" or "natural"? If not, can it be captured as some recursion combinators perhaps?
My fairly unhelpful description would be that carefully-written programs that look like they shouldn't allocate will be helped most. Previously these program (e.g. probably the one here) would allocate closures due to GHC's failure to identify some types of join points. In 8.2 our treatment of join points is more robust and consequently this unnecessary allocation will be more reliably eliminated.
Cheers,
- Ben
participants (3)
-
Ben Gamari
-
David Turner
-
Mikolaj Konarski