Re: [Haskell-cafe] Fast JSON validation - reducing allocations

11 May 2017

      Eric Seidel  writes:
...
On Thu, May 11, 2017, at 09:12, David Turner wrote:
...
Could anyone help, e.g. by pointing me at the bit in the Core that is
allocating within the main loop?
GHC has a -ticky flag that tracks precisely where allocations are
happening. It's quite low level (you may have to stare at the STG in
addition to the Core to interpret the results) but I've found it
incredibly useful when trying to understand performance swings in GHC's
own benchmarks.
https://ghc.haskell.org/trac/ghc/wiki/Debugging/TickyTicky
You can enable it per module, which is nice break from the
rebuild-the-world -prof, but the flip side IIRC is that if the
allocations are happening in a module that wasn't compiled with -ticky
(e.g. somewhere inside Data.Array) you might not see them.
Indeed, -ticky is wonderful for this sort of thing since it doesn't
affect Core-to-Core optimization, meaning that the instrumented program
will behave very similarly to the uninstrumented version (just a bit
slower due to counter overhead).

I had a quick look at your example over lunch. I started like this
(after commenting out the aeson benchmark in Main),

    $ ghc app/Main.hs -isrc  -O2 -ddump-stg -fforce-recomp \
          -ddump-to-file -dsuppress-idinfo -ddump-simpl -rtsopts \
          -ticky -ticky-allocd
    $ app/Main +RTS -rticky.out

This produces a Ticky report (named ticky.out) in the current directory.
This includes a table which in my case looked something like,

        Entries      Alloc    Alloc'd  Non-void Arguments      STG Name
    --------------------------------------------------------------------------------
            123        192         48   1 L                    go9{v slyX} (main@main:Automaton)
              2          0         32   1 L                    go8{v slyH} (main@main:Automaton)
        1465016   58600640  146501600   0                      $w$j1{v slMc} (main@main:Automaton) in slHB
      161884268          011761148448   0                      $w$j1{v slJT} (main@main:Automaton) in slHA
              0          0   11720128   4 ppww                 $s$wgo2{v slHz} (main@main:Automaton) in r4s
      16353241111790448768   13185144   5 wwLpp                $wgo7{v slHA} (main@main:Automaton) in r4s
        1831270  167011824   13185144   6 ppwLww               $s$wgo1{v slHB} (main@main:Automaton) in r4s
         183127          0   13185144   0                      $w$j{v slGN} (main@main:Automaton) in r4s
            992       8624      10944   1 L                    go7{v slwV} (main@main:Automaton) in r4m
              3         72          0   1 C                    main@main:Automaton.showGraph120{v r3O}
            681          0          0   2 ML                   go6{v rkSd} (main@main:Automaton)

The important things we care about here are Alloc (that is how many
bytes each name allocates) and Alloc'd (that is, how many of each
closure are allocated in bytes). [Aside: we explicitly enabled the
Alloc'd column during compilation with the -ticky-allocd flag; if you
omit it this column will contain only zeros. In general I find it
helpful to have so I generally enable it, despite the overhead it
introduces.]

The table suggests a few nice starting points: $w$j1_slJT is allocated
quite a bit, probably by $wgo7_slHA. Next we can turn to the STG output
(src/Automaton.dump-stg) to try to understand why this is. Reading STG
generally tends to be quite hard due to the rather large, deeply nested
trees that most programs compile to. That being said, a good editor can
help immensely (I also have my ghc-dump [1] package which I've been
meaning to add STG support to, which would enable some interesting
analysis techniques).

Unfortunately, my lunch is now all gone so I'll have to drop this here
for now. However, I'll try to pick it up again tonight. Do keep us
apprised of your progress!

Cheers,

- Ben

[1] https://github.com/bgamari/ghc-dump

Re: [Haskell-cafe] Fast JSON validation - reducing allocations

Ben Gamari