
Hello, I would like to profile my code in order to find what is wrong with it. When I run it, a lot of time is taken by the GC:

  13,374,664,736 bytes allocated in the heap
  16,149,779,248 bytes copied during GC
   8,449,239,784 bytes maximum residency (617 sample(s))
      52,988,816 bytes maximum slop
           15265 MB total memory in use (6808 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      7281 colls,  7281 par   80.304s    3.389s    0.0005s    0.0279s
  Gen  1       617 colls,   616 par 2508.056s  105.538s    0.1711s    0.6128s

  Parallel GC work balance: 32.20% (serial 0%, perfect 100%)

  TASKS: 51 (1 bound, 50 peak workers (50 total), using -N24)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.013s  (  0.030s elapsed)
  MUT     time  927.150s  (101.039s elapsed)
  GC      time 2349.743s  ( 98.933s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time  238.618s  (  9.993s elapsed)
  EXIT    time    0.007s  (  0.001s elapsed)
  Total   time 3515.531s  (209.997s elapsed)

  Alloc rate    14,425,560 bytes per MUT second

  Productivity  26.4% of total user, 48.1% of total elapsed

  gc_alloc_block_sync: 524769
  whitehole_spin: 2
  gen[0].sync: 1375
  gen[1].sync: 2139389

But if I use the Strict extension in one of my files[1], this amount of time reduces drastically (the first run was done with a cold cache; this is why the MUT time is different):

  13,364,299,032 bytes allocated in the heap
   6,460,918,752 bytes copied during GC
   8,423,062,112 bytes maximum residency (294 sample(s))
      55,571,872 bytes maximum slop
           15265 MB total memory in use (6813 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      7455 colls,  7455 par   54.623s    2.330s    0.0003s    0.0279s
  Gen  1       294 colls,   293 par  782.773s   33.293s    0.1132s    0.5003s

  Parallel GC work balance: 61.24% (serial 0%, perfect 100%)

  TASKS: 51 (1 bound, 50 peak workers (50 total), using -N24)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.013s  (  0.026s elapsed)
  MUT     time  340.862s  ( 64.620s elapsed)
  GC      time  749.604s  ( 31.920s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time   87.791s  (  3.703s elapsed)
  EXIT    time    0.011s  (  0.002s elapsed)
  Total   time 1178.282s  (100.271s elapsed)

  Alloc rate    39,207,331 bytes per MUT second

  Productivity  28.9% of total user, 64.4% of total elapsed

  gc_alloc_block_sync: 439663
  whitehole_spin: 11
  gen[0].sync: 922
  gen[1].sync: 1129310

It is nice, but I would like to understand the real problem. I used hp2ps in order to monitor the memory usage, but my code is dominated by PINNED values (attached file), so I do not know where to start.

Thanks for your advice.

Frederic

[1] https://repo.or.cz/hkl.git/blob/4cff5145d095683a40bc93cf8a0c0076f308244b:/co...
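[A minimal sketch of what the Strict extension changes, on a hypothetical type (the real file is the one linked at [1]); without the pragma the lazy fields build up unevaluated thunks, which matches the large "bytes copied during GC" above:

  {-# LANGUAGE Strict #-}   -- all bindings and constructor fields strict by default
  module Sketch where       -- hypothetical module, for illustration only

  -- Without Strict, each step allocates an unevaluated (t + x) thunk chain
  -- that the GC must copy again and again until something forces it.
  data Acc = Acc
    { total :: Double
    , count :: Int
    }

  step :: Acc -> Double -> Acc
  step (Acc t n) x = Acc (t + x) (n + 1)   -- under Strict, evaluated on construction
]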

Hi

On 27-09-19 05:30, PICCA Frederic-Emmanuel wrote:
[...] It is nice, but I would like to understand the real problem.
I used hp2ps in order to monitor the memory usage, but my code is dominated by PINNED values (attached file)
I'll admit I am not an expert, but I've found (and fixed) some space leaks on Haskell projects, so I'll take a stab. The info on [1] for profiling memory usage is good, but it usually doesn't tell you which mode to use. If you profile with `+RTS -h --RTS`, that actually means `-hc`, which profiles by cost centre. Cost centres are defined at compile time, so if you don't know where the leak is, compiling with `ghc -prof -fprof-auto` or `cabal --enable-profiling` should do the trick.

But in your .ps you see lots of PINNED values. AFAIK those are usually C-style buffers, as used under the hood by, for example, the ByteString library or the FFI. I don't know how to break them down further via cost centres.

You have other types of profiling at your disposal (by type, by module); you could try any of those. One that I like is retainer set profiling (-hr), check [1]. The basic idea is that memory leaks can be broken down by "set of cost centres". You can get lucky and see that lots of memory is kept alive only because a single retainer set of cost centres holds references to it, so `seq`-ing a few thunks solves the problem.

Finally, I recommend using the hp2pretty package for generating the graphs, as it has more options for organizing the info, and passing `+RTS -L200 --RTS` when generating the profiling info to include namespace info for the closures you're studying.

Good luck.

[1]: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.ht...

--
-- Ruben
-- PGP: 4EE9 28F7 932E F4AD
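[A minimal sketch of the workflow Ruben describes, assuming a hypothetical program built from a single Main.hs; the flags are standard GHC/RTS options:

  $ ghc -O2 -prof -fprof-auto -rtsopts Main.hs   # or: cabal build --enable-profiling
  $ ./Main +RTS -hr -L200 -RTS                   # retainer profile, longer closure labels
  $ hp2pretty Main.hp                            # renders the heap profile to Main.svg
]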

Hello Ruben, thanks for your advice. I used -hr and now I have this (attached), and it is not clear to me :))

Fred

On Fri, Sep 27, 2019 at 08:30:45AM +0000, PICCA Frederic-Emmanuel wrote:
Hello, I would like to profile my code in order to find what is wrong with it.
But if I use the Strict extension in one of my files[1], this amount of time reduces drastically
Have you tried the StrictData extension instead? That will help you discover whether it's just your data types that need strict fields or whether something worse is going on.

Tom
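[For comparison, a minimal sketch of the difference between the two extensions, on a hypothetical type:

  {-# LANGUAGE StrictData #-}   -- only constructor fields become strict
  module SketchData where       -- hypothetical module, for illustration only

  -- Under StrictData this declaration means exactly:
  --   data Acc = Acc !Double !Int
  -- Function arguments and let bindings stay lazy, unlike with Strict, so a
  -- leak that survives StrictData points at thunks outside the data fields.
  data Acc = Acc Double Int
]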
participants (3):
- PICCA Frederic-Emmanuel
- Ruben Astudillo
- Tom Ellis