My guess is that most of the "noise" is not run time, but the compiled code changing in hard-to-predict ways.
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/1776/diffs, for example, was a very small MR that took *months* of on-and-off work to get the metrics tests passing. In the end, binding `is_boot` twice helped a bit, and dumb luck helped a little bit more. No matter how you analyze that, it's a lot of pain for what's manifestly a performance-irrelevant MR: no one is writing 10,000 default methods, or whatever else could possibly make this micro-optimizing worth it!
Perhaps this is an extreme example, but my rough sense is that it's not an isolated outlier.
John
I left the wiggle room for things like longer wall time causing more time events in the IO Manager/RTS, which can be a thermal/HW issue. They're small and indirect, though.
-davean
On Thu, Mar 18, 2021 at 1:37 PM Sebastian Graf <sgraf1337@gmail.com> wrote:
To be clear: All performance tests that run as part of CI measure allocations only. No wall clock time.
Those measurements are (mostly) deterministic and reproducible between compiles of the same worktree and not impacted by thermal issues/hardware at all.
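To make the "is 2% enough to worry about?" question concrete, here is a minimal sketch of a tolerance-window check on an allocation count. It is not the GHC testsuite driver; the function names and the example numbers are made up for illustration.

```python
def relative_change_pct(baseline: int, measured: int) -> float:
    """Signed change of `measured` against `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


def classify(baseline: int, measured: int, tolerance_pct: float) -> str:
    """Classify a deterministic metric (e.g. bytes allocated) against a
    symmetric tolerance window around the baseline."""
    change = relative_change_pct(baseline, measured)
    if change > tolerance_pct:
        return "regression"
    if change < -tolerance_pct:
        return "improvement"
    return "within tolerance"


if __name__ == "__main__":
    # Hypothetical allocation counts, checked against a 2% window.
    print(classify(1_000_000, 1_030_000, 2.0))
    print(classify(1_000_000, 1_010_000, 2.0))
```

Because allocation counts are (mostly) reproducible, the window here only has to absorb changes in the compiled code, not measurement jitter, which is why a tight window is even an option.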
On Thu, Mar 18, 2021 at 6:09 PM davean <davean@xkcd.com> wrote:
That really shouldn't be near system noise for a well-constructed performance test. You might be seeing things like thermal issues, etc., though; good benchmarking is a serious subject. Also, we're not talking wall clock tests, we're talking specific metrics. The machines do tend to be bare metal, but many of these are entirely CPU performance independent, memory timing independent, etc. Well, not quite, but that's a longer discussion.
The investigation of Haskell code performance is a very good thing to do BTW, but you'd still want to avoid regressions in the improvements you made. How well we can do that and the cost of it is the primary issue here.
-davean
On Wed, Mar 17, 2021 at 6:22 PM Karel Gardas <karel.gardas@centrum.cz> wrote:
On 3/17/21 4:16 PM, Andreas Klebinger wrote:
> Now that isn't really an issue anyway I think. The question is rather is
> 2% a large enough regression to worry about? 5%? 10%?
5-10% is still around system noise even on a lightly loaded workstation.
I'm not sure whether CI runs on shared cloud resources, where it may be
even higher.
I've done a simple experiment of pinning ghc while it compiles ghc-cabal,
and I've been able to "speed" it up by 5-10% on a W-2265.
Also, following this CI/performance regressions discussion, I'm not
entirely sure this is not just a witch-hunt that mostly hurts the most
active GHC developers. Another idea may be to give up on CI doing perf
regression testing at all and invest the saved resources into proper
investigation of GHC/Haskell program performance. I'm not sure whether
this wouldn't be more beneficial in the longer term.
Just one random number thrown into the ring: Linux's perf claims that
nearly every second L3 cache access in the example above ends with a
cache miss. Is that a good number or a bad number? See the stats below
(perf stat -d on ghc with +RTS -T -s -RTS).
Good luck to anybody working on that!
Karel
Linking utils/ghc-cabal/dist/build/tmp/ghc-cabal ...
61,020,836,136 bytes allocated in the heap
5,229,185,608 bytes copied during GC
301,742,768 bytes maximum residency (19 sample(s))
3,533,000 bytes maximum slop
840 MiB total memory in use (0 MB lost due to fragmentation)
                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2012 colls,     0 par    5.725s   5.731s     0.0028s    0.1267s
  Gen  1        19 colls,     0 par    1.695s   1.696s     0.0893s    0.2636s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 27.849s ( 32.163s elapsed)
GC time 7.419s ( 7.427s elapsed)
EXIT time 0.000s ( 0.010s elapsed)
Total time 35.269s ( 39.601s elapsed)
Alloc rate 2,191,122,004 bytes per MUT second
Productivity 79.0% of total user, 81.2% of total elapsed
Performance counter stats for
'/export/home/karel/sfw/ghc-8.10.3/bin/ghc -H32m -O -Wall -optc-Wall -O0
-hide-all-packages -package ghc-prim -package base -package binary
-package array -package transformers -package time -package containers
-package bytestring -package deepseq -package process -package pretty
-package directory -package filepath -package template-haskell -package
unix --make utils/ghc-cabal/Main.hs -o
utils/ghc-cabal/dist/build/tmp/ghc-cabal -no-user-package-db -Wall
-fno-warn-unused-imports -fno-warn-warnings-deprecations
-DCABAL_VERSION=3,4,0,0 -DBOOTSTRAPPING -odir bootstrapping -hidir
bootstrapping libraries/Cabal/Cabal/Distribution/Fields/Lexer.hs
-ilibraries/Cabal/Cabal -ilibraries/binary/src -ilibraries/filepath
-ilibraries/hpc -ilibraries/mtl -ilibraries/text/src
libraries/text/cbits/cbits.c -Ilibraries/text/include
-ilibraries/parsec/src +RTS -T -s -RTS':
         39,632.99 msec task-clock                #    0.999 CPUs utilized
            17,191      context-switches          #    0.434 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           899,930      page-faults               #    0.023 M/sec
   177,636,979,975      cycles                    #    4.482 GHz                      (87.54%)
   181,945,795,221      instructions              #    1.02  insn per cycle           (87.59%)
    34,033,574,511      branches                  #  858.718 M/sec                    (87.42%)
     1,664,969,299      branch-misses             #    4.89% of all branches          (87.48%)
    41,522,737,426      L1-dcache-loads           # 1047.681 M/sec                    (87.53%)
     2,675,319,939      L1-dcache-load-misses     #    6.44% of all L1-dcache hits    (87.48%)
       372,370,395      LLC-loads                 #    9.395 M/sec                    (87.49%)
       173,614,140      LLC-load-misses           #   46.62% of all LL-cache hits     (87.46%)
39.663103602 seconds time elapsed
38.288158000 seconds user
1.358263000 seconds sys
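
For anyone who wants to re-derive the ratios from the raw counters (including the "nearly every second L3 access misses" figure), here is a small Python sketch. The numbers are copied from the output above; the variable names are my own. Note that the counters were multiplexed (each was live for roughly 87.5% of the run), so the ratios are estimates.

```python
# Raw counters copied from the `perf stat -d` output above.
cycles = 177_636_979_975
instructions = 181_945_795_221
branches = 34_033_574_511
branch_misses = 1_664_969_299
l1_loads = 41_522_737_426
l1_load_misses = 2_675_319_939
llc_loads = 372_370_395
llc_load_misses = 173_614_140

# CPU times copied from the RTS `-s` summary above.
mut_seconds = 27.849
total_seconds = 35.269


def percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


ipc = instructions / cycles                             # instructions per cycle
branch_miss_pct = percent(branch_misses, branches)
l1_miss_pct = percent(l1_load_misses, l1_loads)
llc_miss_pct = percent(llc_load_misses, llc_loads)      # last-level (L3) cache
productivity_pct = percent(mut_seconds, total_seconds)  # mutator share of CPU time

if __name__ == "__main__":
    print(f"IPC:              {ipc:.2f}")
    print(f"branch misses:    {branch_miss_pct:.2f}%")
    print(f"L1d load misses:  {l1_miss_pct:.2f}%")
    print(f"LLC load misses:  {llc_miss_pct:.2f}%")
    print(f"productivity:     {productivity_pct:.1f}% of total user")
```

The computed values agree with what perf and the RTS print themselves. The last-level cache line is the one behind the "every second access" remark.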
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs