
Karel Gardas writes:

> On 3/17/21 4:16 PM, Andreas Klebinger wrote:
>> Now that isn't really an issue anyway, I think. The question is rather: is 2% a large enough regression to worry about? 5%? 10%?
>
> 5-10% is still within system noise, even on a lightly loaded workstation. I am not sure whether CI runs on shared cloud resources, where the noise may be even higher.
I think when we say "performance" we should be clear about what we are referring to. Currently, GHC does not measure instructions/cycles/time. We only measure allocations and residency. These are significantly more deterministic than time measurements, even on cloud hardware. I do think that eventually we should start to measure a broader spectrum of metrics, but this is something that can be done on dedicated hardware as a separate CI job.
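For concreteness, both counters are exposed by the RTS itself; below is a minimal sketch of a program reading them (assuming it is run with +RTS -T, using GHC.Stats from base):

    -- Minimal sketch: read the allocation and residency counters
    -- mentioned above.  Requires running with +RTS -T, otherwise the
    -- RTS does not collect statistics.
    import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

    main :: IO ()
    main = do
      enabled <- getRTSStatsEnabled   -- False unless +RTS -T was passed
      if not enabled
        then putStrLn "Run with +RTS -T to enable RTS statistics."
        else do
          stats <- getRTSStats
          putStrLn ("total allocated bytes: " ++ show (allocated_bytes stats))
          putStrLn ("max residency (bytes): " ++ show (max_live_bytes stats))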
> I've done a simple experiment of pinning ghc while compiling ghc-cabal, and I was able to "speed" it up by 5-10% on a W-2265.
Do note that once we switch to Hadrian, ghc-cabal will vanish entirely (since Hadrian implements its functionality directly).
> Also, following this CI/performance-regression discussion, I'm not entirely sure that this isn't just a witch-hunt that mostly hurts the most active GHC developers. Another idea would be to give up on performance-regression testing in CI altogether and to invest the saved resources into a proper investigation of GHC/Haskell program performance. I'm not sure whether that would be more beneficial in the longer term.
I don't think this would be beneficial. It's much easier to prevent a regression from getting into the tree than it is to find and characterise it after it has been merged.
> Just one random number thrown into the ring: Linux's perf claims that nearly every second L3 cache access in the example above ends in a cache miss. Is that a good number or a bad one? See the stats below (from `perf stat -d` on ghc with `+RTS -T -s -RTS`).
It is very hard to tell; it sounds bad, but it is not easy to know why, or whether it can be improved. This is one of the reasons I have recently been trying to improve sharing within GHC; reducing residency should improve cache locality. Nevertheless, the difficulty of interpreting architectural events is why I generally only use `perf` for differential measurements: comparing the same workload before and after a change rather than trying to read absolute counts.

Cheers,

- Ben
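P.S. To make the sharing point concrete, here is a toy sketch of interning (my illustration of the general technique, not GHC's actual code): equal values are looked up in a table so that duplicates become pointers to one shared copy, which reduces residency without changing observable results.

    -- Toy interning: an illustration of sharing, not GHC's actual code.
    import           Data.Map.Strict (Map)
    import qualified Data.Map.Strict as M

    -- The table maps a value to its canonical (shared) copy.
    intern :: Ord a => a -> Map a a -> (a, Map a a)
    intern x table =
      case M.lookup x table of
        Just shared -> (shared, table)           -- reuse the existing copy
        Nothing     -> (x, M.insert x x table)   -- first occurrence becomes canonical

    -- Intern a whole list: equal elements end up sharing one heap copy.
    internAll :: Ord a => [a] -> [a]
    internAll = go M.empty
      where
        go _ []       = []
        go t (x : xs) = let (x', t') = intern x t in x' : go t' xs

    main :: IO ()
    main =
      -- The printed output is unchanged; the benefit is that the repeated
      -- occurrences of "a" and "b" now share single heap objects.
      print (internAll (words "a b a c b a"))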