
#15999: Stabilise nofib runtime measurements
-------------------------------------+-------------------------------------
        Reporter:  sgraf             |                Owner:  (none)
            Type:  task              |               Status:  new
        Priority:  normal            |            Milestone:  ⊥
       Component:  NoFib benchmark   |              Version:  8.6.2
  suite                              |
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #5793 #9476       |  Differential Rev(s):  Phab:D5438
  #15333 #15357                      |            Wiki Page:
-------------------------------------+-------------------------------------
Description changed by sgraf:

Old description:

With Phab:D4989 (cf. #15357) having hit `nofib` master, there are still many benchmarks that are unstable. I identified three causes of instability in https://ghc.haskell.org/trac/ghc/ticket/5793#comment:38. With system overhead mostly out of the equation, there are still two related tasks left:

 1. Identify benchmarks with GC wibbles. Plan: Look at counted instructions while varying heap size with just one generation. A wibbling benchmark should have quite diverse sampled maximum residency (as opposed to a microbenchmark, which should have a quite stable instruction count). Then fix these by iterating `main` 'often enough'. Maybe look at total bytes allocated for that; we want this to be monotonically declining as the initial heap size grows.
 2. Now, all benchmarks should have a stable instruction count. If not, maybe there's another class of benchmarks I didn't identify yet in #5793. Of these benchmarks, there are a few, like `real/eff/CS`, that still have highly unstable runtimes. Fix these 'microbenchmarks' by hiding them behind a flag.

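The monotonicity check in step 1 could be sketched roughly as follows. This is only an illustration, not part of nofib: the function name `violations`, the `(heap_size, metric)` sample layout and the `tolerance` parameter are all assumptions made for this sketch. The samples would have to be collected separately, for example by rerunning a benchmark with different allocation area sizes (RTS option `-A<size>`) and a single generation (`-G1`).

```python
from typing import List, Sequence, Tuple


def violations(
    samples: Sequence[Tuple[int, float]],
    increasing: bool,
    tolerance: float = 0.0,
) -> List[Tuple[int, int]]:
    """Return adjacent (heap_size, next_heap_size) pairs where the metric
    moves against the expected direction by more than `tolerance`.

    `samples` holds hypothetical (heap_size, metric) measurements; they are
    sorted by heap size before adjacent points are compared.
    """
    ordered = sorted(samples)
    bad = []
    for (size_a, val_a), (size_b, val_b) in zip(ordered, ordered[1:]):
        delta = val_b - val_a
        if not increasing:
            # For a metric expected to decline, a rise is the violation.
            delta = -delta
        if delta < -tolerance:
            bad.append((size_a, size_b))
    return bad
```

An empty result means the curve is monotone in the expected direction over the sampled sizes. A non-empty result lists the heap-size intervals where the benchmark wibbles, which would mark it as a candidate for iterating `main` more often. A non-zero `tolerance` can absorb measurement noise, at the cost of hiding small wibbles.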
New description:

With Phab:D4989 (cf. #15357) having hit `nofib` master, there are still many benchmarks that are unstable in one way or another. I identified three causes of instability in https://ghc.haskell.org/trac/ghc/ticket/5793#comment:38. With system overhead mostly out of the equation, there are still two related tasks left:

 1. Identify benchmarks with GC wibbles. Plan: Look at how the productivity rate changes while increasing the gen 0 heap size. A GC-sensitive benchmark should have a non-monotonic or discontinuous productivity-rate-over-nursery-size curve. Then fix these by iterating `main` often enough for the curve to become smooth and monotone.
 2. Now, all benchmarks should have a monotonically decreasing instruction count for increasing nursery sizes. If not, maybe there's another class of benchmarks I didn't identify yet in #5793. Of these benchmarks, there are a few, like `real/eff/CS`, that still have highly code-layout-sensitive runtimes. Fix these 'microbenchmarks' by hiding them behind a flag.

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15999#comment:8
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler