
#15999: Stabilise nofib runtime measurements
-------------------------------------+-------------------------------------
           Reporter:  sgraf          |             Owner:  (none)
               Type:  task           |            Status:  new
           Priority:  normal         |         Milestone:  ⊥
          Component:  NoFib          |           Version:  8.6.2
  benchmark suite                    |
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:  #5793 #9476
                                     |  #15333 #15357
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 With Phab:D4989 (cf. #15357) having hit `nofib` master, there are still
 many benchmarks that are unstable. I identified three causes of
 instability in
 https://ghc.haskell.org/trac/ghc/ticket/5793#comment:38. With system
 overhead mostly out of the equation, there are still two related tasks
 left:

 1. Identify benchmarks with GC wibbles. Plan: Look at counted
    instructions while varying heap size with just one generation. A
    wibbling benchmark should have quite diverse sampled maximum
    residency (as opposed to a microbenchmark, which should have a quite
    stable instruction count). Then fix these by iterating `main` 'often
    enough'. Maybe look at total bytes allocated for that; we want this
    to be monotonically declining as the initial heap size grows.
 2. Now, all benchmarks should have a stable instruction count. If not,
    maybe there's another class of benchmarks I didn't identify yet in
    #5793. Of these benchmarks, there are a few, like `real/eff/CS`, that
    still have highly unstable runtimes. Fix these 'microbenchmarks' by
    hiding them behind a flag.

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15999
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
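
 A rough sketch of how the check in task 1 might be automated, assuming the
 instruction counts and total bytes allocated have already been collected
 per initial heap size. Nothing here is part of `nofib`: the function names,
 the 1% threshold and the sample numbers are all made up for illustration,
 and the coefficient of variation is just one possible measure of how
 'diverse' the samples are.

```python
from statistics import mean, pstdev


def relative_spread(samples):
    """Coefficient of variation: population standard deviation over mean."""
    m = mean(samples)
    return pstdev(samples) / m if m else 0.0


def is_wibbling(samples, threshold=0.01):
    """Flag a benchmark whose samples (e.g. instruction counts, one per
    heap size) vary by more than `threshold` relative to their mean.
    The 1% default is an arbitrary, hypothetical cut-off."""
    return relative_spread(samples) > threshold


def is_monotonically_declining(values):
    """True if no value exceeds its predecessor (non-increasing)."""
    return all(b <= a for a, b in zip(values, values[1:]))


# Hypothetical instruction counts, ordered by increasing initial heap size.
stable_counts = [1_000_000, 1_000_500, 999_800, 1_000_200]
wibbly_counts = [1_000_000, 1_150_000, 980_000, 1_220_000]

if __name__ == "__main__":
    print("stable wibbles:", is_wibbling(stable_counts))
    print("wibbly wibbles:", is_wibbling(wibbly_counts))
```

 The same `is_monotonically_declining` helper could be applied to the total
 bytes allocated per heap size to see whether iterating `main` 'often
 enough' has had the desired effect.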