[GHC] #15936: Rethink Choice of Baseline Commit for Performance Tests

#15936: Rethink Choice of Baseline Commit for Performance Tests -------------------------------------+------------------------------------- Reporter: davide | Owner: (none) Type: task | Status: new Priority: normal | Milestone: 8.6.3 Component: Test Suite | Version: 8.6.2 Keywords: performance | Operating System: Unknown/Multiple tests git notes | Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- = Intro = Currently we always use the previous commit when running performance tests. This works well in CI where we fully test each commit in sequence (and hence always have test results for the previous commit). Remember, test results are stored in git notes and are not by default shared between repositories (i.e. your local repo will only have performance results run locally on your machine). This is by design: we want to avoid comparing results form different machines. Unfortunately This is not so effective when testing locally. The programmer may have only run a subset of performance tests on the previous commit, and often have not run the tests at all (this is notably true after performing a rebase: the previous commit has changed). We need to rethink how we pick a baseline commit. = Goals = * In all cases, do something sensible. * Giving a warning if conditions are not idea. Provide clear and simple instructions on how to get to the ideal case. * Give control over the baseline commit to the programmer via command line options. * In general, performance tests should just work! No extra knowledge needed by the programmer. * If tests pass without warning now, then they should pass without warning later. = Proposed Solution = * Choose baseline commit * Provide command line arguments to set the baseline commit * If not baseline commit is provided use the "Ideal baseline commit". * If local test results for the baseline are available use those results. * Else if results from CI are available (without fetching), then use those results. * Else Warn user that there are no results available (tests trivially pass), then suggest fetching CI results (quick and easy, give full command, mention you may have to wait for CI to finish if the commit is recent) or running locally (more accurate, mention exact commit to checkout), or manually select a baseline commit. == Ideal baseline commit == * If there are no new changes: 0 ahead and 0 or more behind master. * Ideal baseline commit is the previous commit. * Else 1 or more ahead and 0 or more behind master. * Ideal baseline commit is `merge-base master HEAD` * Assume that the intention is to create a patch where all new commits will ultimately be squashed and placed on top of master. We only want to consider performance changes caused by the new commits, so we use the merge base instead of master HEAD (though these may be the same commit). = Open issues = If commits between HEAD and the baseline commit [https://ghc.haskell.org/trac/ghc/wiki/Performance/Tests#ExpectedPerformanceC... allow changes] in performance numbers, what should the behaviour be? What happens if this is merged? The commits will be squashed: this could affect the commit message and hence the allowed changes which is parsed from the commit messages. = Use cases = TODO This section is a WIP We generally will assume that the master branch corresponds to ghc's master branch (though may be slightly out of date). ||= Case =||= Details =||= Baseline =|| || 0 commits ahead of master || May be on master or branched off master with no new commits || previous commit || || CI pre-merge || || CI post-merge -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15936 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15936: Rethink Choice of Baseline Commit for Performance Tests -------------------------------------+------------------------------------- Reporter: davide | Owner: davide Type: task | Status: new Priority: normal | Milestone: 8.6.3 Component: Test Suite | Version: 8.6.2 Resolution: | Keywords: performance | tests git notes Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by davide): * owner: (none) => davide -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15936#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15936: Rethink Choice of Baseline Commit for Performance Tests -------------------------------------+------------------------------------- Reporter: davide | Owner: davide Type: task | Status: new Priority: normal | Milestone: 8.6.3 Component: Test Suite | Version: 8.6.2 Resolution: | Keywords: performance | tests git notes Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by davide: Old description:
= Intro =
Currently we always use the previous commit when running performance tests. This works well in CI where we fully test each commit in sequence (and hence always have test results for the previous commit). Remember, test results are stored in git notes and are not by default shared between repositories (i.e. your local repo will only have performance results run locally on your machine). This is by design: we want to avoid comparing results form different machines.
Unfortunately This is not so effective when testing locally. The programmer may have only run a subset of performance tests on the previous commit, and often have not run the tests at all (this is notably true after performing a rebase: the previous commit has changed). We need to rethink how we pick a baseline commit.
= Goals =
* In all cases, do something sensible. * Giving a warning if conditions are not idea. Provide clear and simple instructions on how to get to the ideal case. * Give control over the baseline commit to the programmer via command line options. * In general, performance tests should just work! No extra knowledge needed by the programmer. * If tests pass without warning now, then they should pass without warning later.
= Proposed Solution =
* Choose baseline commit * Provide command line arguments to set the baseline commit * If not baseline commit is provided use the "Ideal baseline commit". * If local test results for the baseline are available use those results. * Else if results from CI are available (without fetching), then use those results. * Else Warn user that there are no results available (tests trivially pass), then suggest fetching CI results (quick and easy, give full command, mention you may have to wait for CI to finish if the commit is recent) or running locally (more accurate, mention exact commit to checkout), or manually select a baseline commit.
== Ideal baseline commit ==
* If there are no new changes: 0 ahead and 0 or more behind master. * Ideal baseline commit is the previous commit. * Else 1 or more ahead and 0 or more behind master. * Ideal baseline commit is `merge-base master HEAD` * Assume that the intention is to create a patch where all new commits will ultimately be squashed and placed on top of master. We only want to consider performance changes caused by the new commits, so we use the merge base instead of master HEAD (though these may be the same commit).
= Open issues =
If commits between HEAD and the baseline commit [https://ghc.haskell.org/trac/ghc/wiki/Performance/Tests#ExpectedPerformanceC... allow changes] in performance numbers, what should the behaviour be? What happens if this is merged? The commits will be squashed: this could affect the commit message and hence the allowed changes which is parsed from the commit messages.
= Use cases =
TODO This section is a WIP
We generally will assume that the master branch corresponds to ghc's master branch (though may be slightly out of date).
||= Case =||= Details =||= Baseline =|| || 0 commits ahead of master || May be on master or branched off master with no new commits || previous commit || || CI pre-merge || || CI post-merge
New description: = Intro = Currently we always use the previous commit when running performance tests. This works well in CI where we fully test each commit in sequence (and hence always have test results for the previous commit). Remember, test results are stored in git notes and are not by default shared between repositories (i.e. your local repo will only have performance results run locally on your machine). This is by design: we want to avoid comparing results form different machines. Unfortunately This is not so effective when testing locally. The programmer may have only run a subset of performance tests on the previous commit, and often have not run the tests at all (this is notably true after performing a rebase: the previous commit has changed). We need to rethink how we pick a baseline commit. = Goals = * In all cases, do something sensible. * Giving a warning if conditions are not idea. Provide clear and simple instructions on how to get to the ideal case. * Give control over the baseline commit to the programmer via command line options. * Give control over the baseline of local or ci to the programmer via command line options. * In general, performance tests should just work! No extra knowledge needed by the programmer. * If tests pass without warning now, then they should pass without warning later. = Proposed Solution = * Choose baseline commit * Provide command line arguments to set the baseline commit * If not baseline commit is provided use the "Ideal baseline commit". * If local test results for the baseline are available use those results. * Else if results from CI are available (without fetching), then use those results. * Else Warn user that there are no results available (tests trivially pass), then suggest fetching CI results (quick and easy, give full command, mention you may have to wait for CI to finish if the commit is recent) or running locally (more accurate, mention exact commit to checkout), or manually select a baseline commit. == Ideal baseline commit == * If there are no new changes: 0 ahead and 0 or more behind master. * Ideal baseline commit is the previous commit. * Else 1 or more ahead and 0 or more behind master. * Ideal baseline commit is `merge-base master HEAD` * Assume that the intention is to create a patch where all new commits will ultimately be squashed and placed on top of master. We only want to consider performance changes caused by the new commits, so we use the merge base instead of master HEAD (though these may be the same commit). An easy way to implement this is: {{{ mergeBase = merge-base master HEAD baseline = if mergeBase == HEAD then HEAD^ else mergeBase }}} = Open issues = If commits between HEAD and the baseline commit [https://ghc.haskell.org/trac/ghc/wiki/Performance/Tests#ExpectedPerformanceC... allow changes] in performance numbers, what should the behaviour be? What happens if this is merged? The commits will be squashed: this could affect the commit message and hence the allowed changes which is parsed from the commit messages. TODO assume programmer only adds such annotations as tests fail, and doesn't want to renter them in full. We still test against baseline. Combine all allowed changes, prefering latest when there is overlap (I think thats right... think about adding a commit that decreases a metric, then you add another commit that increases it (compared to baseline), then overall this is an increase and we can ignore the intermediate decrease, thanks to the commits ultimately being squashed). Warn about what to put in squashed commit. We must figure out what commit messages will be used in GitLab on merge. = Use cases = * We do not distinguish between full/partial performance results being available for the baseline commit: that would require checking out the baseline commit and extracting the full list of tests. * == Locally validate a commit from master == {{{ git checkout master~5 ./validate }}} Baseline Commit: HEAD^ == master~6 || BaselinelocalResults || BaselineCIResults || Infos || Warnings || ||||= Case =||||||= Behaviour =|| ||= Baseline local Results? =||= Baseline CI Results? =||= Baseline local/CI =||= Infos =||= Warnings =|| || Yes/Partial || - || Local || If HEAD tests is not subset or eq to Baseline tests: "If relevant tests exist on baseline, checkout baseline and running those tests OR fetch notes and use --baseline-ci || Warnings || || No || Yes || CI || "Using CI numbers, suggest running locally for more accurate results" || Warnings || || No || No || - || || "No baseline results, tests trivially pass" + suggest fetch notes or locally run tests on baseline || == Locally validate a commit from master == {{{ git checkout master~5 ./validate }}} Baseline Commit: HEAD^ == master~6 || BaselinelocalResults || BaselineCIResults || Infos || Warnings || ||||= Case =||||||= Behaviour =|| ||= Baseline local Results? =||= Baseline CI Results? =||= Baseline local/CI =||= Infos =||= Warnings =|| || Yes/Partial || - || Local || If HEAD tests is not subset or eq to Baseline tests: "If relevant tests exist on baseline, checkout baseline and running those tests OR fetch notes and use --baseline-ci || Warnings || || No || Yes || CI || "Using CI numbers, suggest running locally for more accurate results" || Warnings || || No || No || - || || "No baseline results, tests trivially pass" + suggest fetch notes or locally run tests on baseline || = From the perspective of the CI = ?? From CI, "local" is actually "CI". SO replace "is CI results available" with "no" and replace "Is local results available" with "is CI results available" = When to automatically fetch CI results? = If baseline commit doesn't have local nor CI results, and is old enough such that we expect CI to have been run (WARNING we would need to know the merge time, not the time that the commit was created, which could be long before it was merged? Or will GitLab bump the commit time on merge?) -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15936#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15936: Rethink Choice of Baseline Commit for Performance Tests -------------------------------------+------------------------------------- Reporter: davide | Owner: davide Type: task | Status: new Priority: normal | Milestone: 8.6.3 Component: Test Suite | Version: 8.6.2 Resolution: | Keywords: performance | tests git notes Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by davide: Old description:
= Intro =
Currently we always use the previous commit when running performance tests. This works well in CI where we fully test each commit in sequence (and hence always have test results for the previous commit). Remember, test results are stored in git notes and are not by default shared between repositories (i.e. your local repo will only have performance results run locally on your machine). This is by design: we want to avoid comparing results form different machines.
Unfortunately This is not so effective when testing locally. The programmer may have only run a subset of performance tests on the previous commit, and often have not run the tests at all (this is notably true after performing a rebase: the previous commit has changed). We need to rethink how we pick a baseline commit.
= Goals =
* In all cases, do something sensible. * Giving a warning if conditions are not idea. Provide clear and simple instructions on how to get to the ideal case. * Give control over the baseline commit to the programmer via command line options. * Give control over the baseline of local or ci to the programmer via command line options. * In general, performance tests should just work! No extra knowledge needed by the programmer. * If tests pass without warning now, then they should pass without warning later.
= Proposed Solution =
* Choose baseline commit * Provide command line arguments to set the baseline commit * If not baseline commit is provided use the "Ideal baseline commit". * If local test results for the baseline are available use those results. * Else if results from CI are available (without fetching), then use those results. * Else Warn user that there are no results available (tests trivially pass), then suggest fetching CI results (quick and easy, give full command, mention you may have to wait for CI to finish if the commit is recent) or running locally (more accurate, mention exact commit to checkout), or manually select a baseline commit.
== Ideal baseline commit ==
* If there are no new changes: 0 ahead and 0 or more behind master. * Ideal baseline commit is the previous commit. * Else 1 or more ahead and 0 or more behind master. * Ideal baseline commit is `merge-base master HEAD` * Assume that the intention is to create a patch where all new commits will ultimately be squashed and placed on top of master. We only want to consider performance changes caused by the new commits, so we use the merge base instead of master HEAD (though these may be the same commit).
An easy way to implement this is: {{{ mergeBase = merge-base master HEAD baseline = if mergeBase == HEAD then HEAD^ else mergeBase }}}
= Open issues =
If commits between HEAD and the baseline commit [https://ghc.haskell.org/trac/ghc/wiki/Performance/Tests#ExpectedPerformanceC... allow changes] in performance numbers, what should the behaviour be? What happens if this is merged? The commits will be squashed: this could affect the commit message and hence the allowed changes which is parsed from the commit messages.
TODO assume programmer only adds such annotations as tests fail, and doesn't want to renter them in full. We still test against baseline. Combine all allowed changes, prefering latest when there is overlap (I think thats right... think about adding a commit that decreases a metric, then you add another commit that increases it (compared to baseline), then overall this is an increase and we can ignore the intermediate decrease, thanks to the commits ultimately being squashed). Warn about what to put in squashed commit.
We must figure out what commit messages will be used in GitLab on merge.
= Use cases =
* We do not distinguish between full/partial performance results being available for the baseline commit: that would require checking out the baseline commit and extracting the full list of tests. *
== Locally validate a commit from master ==
{{{ git checkout master~5 ./validate }}}
Baseline Commit: HEAD^ == master~6
|| BaselinelocalResults || BaselineCIResults || Infos || Warnings ||
||||= Case =||||||= Behaviour =|| ||= Baseline local Results? =||= Baseline CI Results? =||= Baseline local/CI =||= Infos =||= Warnings =|| || Yes/Partial || - || Local || If HEAD tests is not subset or eq to Baseline tests: "If relevant tests exist on baseline, checkout baseline and running those tests OR fetch notes and use --baseline-ci || Warnings || || No || Yes || CI || "Using CI numbers, suggest running locally for more accurate results" || Warnings || || No || No || - || || "No baseline results, tests trivially pass" + suggest fetch notes or locally run tests on baseline ||
== Locally validate a commit from master ==
{{{ git checkout master~5 ./validate }}}
Baseline Commit: HEAD^ == master~6
|| BaselinelocalResults || BaselineCIResults || Infos || Warnings ||
||||= Case =||||||= Behaviour =|| ||= Baseline local Results? =||= Baseline CI Results? =||= Baseline local/CI =||= Infos =||= Warnings =|| || Yes/Partial || - || Local || If HEAD tests is not subset or eq to Baseline tests: "If relevant tests exist on baseline, checkout baseline and running those tests OR fetch notes and use --baseline-ci || Warnings || || No || Yes || CI || "Using CI numbers, suggest running locally for more accurate results" || Warnings || || No || No || - || || "No baseline results, tests trivially pass" + suggest fetch notes or locally run tests on baseline ||
= From the perspective of the CI =
?? From CI, "local" is actually "CI". SO replace "is CI results available" with "no" and replace "Is local results available" with "is CI results available"
= When to automatically fetch CI results? =
If baseline commit doesn't have local nor CI results, and is old enough such that we expect CI to have been run (WARNING we would need to know the merge time, not the time that the commit was created, which could be long before it was merged? Or will GitLab bump the commit time on merge?)
New description: = Intro = Currently we always use the previous commit when running performance tests. This works well in CI where we fully test each commit in sequence (and hence always have test results for the previous commit). Remember, test results are stored in git notes and are not by default shared between repositories (i.e. your local repo will only have performance results run locally on your machine). This is by design: we want to avoid comparing results form different machines. Unfortunately This is not so effective when testing locally. The programmer may have only run a subset of performance tests on the previous commit, and often have not run the tests at all (this is notably true after performing a rebase: the previous commit has changed). We need to rethink how we pick a baseline commit. = Goals = * In all cases, do something sensible. * Giving a warning if conditions are not idea. Provide clear and simple instructions on how to get to the ideal case. * Give control over the baseline commit to the programmer via command line options. * Could make it a baseline branch where we still do git merge-base. That would be useful if you are branching from a different branch than master. * Give control over the baseline of local or ci to the programmer via command line options. * In general, performance tests should just work! No extra knowledge needed by the programmer. * If tests pass without warning now, then they should pass without warning later. = Proposed Solution = * When running performance tests, results will be compared to a baseline commit that is the merge base with master (most recent commit from master). If HEAD is already in master, then the previous commit is used instead. * If any locally generated performance results exist, they are used exclusively for the baseline. * Else if any CI generated performance results exist (and have been fetched), they are used exclusively for the baseline. * Else performance tests trivially pass, and a warning is given to the user. To find the baseline commit: {{{ mergeBase = merge-base master HEAD baselineCommit = if mergeBase == HEAD then HEAD^ else mergeBase }}} == Reasoning == * We want each commit in master not to introduce a significant change in performance: hence we compare commits in mater to the previous commit. * If not on master (1 or more ahead and 0 or more commits behind master). We assume that the intention is to create a patch where all new commits will ultimately be squashed and placed on top of master as a single commit. On the other hand we don't want to consider changes in master from after we branched. Instead of using master HEAD as the baseline, we use the commit from which we branched from master (i.e. the merge base). In other words we are concerned only with the change in performance introduced by the newly crated commits. = Handling Expected changes = See [https://ghc.haskell.org/trac/ghc/wiki/Performance/Tests#ExpectedPerformanceC... expected performance changes]. If on master or an ancestor commit, the baseline is the previous commit and we can simply allow performance changes as specified in the current commit's message (this is already the behaviour of the test driver). If we have branched from master, then we may have multiple commits from the baseline commit to HEAD, each of which may have, possibly contradictory, expected performance changes. If any expected changes exist, aggregate them. We introduce an explicit "Metric Unchanged" option and aggregate per test taking the newest commit. "Metric Unchanged" is necessary in the case that a new commit undoes a performance change such that a metric returns to the baseline value. The aggregate version should be output so that the programmer knows what to put in the commit message after squashing the commits. == Reasoning == creating new commits with expected changes is an interactive process. The programmer adds a 1 or more commits, runs the tests, then adds expected performance changes to a commit message. It would be too inconvenient to force the programmer to change old commit messages, and too verbose/annoying to have them enter a full list of expected changes in each commit. Hence we must aggregate the expected changes. This is of a bit risky as it is a context sensitive change in the semantics of expected changes. If we e.g. intend not to squash the commits, then all the sudden the expected changes mean something very different (change to the previous commit, not some distant baseline commit). Perhaps we just show a warning in this case. We must figure out what commit messages will be used in GitLab on merge. Does the programmer have to remember to sort out expected changes before merge some how? = Use cases = * We do not distinguish between full/partial performance results being available for the baseline commit: that would require checking out the baseline commit and extracting the full list of tests. * == Locally validate a commit from master == {{{ git checkout master~5 ./validate }}} Baseline Commit: HEAD^ == master~6 || BaselinelocalResults || BaselineCIResults || Infos || Warnings || ||||= Case =||||||= Behaviour =|| ||= Baseline local Results? =||= Baseline CI Results? =||= Baseline local/CI =||= Infos =||= Warnings =|| || Yes/Partial || - || Local || If HEAD tests is not subset or eq to Baseline tests: "If relevant tests exist on baseline, checkout baseline and running those tests OR fetch notes and use --baseline-ci || Warnings || || No || Yes || CI || "Using CI numbers, suggest running locally for more accurate results" || Warnings || || No || No || - || || "No baseline results, tests trivially pass" + suggest fetch notes or locally run tests on baseline || == Locally validate a commit from master == {{{ git checkout master~5 ./validate }}} Baseline Commit: HEAD^ == master~6 || BaselinelocalResults || BaselineCIResults || Infos || Warnings || ||||= Case =||||||= Behaviour =|| ||= Baseline local Results? =||= Baseline CI Results? =||= Baseline local/CI =||= Infos =||= Warnings =|| || Yes/Partial || - || Local || If HEAD tests is not subset or eq to Baseline tests: "If relevant tests exist on baseline, checkout baseline and running those tests OR fetch notes and use --baseline-ci || Warnings || || No || Yes || CI || "Using CI numbers, suggest running locally for more accurate results" || Warnings || || No || No || - || || "No baseline results, tests trivially pass" + suggest fetch notes or locally run tests on baseline || = From the perspective of the CI = ?? From CI, "local" is actually "CI". SO replace "is CI results available" with "no" and replace "Is local results available" with "is CI results available" = When to automatically fetch CI results? = If baseline commit doesn't have local nor CI results, and is old enough such that we expect CI to have been run (WARNING we would need to know the merge time, not the time that the commit was created, which could be long before it was merged? Or will GitLab bump the commit time on merge?) -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15936#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15936: Rethink Choice of Baseline Commit for Performance Tests -------------------------------------+------------------------------------- Reporter: davide | Owner: davide Type: task | Status: new Priority: normal | Milestone: 8.6.3 Component: Test Suite | Version: 8.6.2 Resolution: | Keywords: performance | tests git notes Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by davide: Old description:
= Intro =
Currently we always use the previous commit when running performance tests. This works well in CI where we fully test each commit in sequence (and hence always have test results for the previous commit). Remember, test results are stored in git notes and are not by default shared between repositories (i.e. your local repo will only have performance results run locally on your machine). This is by design: we want to avoid comparing results form different machines.
Unfortunately This is not so effective when testing locally. The programmer may have only run a subset of performance tests on the previous commit, and often have not run the tests at all (this is notably true after performing a rebase: the previous commit has changed). We need to rethink how we pick a baseline commit.
= Goals =
* In all cases, do something sensible. * Giving a warning if conditions are not idea. Provide clear and simple instructions on how to get to the ideal case. * Give control over the baseline commit to the programmer via command line options. * Could make it a baseline branch where we still do git merge-base. That would be useful if you are branching from a different branch than master. * Give control over the baseline of local or ci to the programmer via command line options. * In general, performance tests should just work! No extra knowledge needed by the programmer. * If tests pass without warning now, then they should pass without warning later.
= Proposed Solution =
* When running performance tests, results will be compared to a baseline commit that is the merge base with master (most recent commit from master). If HEAD is already in master, then the previous commit is used instead. * If any locally generated performance results exist, they are used exclusively for the baseline. * Else if any CI generated performance results exist (and have been fetched), they are used exclusively for the baseline. * Else performance tests trivially pass, and a warning is given to the user.
To find the baseline commit: {{{ mergeBase = merge-base master HEAD baselineCommit = if mergeBase == HEAD then HEAD^ else mergeBase }}}
== Reasoning ==
* We want each commit in master not to introduce a significant change in performance: hence we compare commits in mater to the previous commit. * If not on master (1 or more ahead and 0 or more commits behind master). We assume that the intention is to create a patch where all new commits will ultimately be squashed and placed on top of master as a single commit. On the other hand we don't want to consider changes in master from after we branched. Instead of using master HEAD as the baseline, we use the commit from which we branched from master (i.e. the merge base). In other words we are concerned only with the change in performance introduced by the newly crated commits.
= Handling Expected changes =
See [https://ghc.haskell.org/trac/ghc/wiki/Performance/Tests#ExpectedPerformanceC... expected performance changes].
If on master or an ancestor commit, the baseline is the previous commit and we can simply allow performance changes as specified in the current commit's message (this is already the behaviour of the test driver).
If we have branched from master, then we may have multiple commits from the baseline commit to HEAD, each of which may have, possibly contradictory, expected performance changes. If any expected changes exist, aggregate them. We introduce an explicit "Metric Unchanged" option and aggregate per test taking the newest commit. "Metric Unchanged" is necessary in the case that a new commit undoes a performance change such that a metric returns to the baseline value. The aggregate version should be output so that the programmer knows what to put in the commit message after squashing the commits.
== Reasoning ==
creating new commits with expected changes is an interactive process. The programmer adds a 1 or more commits, runs the tests, then adds expected performance changes to a commit message. It would be too inconvenient to force the programmer to change old commit messages, and too verbose/annoying to have them enter a full list of expected changes in each commit. Hence we must aggregate the expected changes.
This is of a bit risky as it is a context sensitive change in the semantics of expected changes. If we e.g. intend not to squash the commits, then all the sudden the expected changes mean something very different (change to the previous commit, not some distant baseline commit). Perhaps we just show a warning in this case.
We must figure out what commit messages will be used in GitLab on merge. Does the programmer have to remember to sort out expected changes before merge some how?
= Use cases =
* We do not distinguish between full/partial performance results being available for the baseline commit: that would require checking out the baseline commit and extracting the full list of tests. *
== Locally validate a commit from master ==
{{{ git checkout master~5 ./validate }}}
Baseline Commit: HEAD^ == master~6
|| BaselinelocalResults || BaselineCIResults || Infos || Warnings ||
||||= Case =||||||= Behaviour =|| ||= Baseline local Results? =||= Baseline CI Results? =||= Baseline local/CI =||= Infos =||= Warnings =|| || Yes/Partial || - || Local || If HEAD tests is not subset or eq to Baseline tests: "If relevant tests exist on baseline, checkout baseline and running those tests OR fetch notes and use --baseline-ci || Warnings || || No || Yes || CI || "Using CI numbers, suggest running locally for more accurate results" || Warnings || || No || No || - || || "No baseline results, tests trivially pass" + suggest fetch notes or locally run tests on baseline ||
== Locally validate a commit from master ==
{{{ git checkout master~5 ./validate }}}
Baseline Commit: HEAD^ == master~6
|| BaselinelocalResults || BaselineCIResults || Infos || Warnings ||
||||= Case =||||||= Behaviour =|| ||= Baseline local Results? =||= Baseline CI Results? =||= Baseline local/CI =||= Infos =||= Warnings =|| || Yes/Partial || - || Local || If HEAD tests is not subset or eq to Baseline tests: "If relevant tests exist on baseline, checkout baseline and running those tests OR fetch notes and use --baseline-ci || Warnings || || No || Yes || CI || "Using CI numbers, suggest running locally for more accurate results" || Warnings || || No || No || - || || "No baseline results, tests trivially pass" + suggest fetch notes or locally run tests on baseline ||
= From the perspective of the CI =
?? From CI, "local" is actually "CI". SO replace "is CI results available" with "no" and replace "Is local results available" with "is CI results available"
= When to automatically fetch CI results? =
If baseline commit doesn't have local nor CI results, and is old enough such that we expect CI to have been run (WARNING we would need to know the merge time, not the time that the commit was created, which could be long before it was merged? Or will GitLab bump the commit time on merge?)
New description: = Intro = Currently we always use the previous commit when running performance tests. This works well in CI where we fully test each commit in sequence (and hence always have test results for the previous commit). Remember, test results are stored in git notes and are not by default shared between repositories (i.e. your local repo will only have performance results when they were run locally on your machine). This is by design: we want to avoid comparing results form different machines. Unfortunately This is not so effective when testing locally. The programmer may have only run a subset of performance tests on the previous commit, and often have not run the tests at all (this is notably true after performing a rebase: the previous commit has changed). We need to rethink how we pick a baseline commit. = Goals = * In all cases, do something sensible. * Giving a warning if conditions are not idea. Provide clear and simple instructions on how to get to the ideal case. * Give control over the baseline commit to the programmer via command line options. * Could make it a baseline branch where we still do git merge-base. That would be useful if you are branching from a different branch than master. * Give control over the baseline of local or ci to the programmer via command line options. * In general, performance tests should just work! No extra knowledge needed by the programmer. * If tests pass without warning now, then they should pass without warning later. = Proposed Solution = * When running performance tests, results will be compared to a baseline commit that is the merge base with master (most recent commit from master). If HEAD is already in master, then the previous commit is used instead. * If any locally generated performance results exist, they are used exclusively for the baseline. * Else if any CI generated performance results exist (and have been fetched), they are used exclusively for the baseline. * Else performance tests trivially pass, and a warning is given to the user. To find the baseline commit: {{{ mergeBase = merge-base master HEAD baselineCommit = if mergeBase == HEAD then HEAD^ else mergeBase }}} == Reasoning == * We want each commit in master not to introduce a significant change in performance: hence we compare commits in mater to the previous commit. * If not on master (1 or more ahead and 0 or more commits behind master). We assume that the intention is to create a patch where all new commits will ultimately be squashed and placed on top of master as a single commit. On the other hand we don't want to consider changes in master from after we branched. Instead of using master HEAD as the baseline, we use the commit from which we branched from master (i.e. the merge base). In other words we are concerned only with the change in performance introduced by the newly crated commits. = Handling Expected changes = TODO this is the complicated part. What happens when the programmer is not planning on squashing commits and has many commits with expected changes :-( See [https://ghc.haskell.org/trac/ghc/wiki/Performance/Tests#ExpectedPerformanceC... expected performance changes]. If on master or an ancestor commit, the baseline is the previous commit and we can simply allow performance changes as specified in the current commit's message (this is already the behaviour of the test driver). If we have branched from master, then we may have multiple commits from the baseline commit to HEAD, each of which may have, possibly contradictory, expected performance changes. If any expected changes exist, aggregate them. We introduce an explicit "Metric Unchanged" option and aggregate per test where newer allowed changes overwrite older allowed changes. "Metric Unchanged" is necessary in the case that a new commit undoes a performance change such that a metric returns to the baseline value. The aggregate version should be output so that the programmer knows what to put in the commit message after squashing the commits. A warning should be given if expected changes appear in any commit inbetween HEAD and the baseline commit. In that warning Suggest e.g. "--baseline HEAD^ if not planning on squashing this commit" == Reasoning == creating new commits with expected changes is an interactive process. The programmer adds a 1 or more commits, runs the tests, then adds expected performance changes to a commit message. It would be too inconvenient to force the programmer to change old commit messages, and too verbose/annoying to have them enter a full list of expected changes in each commit. Hence we must aggregate the expected changes. This is of a bit risky as it is a context sensitive change in the semantics of expected changes. If we e.g. intend not to squash the commits, then all the sudden the expected changes mean something very different (change to the previous commit, not some distant baseline commit). Perhaps we just show a warning in this case. We must figure out what commit messages will be used in GitLab on merge. Does the programmer have to remember to sort out expected changes before merge some how? = Use cases = * We do not distinguish between full/partial performance results being available for the baseline commit: that would require checking out the baseline commit and extracting the full list of tests. * || BaselinelocalResults || BaselineCIResults || Infos || Warnings || ||||||= Case =||||||= Behaviour =|| ||= Platform =||= Baseline local Results? =||= Baseline CI Results? =||= Baseline local/CI =||= Infos =||= Warnings =|| || Local || Yes/Partial || - || Local || If HEAD tests is not subset or eq to Baseline tests: "If relevant tests exist on baseline, checkout baseline and running those tests OR fetch notes and use --baseline-ci || || || Local || No || Yes || CI || "Using CI numbers, suggest running locally for more accurate results" || || || Local || No || No || - || || "No baseline results, tests trivially pass" + suggest fetch notes or locally run tests on baseline || || CI |||| Yes || CI || || || || CI |||| No || - || || "No results. CI is not yet finished, or CI has failed for the baseline commit or CI hasn’t fetched" || = When to automatically fetch CI results? = Fetch if: * Testing locally (not a ci run) AND * Baseline commit doesn't have local nor CI results (before fetch) AND * Baseline commit is an ancestor of master. If fetching, suggest a command line option: --no-fetch. This is most convenient for local testing avoids fetch on ci, but may result in unwanted/wasted fetches -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15936#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15936: Rethink Choice of Baseline Commit for Performance Tests -------------------------------------+------------------------------------- Reporter: davide | Owner: davide Type: task | Status: new Priority: normal | Milestone: 8.6.3 Component: Test Suite | Version: 8.6.2 Resolution: | Keywords: performance | tests git notes Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by davide: Old description:
= Intro =
Currently we always use the previous commit when running performance tests. This works well in CI where we fully test each commit in sequence (and hence always have test results for the previous commit). Remember, test results are stored in git notes and are not by default shared between repositories (i.e. your local repo will only have performance results when they were run locally on your machine). This is by design: we want to avoid comparing results form different machines.
Unfortunately This is not so effective when testing locally. The programmer may have only run a subset of performance tests on the previous commit, and often have not run the tests at all (this is notably true after performing a rebase: the previous commit has changed). We need to rethink how we pick a baseline commit.
= Goals =
* In all cases, do something sensible. * Giving a warning if conditions are not idea. Provide clear and simple instructions on how to get to the ideal case. * Give control over the baseline commit to the programmer via command line options. * Could make it a baseline branch where we still do git merge-base. That would be useful if you are branching from a different branch than master. * Give control over the baseline of local or ci to the programmer via command line options. * In general, performance tests should just work! No extra knowledge needed by the programmer. * If tests pass without warning now, then they should pass without warning later.
= Proposed Solution =
* When running performance tests, results will be compared to a baseline commit that is the merge base with master (most recent commit from master). If HEAD is already in master, then the previous commit is used instead. * If any locally generated performance results exist, they are used exclusively for the baseline. * Else if any CI generated performance results exist (and have been fetched), they are used exclusively for the baseline. * Else performance tests trivially pass, and a warning is given to the user.
To find the baseline commit: {{{ mergeBase = merge-base master HEAD baselineCommit = if mergeBase == HEAD then HEAD^ else mergeBase }}}
== Reasoning ==
* We want each commit in master not to introduce a significant change in performance: hence we compare commits in mater to the previous commit. * If not on master (1 or more ahead and 0 or more commits behind master). We assume that the intention is to create a patch where all new commits will ultimately be squashed and placed on top of master as a single commit. On the other hand we don't want to consider changes in master from after we branched. Instead of using master HEAD as the baseline, we use the commit from which we branched from master (i.e. the merge base). In other words we are concerned only with the change in performance introduced by the newly crated commits.
= Handling Expected changes =
TODO this is the complicated part. What happens when the programmer is not planning on squashing commits and has many commits with expected changes :-(
See [https://ghc.haskell.org/trac/ghc/wiki/Performance/Tests#ExpectedPerformanceC... expected performance changes].
If on master or an ancestor commit, the baseline is the previous commit and we can simply allow performance changes as specified in the current commit's message (this is already the behaviour of the test driver).
If we have branched from master, then we may have multiple commits from the baseline commit to HEAD, each of which may have, possibly contradictory, expected performance changes. If any expected changes exist, aggregate them. We introduce an explicit "Metric Unchanged" option and aggregate per test where newer allowed changes overwrite older allowed changes. "Metric Unchanged" is necessary in the case that a new commit undoes a performance change such that a metric returns to the baseline value. The aggregate version should be output so that the programmer knows what to put in the commit message after squashing the commits. A warning should be given if expected changes appear in any commit inbetween HEAD and the baseline commit. In that warning Suggest e.g. "--baseline HEAD^ if not planning on squashing this commit"
== Reasoning ==
creating new commits with expected changes is an interactive process. The programmer adds a 1 or more commits, runs the tests, then adds expected performance changes to a commit message. It would be too inconvenient to force the programmer to change old commit messages, and too verbose/annoying to have them enter a full list of expected changes in each commit. Hence we must aggregate the expected changes.
This is of a bit risky as it is a context sensitive change in the semantics of expected changes. If we e.g. intend not to squash the commits, then all the sudden the expected changes mean something very different (change to the previous commit, not some distant baseline commit). Perhaps we just show a warning in this case.
We must figure out what commit messages will be used in GitLab on merge. Does the programmer have to remember to sort out expected changes before merge some how?
= Use cases =
* We do not distinguish between full/partial performance results being available for the baseline commit: that would require checking out the baseline commit and extracting the full list of tests. *
|| BaselinelocalResults || BaselineCIResults || Infos || Warnings ||
||||||= Case =||||||= Behaviour =|| ||= Platform =||= Baseline local Results? =||= Baseline CI Results? =||= Baseline local/CI =||= Infos =||= Warnings =|| || Local || Yes/Partial || - || Local || If HEAD tests is not subset or eq to Baseline tests: "If relevant tests exist on baseline, checkout baseline and running those tests OR fetch notes and use --baseline-ci || || || Local || No || Yes || CI || "Using CI numbers, suggest running locally for more accurate results" || || || Local || No || No || - || || "No baseline results, tests trivially pass" + suggest fetch notes or locally run tests on baseline || || CI |||| Yes || CI || || || || CI |||| No || - || || "No results. CI is not yet finished, or CI has failed for the baseline commit or CI hasn’t fetched" ||
= When to automatically fetch CI results? =
Fetch if: * Testing locally (not a ci run) AND * Baseline commit doesn't have local nor CI results (before fetch) AND * Baseline commit is an ancestor of master. If fetching, suggest a command line option: --no-fetch. This is most convenient for local testing avoids fetch on ci, but may result in unwanted/wasted fetches
New description: = Intro = Currently we always use the previous commit when running performance tests. This works well in CI where we fully test each commit in sequence (and hence always have test results for the previous commit). Remember, test results are stored in git notes and are not by default shared between repositories (i.e. your local repo will only have performance results when they were run locally on your machine). This is by design: we want to avoid comparing results form different machines. Unfortunately This is not so effective when testing locally. The programmer may have only run a subset of performance tests on the previous commit, and often have not run the tests at all (this is notably true after performing a rebase: the previous commit has changed). We need to rethink how we pick a baseline commit. = Goals = * In all cases, do something sensible. * Giving a warning if conditions are not idea. Provide clear and simple instructions on how to get to the ideal case. * Give control over the baseline commit to the programmer via command line options. * Could make it a baseline branch where we still do git merge-base. That would be useful if you are branching from a different branch than master. * Give control over the baseline of local or ci to the programmer via command line options. * In general, performance tests should just work! No extra knowledge needed by the programmer. * If tests pass without warning now, then they should pass without warning later. = Proposed Solution 1 = * Per test use baseline of closest ancestor commit's metric (favour local as opposed to CI metrics). aggregate expected changes, but skip tests with both expected increase and decrease. * If anything other than local metrics from previous commit were used, show a warning suggesting running tests on previous commit. * Stop search if a CI commit is found (these are assumed to be complete) or if searched an arbitrary number of commits e.g. 100 (this is important as we'll end up searching whole history if a new test was added). = Proposed Solution 2 = * When running performance tests, results will be compared to a baseline commit that is the merge base with master (most recent commit from master). If HEAD is already in master, then the previous commit is used instead. * If any locally generated performance results exist, they are used exclusively for the baseline. * Else if any CI generated performance results exist (and have been fetched), they are used exclusively for the baseline. * Else performance tests trivially pass, and a warning is given to the user. To find the baseline commit: {{{ mergeBase = merge-base master HEAD baselineCommit = if mergeBase == HEAD then HEAD^ else mergeBase }}} == Reasoning == * We want each commit in master not to introduce a significant change in performance: hence we compare commits in mater to the previous commit. * If not on master (1 or more ahead and 0 or more commits behind master). We assume that the intention is to create a patch where all new commits will ultimately be squashed and placed on top of master as a single commit. On the other hand we don't want to consider changes in master from after we branched. Instead of using master HEAD as the baseline, we use the commit from which we branched from master (i.e. the merge base). In other words we are concerned only with the change in performance introduced by the newly crated commits. = Handling Expected changes = TODO this is the complicated part. What happens when the programmer is not planning on squashing commits and has many commits with expected changes :-( See [https://ghc.haskell.org/trac/ghc/wiki/Performance/Tests#ExpectedPerformanceC... expected performance changes]. If on master or an ancestor commit, the baseline is the previous commit and we can simply allow performance changes as specified in the current commit's message (this is already the behaviour of the test driver). If we have branched from master, then we may have multiple commits from the baseline commit to HEAD, each of which may have, possibly contradictory, expected performance changes. If any expected changes exist, aggregate them. We introduce an explicit "Metric Unchanged" option and aggregate per test where newer allowed changes overwrite older allowed changes. "Metric Unchanged" is necessary in the case that a new commit undoes a performance change such that a metric returns to the baseline value. The aggregate version should be output so that the programmer knows what to put in the commit message after squashing the commits. A warning should be given if expected changes appear in any commit inbetween HEAD and the baseline commit. In that warning Suggest e.g. "--baseline HEAD^ if not planning on squashing this commit" == Reasoning == creating new commits with expected changes is an interactive process. The programmer adds a 1 or more commits, runs the tests, then adds expected performance changes to a commit message. It would be too inconvenient to force the programmer to change old commit messages, and too verbose/annoying to have them enter a full list of expected changes in each commit. Hence we must aggregate the expected changes. This is of a bit risky as it is a context sensitive change in the semantics of expected changes. If we e.g. intend not to squash the commits, then all the sudden the expected changes mean something very different (change to the previous commit, not some distant baseline commit). Perhaps we just show a warning in this case. We must figure out what commit messages will be used in GitLab on merge. Does the programmer have to remember to sort out expected changes before merge some how? = Use cases = * We do not distinguish between full/partial performance results being available for the baseline commit: that would require checking out the baseline commit and extracting the full list of tests. * || BaselinelocalResults || BaselineCIResults || Infos || Warnings || ||||||= Case =||||||= Behaviour =|| ||= Platform =||= Baseline local Results? =||= Baseline CI Results? =||= Baseline local/CI =||= Infos =||= Warnings =|| || Local || Yes/Partial || - || Local || If HEAD tests is not subset or eq to Baseline tests: "If relevant tests exist on baseline, checkout baseline and running those tests OR fetch notes and use --baseline-ci || || || Local || No || Yes || CI || "Using CI numbers, suggest running locally for more accurate results" || || || Local || No || No || - || || "No baseline results, tests trivially pass" + suggest fetch notes or locally run tests on baseline || || CI |||| Yes || CI || || || || CI |||| No || - || || "No results. CI is not yet finished, or CI has failed for the baseline commit or CI hasn’t fetched" || = When to automatically fetch CI results? = Fetch if: * Testing locally (not a ci run) AND * Baseline commit doesn't have local nor CI results (before fetch) AND * Baseline commit is an ancestor of master. If fetching, suggest a command line option: --no-fetch. This is most convenient for local testing avoids fetch on ci, but may result in unwanted/wasted fetches -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15936#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15936: Rethink Choice of Baseline Commit for Performance Tests -------------------------------------+------------------------------------- Reporter: davide | Owner: davide Type: task | Status: new Priority: normal | Milestone: Component: Test Suite | Version: 8.6.2 Resolution: | Keywords: performance | tests git notes Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Old description:
= Intro =
Currently we always use the previous commit when running performance tests. This works well in CI where we fully test each commit in sequence (and hence always have test results for the previous commit). Remember, test results are stored in git notes and are not by default shared between repositories (i.e. your local repo will only have performance results when they were run locally on your machine). This is by design: we want to avoid comparing results form different machines.
Unfortunately This is not so effective when testing locally. The programmer may have only run a subset of performance tests on the previous commit, and often have not run the tests at all (this is notably true after performing a rebase: the previous commit has changed). We need to rethink how we pick a baseline commit.
= Goals =
* In all cases, do something sensible. * Giving a warning if conditions are not idea. Provide clear and simple instructions on how to get to the ideal case. * Give control over the baseline commit to the programmer via command line options. * Could make it a baseline branch where we still do git merge-base. That would be useful if you are branching from a different branch than master. * Give control over the baseline of local or ci to the programmer via command line options. * In general, performance tests should just work! No extra knowledge needed by the programmer. * If tests pass without warning now, then they should pass without warning later.
= Proposed Solution 1 =
* Per test use baseline of closest ancestor commit's metric (favour local as opposed to CI metrics). aggregate expected changes, but skip tests with both expected increase and decrease. * If anything other than local metrics from previous commit were used, show a warning suggesting running tests on previous commit. * Stop search if a CI commit is found (these are assumed to be complete) or if searched an arbitrary number of commits e.g. 100 (this is important as we'll end up searching whole history if a new test was added).
= Proposed Solution 2 =
* When running performance tests, results will be compared to a baseline commit that is the merge base with master (most recent commit from master). If HEAD is already in master, then the previous commit is used instead. * If any locally generated performance results exist, they are used exclusively for the baseline. * Else if any CI generated performance results exist (and have been fetched), they are used exclusively for the baseline. * Else performance tests trivially pass, and a warning is given to the user.
To find the baseline commit: {{{ mergeBase = merge-base master HEAD baselineCommit = if mergeBase == HEAD then HEAD^ else mergeBase }}}
== Reasoning ==
* We want each commit in master not to introduce a significant change in performance: hence we compare commits in mater to the previous commit. * If not on master (1 or more ahead and 0 or more commits behind master). We assume that the intention is to create a patch where all new commits will ultimately be squashed and placed on top of master as a single commit. On the other hand we don't want to consider changes in master from after we branched. Instead of using master HEAD as the baseline, we use the commit from which we branched from master (i.e. the merge base). In other words we are concerned only with the change in performance introduced by the newly crated commits.
= Handling Expected changes =
TODO this is the complicated part. What happens when the programmer is not planning on squashing commits and has many commits with expected changes :-(
See [https://ghc.haskell.org/trac/ghc/wiki/Performance/Tests#ExpectedPerformanceC... expected performance changes].
If on master or an ancestor commit, the baseline is the previous commit and we can simply allow performance changes as specified in the current commit's message (this is already the behaviour of the test driver).
If we have branched from master, then we may have multiple commits from the baseline commit to HEAD, each of which may have, possibly contradictory, expected performance changes. If any expected changes exist, aggregate them. We introduce an explicit "Metric Unchanged" option and aggregate per test where newer allowed changes overwrite older allowed changes. "Metric Unchanged" is necessary in the case that a new commit undoes a performance change such that a metric returns to the baseline value. The aggregate version should be output so that the programmer knows what to put in the commit message after squashing the commits. A warning should be given if expected changes appear in any commit inbetween HEAD and the baseline commit. In that warning Suggest e.g. "--baseline HEAD^ if not planning on squashing this commit"
== Reasoning ==
creating new commits with expected changes is an interactive process. The programmer adds a 1 or more commits, runs the tests, then adds expected performance changes to a commit message. It would be too inconvenient to force the programmer to change old commit messages, and too verbose/annoying to have them enter a full list of expected changes in each commit. Hence we must aggregate the expected changes.
This is of a bit risky as it is a context sensitive change in the semantics of expected changes. If we e.g. intend not to squash the commits, then all the sudden the expected changes mean something very different (change to the previous commit, not some distant baseline commit). Perhaps we just show a warning in this case.
We must figure out what commit messages will be used in GitLab on merge. Does the programmer have to remember to sort out expected changes before merge some how?
= Use cases =
* We do not distinguish between full/partial performance results being available for the baseline commit: that would require checking out the baseline commit and extracting the full list of tests. *
|| BaselinelocalResults || BaselineCIResults || Infos || Warnings ||
||||||= Case =||||||= Behaviour =|| ||= Platform =||= Baseline local Results? =||= Baseline CI Results? =||= Baseline local/CI =||= Infos =||= Warnings =|| || Local || Yes/Partial || - || Local || If HEAD tests is not subset or eq to Baseline tests: "If relevant tests exist on baseline, checkout baseline and running those tests OR fetch notes and use --baseline-ci || || || Local || No || Yes || CI || "Using CI numbers, suggest running locally for more accurate results" || || || Local || No || No || - || || "No baseline results, tests trivially pass" + suggest fetch notes or locally run tests on baseline || || CI |||| Yes || CI || || || || CI |||| No || - || || "No results. CI is not yet finished, or CI has failed for the baseline commit or CI hasn’t fetched" ||
= When to automatically fetch CI results? =
Fetch if: * Testing locally (not a ci run) AND * Baseline commit doesn't have local nor CI results (before fetch) AND * Baseline commit is an ancestor of master. If fetching, suggest a command line option: --no-fetch. This is most convenient for local testing avoids fetch on ci, but may result in unwanted/wasted fetches
New description: = Intro = Currently we always use the previous commit when running performance tests. This works well in CI where we fully test each commit in sequence (and hence always have test results for the previous commit). Remember, test results are stored in git notes and are not by default shared between repositories (i.e. your local repo will only have performance results when they were run locally on your machine). This is by design: we want to avoid comparing results form different machines. Unfortunately This is not so effective when testing locally. The programmer may have only run a subset of performance tests on the previous commit, and often have not run the tests at all (this is notably true after performing a rebase: the previous commit has changed). We need to rethink how we pick a baseline commit. = Proposed Solution = * Search for a baseline per test/test_env/metric/way * Start at HEAD^ and use local metrics. If none exist, use CI metrics. If none exist continue the search at the parent. * Stop after a constant number of commits (failing to find a baseline). * Stop the search if the child commit has expected changes for this test/metric/way (failing to find a baseline). * It's possible that there are multiple runs of the test (e.g if the test was run many times locally). In that case take the average. * If no baseline is found, show a warning and let the test pass. === Handling Expected changes === If there are expected changes between HEAD and a potential baseline commit, then that baseline cannot be used. We make no attempt to approximate a baseline. A warning will be issued telling the user to run the tests of the previous commit or try and fetch CI results. ==== Issues ==== * The programmer is responsible for the final commit message having the correct expected changes. This is particularly important when merged via gitlab with a squash (this can change the commit message). * We do not distinguish between full/partial performance results being available for the baseline commit: that would require checking out the baseline commit and extracting the full list of tests (This seems fragile and far too expensive). === When to automatically fetch CI results? === Do not fetch CI results. Allow the user to do this, but give the exact git command so they can just copy and paste. = Alternative Solution = Ultimately this was deemed too complicated. It assumes that commits will be squashed and merged into master (not always true). * When running performance tests, results will be compared to a baseline commit that is the merge base with master (most recent commit from master). If HEAD is already in master, then the previous commit is used instead. * If any locally generated performance results exist, they are used exclusively for the baseline. * Else if any CI generated performance results exist (and have been fetched), they are used exclusively for the baseline. * Else performance tests trivially pass, and a warning is given to the user. To find the baseline commit: {{{ mergeBase = merge-base master HEAD baselineCommit = if mergeBase == HEAD then HEAD^ else mergeBase }}} === Reasoning === * We want each commit in master not to introduce a significant change in performance: hence we compare commits in mater to the previous commit. * If not on master (1 or more ahead and 0 or more commits behind master). We assume that the intention is to create a patch where all new commits will ultimately be squashed and placed on top of master as a single commit. On the other hand we don't want to consider changes in master from after we branched. Instead of using master HEAD as the baseline, we use the commit from which we branched from master (i.e. the merge base). In other words we are concerned only with the change in performance introduced by the newly crated commits. -- Comment (by davide): I've added a MR: https://gitlab.haskell.org/ghc/ghc/merge_requests/22. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15936#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15936: Rethink Choice of Baseline Commit for Performance Tests -------------------------------------+------------------------------------- Reporter: davide | Owner: davide Type: task | Status: closed Priority: normal | Milestone: Component: Test Suite | Version: 8.6.2 Resolution: fixed | Keywords: performance | tests git notes Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by davide): * status: new => closed * resolution: => fixed Comment: Fixed in https://gitlab.haskell.org/ghc/ghc/merge_requests/294 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15936#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC