[GHC] #13220: Performance regressions in testsuite from join points

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.1 Keywords: JoinPoints | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Need to look at, among others: * `perf/compiler/T9961` * `deriving/perf/T10858` -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by simonpj: @@ -1,1 +1,2 @@ - Need to look at, among others: + Need to look at some modest performance regressions for the main join- + point patch (#12988, wiki [wiki:SequentCore]): @@ -3,2 +4,4 @@ - * `perf/compiler/T9961` - * `deriving/perf/T10858` + * `perf/compiler/T9961`: `bytes_allocated` + * `perf/compiler/T10356`: `bytes_allocated` + * `perf/compiler/T1969`: `max_bytes_used` + * `deriving/perf/T10858`: `bytes_allocated` New description: Need to look at some modest performance regressions for the main join- point patch (#12988, wiki [wiki:SequentCore]): * `perf/compiler/T9961`: `bytes_allocated` * `perf/compiler/T10356`: `bytes_allocated` * `perf/compiler/T1969`: `max_bytes_used` * `deriving/perf/T10858`: `bytes_allocated` -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): I'm actually not sure that the effect on the whole is all that modest, unfortunately. [[https://perf.haskell.org/ghc/#revision/8d5cf8bf584fd4849917c29d82dcf46ee75dd035|Gipeda]] shows that the build time for GHC itself went up by 13%. While some increase is to be expected, this is quite significant. Are there any obvious knobs that could be adjusted to reduce the amount of work that the join point analysis/simplifier do under `-O1`? The full analysis is great for `-O2`, but I do fear that users will be up in arms if the GHC regression is representative of user code. Also, strangely enough, while the join points work eliminated all allocations from `fannkuch-redux`, the actual runtime of this test might have actually gone up by 8%. I say "might have" since runtime measurements are notoriously unstable, but it does look likely that this is a real regression: the test's runtime is quite long (4 seconds) and 8% is quite a large change. Even if this measurement were off by a factor of two we would want to understand it. Can you shed any light on this, Luke? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): The `fannkuch-redux` weirdness is tracked at #13225. And it is stable: Once a few more commits come in, one can clearly distinguish a regression from a occasional one-time bump (that occur with some tests), and this is one: -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
Gipeda shows that the build time for GHC itself went up by 13%
Well that is new and very unwelcome information. Some of the `compiler/perf` tests went down quite a bit, a couple increased. So I was under the impression that compile time impact was slight, and some runtime improvements were jolly big. But a 13% increase across the entire build is massive. Can we get any insight into what is increasing? It's hard to konw where to start with a single number across the entire build. A good place to look might be binary sizes: if we generate more code then that'll take longer to compile. Bother. Still, there is no good reason that any of this join point stuff should cost much more. Yes, the occurrence analyser is doing a bit more work, but not much. If occ-anal cost 13% more I'd be surprised. But it'd have to cost 100% more to make the entire build take 13% more. Something is fishy. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by simonpj: @@ -7,0 +7,2 @@ + * `perf/compile/T9020`: see + [https://git.haskell.org/ghc.git/commitdiff/c2becee48aa73795cbf04905f3891f543...] New description: Need to look at some modest performance regressions for the main join- point patch (#12988, wiki [wiki:SequentCore]): * `perf/compiler/T9961`: `bytes_allocated` * `perf/compiler/T10356`: `bytes_allocated` * `perf/compiler/T1969`: `max_bytes_used` * `perf/compile/T9020`: see [https://git.haskell.org/ghc.git/commitdiff/c2becee48aa73795cbf04905f3891f543...] * `deriving/perf/T10858`: `bytes_allocated` -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): What about looking at compiler allocations for nofib, to try to get a smaller example? I think this should be an 8.2 release blocker. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonmar): * priority: normal => highest * milestone: => 8.2.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dobenour): Is the bootstrap time regression due to a bug I introduced in Phab:D2879 that was fixed by rwbarton in Phab:D3070? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): No they are two different bad regressions. Reid has a spreadsheet which shows how the build time has changed over time. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Reid, Ben, David: might one of you be able to dig into where the 13% build time increase is coming from? The compiler/perf tests suggest that it's not an across-the-board increase in compile time, so perhaps some things are big outliers? Without more insight we are stuck on this, and we can hardly release with such a big step change in build time -- at least not without understanding it. Thanks -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): Yes, I'll do some preliminary investigations. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

I'm actually not sure that the effect on compiler performance is all
#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by lukemaurer): Replying to [comment:2 bgamari]: that modest, unfortunately. [[https://perf.haskell.org/ghc/#revision/8d5cf8bf584fd4849917c29d82dcf46ee75dd035|Gipeda]] shows that the build time for GHC itself went up by 13%. While some increase is to be expected, this is quite significant. Are there any obvious knobs that could be adjusted to reduce the amount of work that the join point analysis/simplifier do under `-O1`? The full analysis is great for `-O2`, but I do fear that users will be up in arms if the GHC regression is representative of user code. There isn't anything obvious—the occurrence analyser looks for join points as it goes, and that process should only be a small constant overhead. Nothing in the simplifier (or elsewhere that I can think of) has to work particularly hard to handle join points (or it shouldn't, anyway!).
Also, strangely enough, while the join points work eliminated all allocations from `fannkuch-redux`, the actual runtime of this test might have actually gone up by 8%. I say "might have" since runtime measurements are notoriously unstable, but it does look likely that this is a real regression: the test's runtime is quite long (4 seconds) and 8% is quite a large change. Even if this measurement were off by a factor of two we would want to understand it. Can you shed any light on this, Luke?
I haven't seen that jump before, and it's not obvious why it would happen. A quick diff of the dumps shows that the two versions end up with wildly different code, so it may be hard to pin any particular culprit. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): The `fannkuch-redux` test is one that can be sensitive to mysterious code alignment effects; see #8279. However the reduction in allocations is so striking that it's worth checking whether something else could have gone wrong here. I'm pretty familiar with `fannkuch-redux`, so I'd be happy to investigate that one as well. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): An initial report. I collected timing information from validation builds of the "Join points" commit and its parent commit, each built twice using ghc-8.0.2 and then a perf build of the same commit as the bootstrapping compiler (for reasons to become clear below). I have only looked at the timing results for a few modules so far. Here are the times for everyone's favorite module, `DynFlags.hs`. {{{ stage1 build: bootstrapped by mutator CPU seconds ghc-8.0.2 63s ghc-HEAD-before-join 63s ghc-HEAD-after-join 67s stage2 build: ghc-HEAD-before-join 150-154s ghc-HEAD-after-join 209-213s }}} Observations: * Performance of the perf build (used as the bootstrapping compiler, for the stage1 build of `DynFlags`) may have regressed by a few percent with join points. For now this is looking at the time for just one module, so I wouldn't take that number very seriously. * The time it took the stage1 compiler to perform the stage2 build of `DynFlags` increased by around 40% with the join point change. But, these numbers are both MUCH larger than the time it took to do the stage1 build of `DynFlags`. There are a few possible causes: - The stage1 compiler is built with only `-O`, unlike the perf build that produced the bootstrapping compiler, which built with `-O2`. - The stage1 compiler is built with `-DDEBUG`, so it contains extra sanity checks. - The stage2 compiler is built with `-dcore-lint`, so additional extra checks are being done during the stage2 build. - Something else about the build could be different between the two stages, like the values of CPP macros. I think this is unlikely to be the cause in this case. One of these points seems to have gotten much worse with join points. The good news is that, according to preliminary results, the performance of the perf (i.e., release) build of the compiler seems to be minimally affected. And if the performance regression is in `-dcore-lint`, for example, it may be easy to track down and eliminate. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): Another data point: The total time it to finish the stage1 build (which does include some time spent not in ghc, like running configure scripts) increased from 1208s to 1272s when moving from `ghc-HEAD-before-join` to `ghc-HEAD-after-join` as the bootstrapping compiler, an increase of 5.3%. That's roughly consistent with the 6% increase in `DynFlags.hs` build time mentioned in the previous comment. Still not wonderful, but also not nearly as bad as the total build time regression would seem to indicate. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): When GHC takes longer to compile the same file, it's sometime because it's generated bigger code. What does `-dshow-passes` show `ghc-before` and `ghc-after` compiling the same `DynFlags.hs`? `DynFlags` is a jolly big file. Are there smaller modules that show a big compile-time difference. `nofib-analyse` shows some per-module info I think. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): There are two issues here. * Core Lint is expensive, and join points made it more expensive. For example `HsExpr` took twice as long to compile with `-O -dcore-lint` as with `-O`, before join points; and three times as long, after. I'm still waiting on some builds to finish, but I suspect most of the 13% build time regression is attributable to extra Core Lint checks added with join points. Just looking at the code, `markAllJoinsBad` looks somewhat expensive; and it is called on the right hand side of every binding, the body of every lambda, the scrutinee of every case, etc. This doesn't affect normal users, who won't use `-dcore-lint`. * Normal compiles (without `-dcore-lint`) are also slightly slower after join points. I'm working on getting better data on this and looking for outlier modules where the build time went up a lot. At Simon's suggestion, I looked at program sizes but they went up only a tiny bit on average (< 1%). The number of simplifier iterations also increased a tiny bit (< 1%). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by lukemaurer): Thanks—glad to hear it was (so far) a simple fix … -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
markAllJoinsBad looks somewhat expensive
You are right. It's ridiculously expensive, and there is a vastly easier way to do the job: keep track of the valid join points, not the invalid ones! Patch coming -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points
-------------------------------------+-------------------------------------
Reporter: lukemaurer | Owner: lukemaurer
Type: bug | Status: new
Priority: highest | Milestone: 8.2.1
Component: Compiler | Version: 8.1
Resolution: | Keywords: JoinPoints
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Simon Peyton Jones

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Reid, does this patch help? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Status on this: Reid/David are going to look into why so much time is going into Lint. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
Status on this: Reid/David are going to look into why so much time is going into Lint.
Any progress on this? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): [[https://perf.haskell.org/ghc/#revision/6bab649bde653f13c15eba30d5007bef4a9a9d3a|Gipeda]] suggests that the patch didn't have much of an effect. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): I now believe the culprit is this added line in originally `lintCoreVar`, now `lintVarOcc`: {{{#!diff + ; ensureEqTys ty ty' $ mkBndrOccTypeMismatchMsg var' var ty' ty }}} Reasoning: `perf` showed that when running with `-dcore-lint`, the single function that takes the most time after the garbage collector is `nonDetCmpTypeX_go`. Before the join points patch, it took 6.5% of the total time. After, it took 9.1% of the total time. And there seems to be nothing that corresponds to this `ensureEqTys` call before the join points patch. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * cc: dfeuer (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): Hmm, removing that test (if it's even okay) also didn't have as much effect as I hoped. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:27 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): Okay, found the REAL culprit. It's embarrassingly simple, and not at all what I had been looking for. Running `ghc` with `-dcore-lint -v` and grepping for `Core Linted result` revealed the issue, this new line in `simplifyPgmIO`: {{{#!diff Err.dumpIfSet_dyn dflags Opt_D_dump_occur_anal "Occurrence analysis" (pprCoreBindings tagged_binds); + lintPassResult hsc_env CoreOccurAnal tagged_binds; -- Get any new rules, and extend the rule base -- See Note [Overall plumbing for rules] in Rules.hs }}} We now run Core Lint twice per simplifier iteration, which in my sample program amounted to running Core Lint 50% more times overall. That's rather expensive. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:28 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * cc: lukemaurer (added) Comment: Ha! That's funny. I don't know why Luke added that, but I suggest we just remove it. Occurrence analysis doesn't really add anything new that Lint can catch, I think. Or add yet another flag. But taking a big slow- down for this just doesn't seem worth it. Luke do you agree? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:29 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): BTW, do you withdraw comment:25? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:30 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): And does this address Joachim's comment on [https://phabricator.haskell.org/rGHC2effe18ab51d]? {{{ This might be interesting for @lukemaurer: According to https://perf.haskell.org/ghc/#revision/2effe18ab51d66474724d38b20e49cc1b8738... this patch undid the effect of join points on fannkuch-redux (i..e allocation goes up by 1318762.9%, but runtime goes down by 5%). Was this ever cleared up? Also, integer runtime goes up by 13%, and hidden by 6%, and code size generally by around 4%. Expected? }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:31 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): Replying to [comment:30 simonpj]:
BTW, do you withdraw comment:25?
Well, the addition of that `ensureEqTys` did contribute a couple percent to the total compilation time (with `-dcore-lint`) in my test. If it's useful we can keep it; I don't understand what kind of substitution `le_subst` Core Lint is carrying around, so it's unclear to me whether this is actually testing something about the program being compiled or testing the consistency of Core Lint itself. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:32 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): OK, fine. Reid, will you remove the offending Lint call, and pick up the perf benefits (I hope)? That may still leave comment:31 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:33 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:3391 Wiki Page: | -------------------------------------+------------------------------------- Changes (by rwbarton): * differential: => Phab:3391 Comment: Phab:3391 is for removing the added Core Lint pass after occurrence analysis. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:34 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:3391 Wiki Page: | -------------------------------------+------------------------------------- Old description:
Need to look at some modest performance regressions for the main join- point patch (#12988, wiki [wiki:SequentCore]):
* `perf/compiler/T9961`: `bytes_allocated` * `perf/compiler/T10356`: `bytes_allocated` * `perf/compiler/T1969`: `max_bytes_used` * `perf/compile/T9020`: see [https://git.haskell.org/ghc.git/commitdiff/c2becee48aa73795cbf04905f3891f543...] * `deriving/perf/T10858`: `bytes_allocated`
New description: Need to look at some modest performance regressions for the main join- point patch (#12988, wiki [wiki:SequentCore]): * `perf/compiler/T9961`: `bytes_allocated` * `perf/compiler/T13056`: `bytes_allocated` * `perf/compiler/T1969`: `max_bytes_used` * `perf/compile/T9020`: see [https://git.haskell.org/ghc.git/commitdiff/c2becee48aa73795cbf04905f3891f543...] * ~~`deriving/perf/T10858`: `bytes_allocated`~~ (was due to `-dcore-lint`) -- Comment (by rwbarton): One of the performance test regressions was due to Core Lint running more and is fixed by the above patch. The others remain to be investigated. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:35 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points
-------------------------------------+-------------------------------------
Reporter: lukemaurer | Owner: lukemaurer
Type: bug | Status: new
Priority: highest | Milestone: 8.2.1
Component: Compiler | Version: 8.1
Resolution: | Keywords: JoinPoints
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:3391
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: lukemaurer Type: bug | Status: closed Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: fixed | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:3391 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: new => closed * resolution: => fixed Comment: Merged to `ghc-8.2` as 52d14ba3a0bafafee5620091a3c22717a883a2d7. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:37 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:3391 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * owner: lukemaurer => (none) * status: closed => new * resolution: fixed => Comment: comment:31 not yet attended to, so re-opening. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:38 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:3391 Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Ahh, yes, fair point. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:39 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: rwbarton Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:3391 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * owner: (none) => rwbarton Comment: Reid, do you suppose you could have a look at the remaining regressions when you get a chance? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:40 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: rwbarton Type: bug | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:3391 Wiki Page: | -------------------------------------+------------------------------------- Changes (by rwbarton): * priority: highest => normal Comment: The other regressions are small (~5%) and the join points commit is so complicated and so long ago that I doubt looking at these changes is the easiest way to gain a few percent on a few benchmarks. Furthermore the issue is clouded by indirect effects of the join points change that result from running a stage2 compiler. For example parsing and typechecking allocates slightly less after join points, but these gains are roughly canceled out by increased allocations during desugaring. Since these phases were presumably not changed by the join points commit, this changes must be due to different code generation with join points. These changes are probably more interesting than slight changes in perf tests. In all three `perf/compiler` tests where allocations changed, there were no significant changes to Core program size. It looks like the simplifier is just doing more work, which is no big surprise. I looked at T9020 (the simplest test case) with stage1 compilers built with ticky, but the code changes are extensive enough that I couldn't draw any conclusion from the results (many changes, new functions added, and old functions removed). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:41 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: rwbarton Type: bug | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:3391 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Fair enough. Shall we close this, then? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:42 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13220: Performance regressions in testsuite from join points -------------------------------------+------------------------------------- Reporter: lukemaurer | Owner: rwbarton Type: bug | Status: closed Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.1 Resolution: fixed | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:3391 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: new => closed * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13220#comment:43 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC