RE: D808 progress report

Kind equalities are the Big New Thing in 8.0. Let's just get it in and deal with the fallout.
After all, there is no reason for performance to be worse. For programs that 7.10 accepts, 8.0 should yield essentially the same coercions. They might need a bit of optimisation to squeeze them down but the result should be essentially identical. If not, let's investigate.
I could imagine the typechecker being a bit slower, but not a lot.
For T3738, compile the compiler before and after with -ticky and compare.
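A concrete version of that ticky recipe might look like the following (a sketch only: -ticky and the +RTS -r<file> counter report are the standard pieces, while the build.mk line, the old/ and new/ tree names, and the output file names are just illustrative):

# In each of the two trees (pre- and post-kind-equalities), build a ticky-enabled stage 2:
$ echo 'GhcStage2HcOpts += -ticky' >> mk/build.mk
$ make
# Run the same compilation with each compiler, dumping ticky counters to a file:
$ old/inplace/bin/ghc-stage2 -O -fforce-recomp T3738.hs +RTS -rT3738-old.ticky
$ new/inplace/bin/ghc-stage2 -O -fforce-recomp T3738.hs +RTS -rT3738-new.ticky
# The same -ticky / +RTS -r recipe applies to the compiled test binary itself,
# which is where the T3738 regression actually shows up.
$ diff <(sort T3738-old.ticky) <(sort T3738-new.ticky) | less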
| In light of all this, I propose the following:
| - Scramble to fix all non-perf failures. I expect I can finish this by
| Wed evening.
| - Hope that one of you (or another dev) can take a look at T3738 and
| friends. That clearly needs to get fixed.
| - Adjust perf targets to get validation to work, clearly labeling the
| remaining problems as the fault of type=kind.
| - Commit to fixing #8095 in the next two weeks. But probably not by
| early next week, I'm afraid.
|
In short, yes.
Simon
| -----Original Message-----
| From: Richard Eisenberg [mailto:eir@cis.upenn.edu]
| Sent: 08 December 2015 03:35
| To: Simon Peyton Jones

On Dec 8, 2015, at 7:22 AM, Simon Peyton Jones wrote:
> Kind equalities are the Big New Thing in 8.0. Let's just get it in and deal with the fallout.
> After all, there is no reason for performance to be worse. For programs that 7.10 accepts, 8.0 should yield essentially the same coercions. They might need a bit of optimisation to squeeze them down but the result should be essentially identical. If not, let's investigate.
Yes. Modulo levity polymorphism, I agree. However, I just can't find a "smoking gun" in any of the profiling that might indicate what's causing the regressions. It seems that everything is just a bit more sluggish. Of course, what that suggests is that there is some low-level function, used a ton, which is slower, but I just haven't found it yet.

Richard
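For reference, the cost-centre profiles mentioned here can be produced with a profiled stage-2 compiler; a minimal sketch, assuming a prof-style BuildFlavour in mk/build.mk and using one of the regressing perf tests as input (paths are illustrative):

# Run the profiled compiler on a regressing test and collect a time/allocation profile:
$ inplace/bin/ghc-stage2 -O -fforce-recomp testsuite/tests/perf/compiler/T5030.hs +RTS -p -RTS
# The report is written to a .prof file next to the invocation; the top cost centres
# are where a "smoking gun" would show up, if there were one.
$ head -40 *.prof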
> I could imagine the typechecker being a bit slower, but not a lot.
> For T3738, compile the compiler before and after with -ticky and compare.
> | In light of all this, I propose the following:
> | - Scramble to fix all non-perf failures. I expect I can finish this by Wed evening.
> | - Hope that one of you (or another dev) can take a look at T3738 and friends. That clearly needs to get fixed.
> | - Adjust perf targets to get validation to work, clearly labeling the remaining problems as the fault of type=kind.
> | - Commit to fixing #8095 in the next two weeks. But probably not by early next week, I'm afraid.
> |
> In short, yes.
> Simon
>
> | -----Original Message-----
> | From: Richard Eisenberg [mailto:eir@cis.upenn.edu]
> | Sent: 08 December 2015 03:35
> | To: Simon Peyton Jones; Ben Gamari; Austin Seipp
> | Subject: D808 progress report
> |
> | Hi Simon, Ben, Austin,
> |
> | First, the bad news:
> | I'm a bit stalled on performance issues. When I sent my earlier email, I was celebrating having gotten one test case from 319M of allocation down to 182M via several seemingly general-purpose optimizations. But this was with -fno-opt-coercion. Once I re-enabled coercion optimization, that particular test case still fails (perf/compiler/T5030), along with 22 others. This is bad. But after ~4 hours of effort this evening I've made no substantive progress at all, shaving off maybe 1% of allocation via a few tiny tweaks. Even characterizing what's going wrong is proving difficult. I've only analyzed a few of the failing tests, but each one is stubbornly refusing to break, so I'm losing hope about the others.
> |
> | Then, the good news:
> | I think the idea posited in #8095 (not to bother building coercions unless -dcore-lint is on) will solve all of these problems and more, as long as users don't use -dcore-lint. With one exception that I've noticed (see below), my performance failures aren't catastrophic: on the performance tests, which tend to be pathological, my branch is running 10-20% worse than HEAD. Not good, but not so bad that -dcore-lint users can't cope. So, with #8095 addressed, I think we'll be OK. And #8095 should be very straightforward and done in a few hours' work.
> |
> | Finally, the ugly:
> | The exception to the non-catastrophic nature of the failures is this: perf/should_run/T3738 fails with 3479.1% overage. (Yes, the percentage is in the thousands.) Worse, this is at runtime, not in the compiler. Yet the Core produced in my branch (as observed by -ddump-simpl) and in HEAD appears identical. There are a few other should_run failures that have me nervous, but my guess is that they're all from one source. I'd love an offer of help to debug this.
> |
> | In light of all this, I propose the following:
> | - Scramble to fix all non-perf failures. I expect I can finish this by Wed evening.
> | - Hope that one of you (or another dev) can take a look at T3738 and friends. That clearly needs to get fixed.
> | - Adjust perf targets to get validation to work, clearly labeling the remaining problems as the fault of type=kind.
> | - Commit to fixing #8095 in the next two weeks. But probably not by early next week, I'm afraid.
> |
> | What do we think?
> |
> | Thanks,
> | Richard
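A quick way to reproduce that kind of before/after allocation number outside the testsuite looks roughly like this (a sketch: -fno-opt-coercion and the +RTS -s summary are standard flags named in the thread, while the compiler path and the test's location are assumptions):

# Compiler allocation with the coercion optimiser on (the default):
$ inplace/bin/ghc-stage2 -O -fforce-recomp testsuite/tests/perf/compiler/T5030.hs +RTS -s
# ...and with it switched off, as in the 319M-vs-182M experiment above:
$ inplace/bin/ghc-stage2 -O -fno-opt-coercion -fforce-recomp testsuite/tests/perf/compiler/T5030.hs +RTS -s
# "bytes allocated" in the +RTS -s summary is the figure the perf/compiler tests track.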

I've just updated the nokinds-dev branch with the latest. It should compile with bootstrapping from 7.8. Haddock should also compile, but only after doing this from utils/haddock:
git remote add goldfire git://github.com/goldfirere/haddock.git
git fetch goldfire
For some reason, I couldn't push a wip/rae-nokinds branch to haddock.git at git.haskell.org.
I'm also still hitting the out-of-memory error when posting to Phab. :(
Nothing particularly interesting to report otherwise. I still have hope that I'll be able to validate cleanly (modulo performance) by Wed evening.
Thanks,
Richard

Richard Eisenberg writes:

> I've just updated the nokinds-dev branch with the latest. It should compile with bootstrapping from 7.8.
> Haddock should also compile, but only after doing this from utils/haddock:
>
> git remote add goldfire git://github.com/goldfirere/haddock.git
> git fetch goldfire
>
> For some reason, I couldn't push a wip/rae-nokinds branch to haddock.git at git.haskell.org.
> I'm also still hitting the out-of-memory error when posting to Phab. :(

Hmm. Austin, what is the status of the Phabricator migration? Perhaps this has been fixed? It would be nice to know that this validates on Harbormaster.

> Nothing particularly interesting to report otherwise. I still have hope that I'll be able to validate cleanly (modulo performance) by Wed evening.

I've opened a pull request against your branch fixing up some haddock issues that were preventing a standard `make all` build from finishing. Now that I have a build, I'll be turning my attention to some of the performance issues.

Cheers,

- Ben

Ben Gamari writes:

> Now that I have a build, I'll be turning my attention to some of the performance issues.
Here is a list of the major (>10% delta) performance regressions according to my testsuite run, roughly in order of severity:

mdo003: compile never completes
mdo006: compile never completes
T5336: alloc +3479%
T9872a: alloc +78%
T9872c: alloc +58%
T3294: alloc +49%
T9872c: alloc +48%
T9661: max_bytes +29%, peak_allocated +48%
T6048: alloc +40%
T5837: alloc +17%
T9872b: alloc +15%
T9872a: alloc +13%
T9675: alloc +11%
WarningWildcardInstantiations: alloc +13%

I'll be starting with T5336.

The full set of testsuite failures that I saw was:

TEST="landmines tc141 mod71 T6018rnfail rnfail026 readFail003 T7848 T2431 Defer01 T7861 T10284 tcrun044 tcrun043 T9858c rule2 T11077 T8958 Roles13 mdo006 mdo003 T10267 T8031 T7276a T3177a T7064 PolyKinds02 T8566 T7404 T7230 T8616 T7224 T7328 T10503 T7278 T9222 T7438 T7524 T6021 T6123 T3330c T5439 T3330a SimpleFail6 T2627b SimpleFail1a T9171 T2664 T4179 SimpleFail14 T7786 numsparks001 T10403 WarningWildcardInstantiations WildcardInstantiations T10045 UnsatFun Defer02 T7873 T8353 ghci059 T6018ghcirnfail T8674 ghci047 T3208b PushedInAsGivens T9201 T3540 TcCoercibleFail T9109 T9999 tcfail013 T8806 T7734 tcfail014 T8030 T7696 T8603 tcfail113 tcfail032 tcfail057 T2994 tcfail099 T4875 tcfail200 tcfail201 tcfail078 tcfail158 tcfail090 T7368 tcfail058 tcfail196 tcfail197 T5570 FrozenErrorTests tcfail063 tcfail146 T11112 T7778 T7368a T7609 T7410 tcfail004 tcfail005 T5853 T7645 T10285 tcfail002 T7453 tcfail212 T3950 tcfail140 T8262 tcfail161 tcfail160 T9196 T5022 DepFail1 TypeSkolEscape PromotedClass BadTelescope2 RAE_T32a drvfail009 T7959 drvfail005 gadt-escape1 gadt13 gadt7 T9161-2 RAE_T32b KindEqualities2 KindEqualities Rae31 T5536 T3738 MethSharing haddock.Cabal haddock.compiler haddock.base T5030 T9675 T6048 T5631 T9872c T9872a T9872d T9961 T3064 T9872b T1969 T5321Fun T5837 T3294"

Cheers,

- Ben
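To reproduce any one of these numbers in isolation, the testsuite's single-test interface is enough; a minimal invocation (TEST is the standard testsuite variable; the test name here is just an example):

$ cd testsuite
$ make TEST=T3738
# or, from the top of the tree:
$ make test TEST=T3738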

Ben Gamari writes:

> Ben Gamari writes:
>
>> Now that I have a build, I'll be turning my attention to some of the performance issues.
>
> Here is a list of the major (>10% delta) performance regressions according to my testsuite run, roughly in order of severity:
>
> mdo003: compile never completes
> ...
My apologies, the test names in this list are entirely incorrect [1]. Let's try again:

$ make test 2>&1 | tee log
$ grep Deviation log | sort -t: -nk 2
Deviation   T9872d(normal) bytes allocated:  -21.8 %
Deviation   T1969(normal) bytes allocated:  7.2 %
Deviation   T9961(normal) bytes allocated:  7.3 %
Deviation   T3294(normal) bytes allocated:  8.5 %
Deviation   T5536(normal) bytes allocated:  8.9 %
Deviation   T5837(normal) bytes allocated:  11.1 %
Deviation   haddock.compiler(normal) bytes allocated:  11.6 %
Deviation   MethSharing(normal) bytes allocated:  12.5 %
Deviation   T6048(optasm) bytes allocated:  12.6 %
Deviation   haddock.Cabal(normal) bytes allocated:  13.4 %
Deviation   T5321Fun(normal) bytes allocated:  14.9 %
Deviation   haddock.base(normal) bytes allocated:  16.6 %
Deviation   T3064(normal) bytes allocated:  16.6 %
Deviation   T9675(optasm) peak_megabytes_allocated:  28.4 %
Deviation   T9675(optasm) max_bytes_used:  29.4 %
Deviation   T5030(normal) bytes allocated:  39.6 %
Deviation   T5631(normal) bytes allocated:  48.0 %
Deviation   T9872a(normal) bytes allocated:  48.5 %
Deviation   T9872b(normal) bytes allocated:  57.7 %
Deviation   T9872c(normal) bytes allocated:  78.1 %
Deviation   T3738(normal) bytes allocated:  3479.3 %

Consequently I'm looking at T3738.

Cheers,

- Ben

[1] Note to self: don't succumb to the temptation to look at the "unexpected failure" line preceding a stat failure for the test name.

Hi Ben,
Great. Thanks. I'll be working this morning on non-performance errors, starting with utter failures (wrong exit codes). I've also added you as a committer on github.com/goldfirere/ghc.git. Feel free to push changes to nokinds-dev -- I'll take a look at them when they come across.
I'm also on Skype. Feel free to call if you've got any questions.
Thanks for the hand!
Richard

But if we generate identical Core after desugaring there just aren't any downstream changes. Or are there?
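The "identical Core" comparison being discussed is presumably along these lines; -ddump-simpl is the flag Richard mentions earlier, -dsuppress-uniques keeps the two dumps diffable, and the two compiler names are placeholders:

$ old-ghc -O -fforce-recomp -ddump-simpl -dsuppress-uniques T3738.hs > T3738-old.simpl
$ new-ghc -O -fforce-recomp -ddump-simpl -dsuppress-uniques T3738.hs > T3738-new.simpl
$ diff -u T3738-old.simpl T3738-new.simpl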
| -----Original Message-----
| From: Richard Eisenberg [mailto:eir@cis.upenn.edu]
| Sent: 08 December 2015 14:35
| To: Simon Peyton Jones