
Hi all,

Using Ben's timing patch [^1], Cabal, and a Haskell program that parses the generated logs [^2], I generated some tables showing compile times of modules in hxt, haskell-src-exts, lens, and all of their dependencies:

https://gist.githubusercontent.com/osa1/fd82335181a584679c9f3200b7b0a8a5/raw...

Some notes:

- Modules, and passes within modules, are sorted.
- At the end of the file you can see the cumulative numbers for the passes. That table is small, so I'll just paste it here:

    ======================Total======================
    CodeGen                 41.32%
    Simplifier              34.83%
    Renamer/typechecker     12.22%
    Desugar                  2.11%
    CorePrep                 1.90%
    Demand analysis          1.44%
    CoreTidy                 1.35%
    Called arity analysis    1.10%
    Float inwards            0.96%
    Common sub-expression    0.87%
    Parser                   0.75%
    SpecConstr               0.57%
    Specialise               0.30%
    Worker Wrapper binds     0.17%
    Liberate case            0.12%
    ByteCodeGen              0.00%
    Simplify                 0.00%

I don't know how to make use of this yet, but I thought ghc-devs may find it useful.

As a next step I'm hoping to add some more `withTiming` calls. The analysis program can handle nested `withTiming` calls and renders passes as a tree (GHC HEAD doesn't have nested `withTiming`s yet, so we don't see that in the file above), so we can benchmark things in more detail. I also want to experiment a little with different `force` parameters to `withTiming`. If anyone has other ideas, I can try those too.

---

[^1]: https://phabricator.haskell.org/D1959
[^2]: https://github.com/osa1/analyze-ghc-timings
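[Editor's note: since the thread keeps coming back to `withTiming` and its `force` parameter, here is a hedged, standalone sketch of the idea. This is *not* GHC's actual code from D1959 — the real combinator lives inside the compiler and logs via DynFlags; the signature and names below are simplified for illustration. The point is the `(a -> ())` argument: the result is forced to a caller-chosen depth *before* the clock stops, so work hiding in thunks is charged to this pass.]

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Simplified model of withTiming: time an action, forcing its result
-- to a caller-chosen depth before stopping the clock, so that lazily
-- deferred work is attributed to this pass rather than to whichever
-- later pass first demands the result.
withTiming :: String -> (a -> ()) -> IO a -> IO a
withTiming label forceRes act = do
  start <- getCPUTime
  res   <- act
  _     <- evaluate (forceRes res)   -- force to the requested depth
  end   <- getCPUTime
  -- getCPUTime is in picoseconds; convert to milliseconds
  let ms = fromIntegral (end - start) / (1e9 :: Double)
  putStrLn (label ++ ": " ++ show ms ++ " ms")
  return res

main :: IO ()
main = do
  -- (\x -> x `seq` ()) forces to WHNF; `const ()` would force nothing
  n <- withTiming "Sum" (\x -> x `seq` ()) (return (sum [1 .. 1000000 :: Int]))
  print n
```

With `const ()` as the force argument, the `Sum` timing would be near zero and the cost would land on the later `print` instead — exactly the attribution question discussed below.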

Very cool! It would be nice to add build flags to the table (or at least the optimization levels), as these probably differ across packages and will certainly affect the numbers.

I'd also be really interested to see a comparison of the timing data for -O0 and -O. I think the biggest impact for performance improvements will probably come from working on -O0, since that's what people will use for development.

Thanks!
Eric

On Tue, Mar 29, 2016, at 14:00, Ömer Sinan Ağacan wrote:

Great! Thanks, y'all, for putting effort towards performance. It really is crucial for developer productivity.

In the particular case of haskell-src-exts, I found that removing many of the more complicated typeclasses from the deriving clauses (Data, Generic, etc.) brought the compilation time way down. IIRC it was pretty drastic, like 40 seconds vs 10 seconds (though that could be just the timing for that one types module; I don't recall). It could be valuable to investigate exactly what's going on there.

-Michael
On Tue, Mar 29, 2016 at 2:00 PM, Ömer Sinan Ağacan wrote:

This ticket may be of interest: https://ghc.haskell.org/trac/ghc/ticket/9630

Deriving costs a lot, and we just need someone to figure out what's going on.

Edward

Excerpts from Michael Sloan's message of 2016-03-29 15:23:50 -0700:

Hi,

On Tuesday, 29.03.2016 at 15:29 -0700, Edward Z. Yang wrote:
This ticket may be of interest:
https://ghc.haskell.org/trac/ghc/ticket/9630
Deriving costs a lot and we just need someone to figure out what's going on.
are you sure that deriving itself costs a lot? I would expect that deriving itself is reasonably cheap, but that it results in a lot of complicated (e.g. recursive) code that the later analyses then have to go through...

http://ghc.haskell.org/trac/ghc/ticket/9557 and the tickets linked from there are also relevant. In fact, there is even a keyword for this meta-issue: https://ghc.haskell.org/trac/ghc/query?status=!closed&keywords=~deriving-perf

Greetings,
Joachim

--
Joachim “nomeata” Breitner
mail@joachim-breitner.de • https://www.joachim-breitner.de/
XMPP: nomeata@joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
Debian Developer: nomeata@debian.org
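[Editor's note: for concreteness, here is a hypothetical miniature of the kind of type under discussion — an AST-style type roughly in the spirit of haskell-src-exts (the type and constructors are invented for illustration). Each class in the deriving clause makes GHC generate another chunk of recursive instance code per constructor, which the simplifier and code generator must then process.]

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE DeriveGeneric      #-}
import Data.Data (Data)
import GHC.Generics (Generic)

-- A toy recursive syntax-tree type. Every derived class here (Eq,
-- Ord, Show, Data, Generic) expands to mutually recursive instance
-- code over all three constructors; on a real AST with dozens of
-- large constructors, that generated code multiplies quickly.
data Exp
  = Var String
  | App Exp Exp
  | Lam String Exp
  deriving (Eq, Ord, Show, Data, Generic)

main :: IO ()
main = print (App (Var "f") (Var "x"))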

| > Deriving costs a lot and we just need someone to figure out what's
| > going on.
|
| are you sure that deriving itself costs a lot? I would expect that
| deriving itself is reasonably cheap, but results in a lot of
| complicated (e.g. recursive) code that later analyses have to go
| through...

I think that's right. But somehow the code generated by 'deriving' tickles some non-linearity (I guess) in the rest of the compiler. It would be great to know what it was.

Simon

Hi Ömer,

thanks for the log. This is roughly GHC HEAD, right?

I wanted to check for low-hanging fruit in “my” code, so I compiled Data.Reflection, where your tool reports 12% of the time spent in Call Arity. But if I run it with profiling, it says 3.5% of the time is spent in that pass; if I run your tool on the timing output produced by a profiled GHC, I get 8.3%; and on the output of a development build, I get 3.8%.

So what is your setup: how did you compile GHC, and with what options are you running it?

Greetings,
Joachim

--
Joachim “nomeata” Breitner
mail@joachim-breitner.de • https://www.joachim-breitner.de/
XMPP: nomeata@joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
Debian Developer: nomeata@debian.org

Hi Joachim,

That's GHC HEAD at the time, with -O1 for stage 2 + libs.

The way I generate the logs is:

- I create a cabal sandbox and run `cabal exec zsh` to set up the env variables.
- I install the packages in that shell.
- $ (cat .cabal-sandbox/logs/*.log | analyze-ghc-timings) > output

I don't know why there would be that much difference. Note that my setup was not perfect; I was running some other (although lightweight) processes on the side.
2016-03-31 4:49 GMT-04:00 Joachim Breitner:

Forgot to mention that I use `cabal install <package name> --ghc-options="-v3" -v3` to install the packages.
2016-03-31 12:00 GMT-04:00 Ömer Sinan Ağacan:

I just benchmarked another set of packages, this time using -O2 optimized libs + stage 2 (6ea42c7):

https://gist.githubusercontent.com/osa1/b483216953e3b02c0c79f9e4c8f8f931/raw...

For me the most surprising part is that CodeGen is sometimes taking 86% of the time. It seems like we can start looking for improvements there. (I'll do that myself, but I'm too busy for a few more days.)

Joachim, I think the difference is because of how thunk evaluation and cost centers work. If you generate a thunk under cost-center "X", you attribute the evaluation costs of that thunk to cost-center "X". This is unlike how GHC currently prints the timing information: we essentially measure the time of thunk creation, not the normalization of thunks. So the numbers you get from the profiling output may be more correct.

We should try actually forcing the values in `withTiming` and see how it affects the timings. I suspect some of the costs attributed to the CodeGen step are really from thunks we generated in previous steps. It's probably not possible to completely evaluate the values (because of cycles), but we can still do a better job, probably.

One last thing: profiling can prevent some optimizations and cause different runtime behavior. Problems with instrumentation ...
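[Editor's note: a tiny illustrative sketch of the attribution problem described above — the names are invented and this is not GHC code. A "pass" that returns unevaluated thunks appears to finish instantly; the consumer that later forces the result (standing in for CodeGen here) pays the real cost.]

```haskell
-- fakePass builds a list of thunks and does no arithmetic itself,
-- so wall-clock timing around the call alone would charge it almost
-- nothing; the work happens when a later consumer demands the values.
fakePass :: Int -> [Int]
fakePass n = map expensive [1 .. n]      -- allocates thunks only
  where expensive x = sum [1 .. x * 1000]

main :: IO ()
main = do
  let results = fakePass 50   -- the "pass" returns immediately
  -- Only here, at the consumer, is the deferred cost actually paid,
  -- and so (under lazy timing) attributed to the wrong phase:
  print (sum results)
```

Forcing `results` inside a timed region around `fakePass` (as the `force` parameter to `withTiming` is meant to do) would move that cost back onto the pass that created the thunks.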
2016-03-31 12:01 GMT-04:00 Ömer Sinan Ağacan:

Ömer Sinan Ağacan writes:
One last thing is that profiling can prevent some optimizations and cause different runtime behavior. Problems with instrumentation ...
Absolutely true; this is why I ultimately decided to hold off on adding any forcing in the initial patch. I had hoped to have time to do more sanity checking of the numbers myself before the release, but sadly time hasn't allowed it. I'm very happy that you are picking up this effort. Thanks!

Cheers,

- Ben
participants (7):

- Ben Gamari
- Edward Z. Yang
- Eric Seidel
- Joachim Breitner
- Michael Sloan
- Simon Peyton Jones
- Ömer Sinan Ağacan