Understanding core2core optimisation pipeline

Devs,

I'm trying to understand how the core2core pipeline works. Sadly, we don't have a wiki page about this, so the only sources of information are the papers and the source code. The papers give plenty of detail about each transformation separately, but none of them gives a comprehensive and up-to-date overview of how the whole pipeline is structured. My questions are based on reading the user documentation and the following papers:

[1] "The Glasgow Haskell Compiler", in The Architecture of Open Source Applications, vol. 2
[2] "Compilation by Transformation in Non-Strict Functional Languages", PhD thesis by Santos
[3] "Secrets of the Glasgow Haskell Compiler inliner"
[4] "A transformation-based optimiser for Haskell"
[5] "Modular, Higher-Order Cardinality Analysis in Theory and Practice"
[6] "Let-floating: moving bindings to give faster programs"
[7] "Playing by the Rules: Rewriting as a practical optimisation technique in GHC"

I know there are several papers missing from this list, e.g. "Constructed Product Result Analysis for Haskell" or "Call-pattern specialisation for Haskell programs". The reason is that these optimisations are beyond the scope of what I'm doing at the moment (or so I believe).

This mail basically asks just one question: what is the order of optimisations performed on Core? Since this question is pretty big and general, I've separated it into smaller questions that arose from reading the above papers, the documentation, and experimenting with GHC. Now the detailed questions:

1. What is the difference between a "simplifier iteration" and a "simplifier phase"? Section 7.20.6.5 of the user guide mentions phases, but I believe that iterations are not explained anywhere. My best guess, expressed in pseudo-code, is this (sorry about the imperative style):

    foreach (i in iterations) {
        // some optimisations here?
        foreach (p in phases) {
            // ...optimisations here
        }
        // some optimisations here?
    }

1a. What is the default maximum iteration count? The user documentation does not specify that.

2. How can I observe the effects of `-ddump-simpl-phases`? I tried compiling several different programs and this flag seems to have no effect (i.e. nothing gets printed).

3. Cardinality analysis and inlining: cardinality analysis can determine that a let binding is used exactly once. Can the inliner reuse this information from the cardinality analysis, or does it recompute it per [3], section 3.1?

4. I've compiled a sample program using `-dverbose-core2core` and got the following phases:

- Desugar (after optimization)
- Simplifier (Phase = InitialPhase [Gentle])
- Specialise
- Levels added
- Float out
- Float inwards
- Simplifier (Phase = 2 [main])
- Simplifier (Phase = 1 [main])
- Simplifier (Phase = 0 [main])
- Demand analysis
- Worker Wrapper binds
- Simplifier (Phase = 0 [post-worker-wrapper])
- Levels added
- Float out
- Common sub-expression
- Float inwards
- Simplifier (Phase = 0 [final])
- Tidy Core
- CorePrep

This raises lots of questions:

4a. The first phase is "Desugar (after optimization)". What optimisations are performed during desugaring?

4b. I'm not sure whether I'm looking at a single iteration of the core2core transformation or at multiple ones. Some passes are performed several times (Float out, Float inwards), which suggests that there might be many iterations here. On the other hand, simplifier phases are decreasing towards 0, which looks as if it was one core2core iteration. My assumption here is that every time a new core2core iteration starts, the simplifier phases are counted anew from 2 towards 0. Is that correct?

4c. Why are there several 0 phases of the Simplifier? I find it confusing.

4d. I understand that some passes can be enabled or disabled using command-line options. Can the decision to run some passes be made dynamically by the compiler (e.g. to run extra simplifier passes)?

4e. Are there more phases that could appear here, i.e. ones that were omitted with -O?

4f. The "Levels added" pass before the "Float out" pass: my guess is that this is preparation for the full laziness transform. So, is full laziness performed as part of the "Float out" pass?

A general note is that I am confused by many Simplifier passes being interleaved with other passes. I expected that simplifier phases would be grouped into a single pass, as speculated in question 1.

5. What optimisations *exactly* are performed by the Simplifier? I assume most of what's described in chapter 3 of [2]: beta reduction, let elimination, case elimination, case floating, constant folding, and eta expansion. I'm not sure about floating lets outwards and inwards - [1], pg. 7, says these are in a pass separate from the simplifier. `-dverbose-core2core` seems to confirm that, since it reveals separate "Float out" and "Float inwards" passes.

6. [4], pg. 31, mentions the Deforestation optimisation. Is everything described in that "Deforestation" section subsumed by cardinality analysis ([5], end of section 2.1 and section 7.1)? If not, when is deforestation performed?

7. [5], section 6.1 says: "We run the analysis twice, once in the middle of the optimisation pipeline and once near the end". When exactly in the middle of the pipeline? Between which passes? This does not show up with `-dverbose-core2core` (or at least it is not explicitly named).

8. How does the rules rewriting fit into the picture? Section 7.20.6.5 of the User Guide and section 4.1 of [7] explain the interaction between rules and inlining, and my guess is that both are performed by the Simplifier. Again, the "simplifier phases/iterations" distinction puzzles me as to what exactly is happening when. Within a single phase, does the inlining happen before rewriting or vice versa?

I know that all of the above questions can be answered by looking at the source code for sufficiently long. This is actually what I'm planning to do next, but if anyone could help me by answering some of these questions, it would certainly save me some time. My plan is to gather up the answers on a wiki page.

Janek

| My plan is to gather up the answers on a wiki page.

Excellent -- please do that! My replies are below.

Simon

| This mail basically asks just one question: what is the order of
| optimisations performed on Core?

It's entirely defined by

    SimplCore.getCoreToDo :: DynFlags -> [CoreToDo]

The code there should be reasonably self-explanatory. The type signature is very descriptive.

| 1. What is the difference between a "simplifier iteration" and a
| "simplifier phase"?

Roughly, a complete run of the simplifier means "run the simplifier repeatedly until nothing further happens". The iterations are the successive iterations of this loop. Currently there's a (rather arbitrary) limit of four such iterations before we give up and declare victory.

| 1a. What is the default maximum iteration count? The user documentation
| does not specify that.

Four.

| 2. How can I observe the effects of `-ddump-simpl-phases`? I tried
| compiling several different programs and this flag seems to have no
| effect (i.e. nothing gets printed).

I had to read the source code. I think you say "-ddump-simpl-phases=A,B,C" to dump the output of phases called A, B, C. But no, it seems that it only affects the output of simplifier statistics (see simplifyPgmIO). I have never used this flag. Maybe it can go. Looks strange to me.

| 3. Cardinality analysis and inlining: cardinality analysis can determine
| that a let binding is used exactly once. Can the inliner reuse this
| information from the cardinality analysis, or does it recompute it per
| [3], section 3.1?

Cardinality analysis determines that something is *demanded* once. The occurrence analyser determines when it *occurs* once. For example, in

    if .. then x else x+1

x occurs twice, but is demanded once. So they are different. The inliner uses occurrence information.

| 4a. The first phase is "Desugar (after optimization)". What
| optimisations are performed during desugaring?

Just a few basic ones; see CoreSubst.simpleOptPgm. It implements the "Very Simple Optimiser", which is only a page or two of code. Reading it and writing a Note that enumerates what optimisations it does would be a Good Thing.

| 4b. I'm not sure whether I'm looking at a single iteration of the
| core2core transformation or at multiple ones. Some passes are performed
| several times (Float out, Float inwards), which suggests that there
| might be many iterations here. On the other hand, simplifier phases are
| decreasing towards 0, which looks as if it was one core2core iteration.
| My assumption here is that every time a new core2core iteration starts,
| the simplifier phases are counted anew from 2 towards 0. Is that
| correct?

No. getCoreToDo produces a list of CoreToDos. Each specifies a stage in the pipeline. One such stage is a run of the simplifier. Such a run has a "phase" number, which is set in getCoreToDo. This phase number is used (only) to control INLINE pragmas and RULES (see the extensive documentation in the user manual, e.g. http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#phase-c...).

| 4c. Why are there several 0 phases of the Simplifier? I find it
| confusing.

We need to run the simplifier several times to propagate the effects of (say) strictness analysis or let-floating. But by that stage the need for staging RULES and INLINE pragmas is over.

| 4d. I understand that some passes can be enabled or disabled using
| command-line options. Can the decision to run some passes be made
| dynamically by the compiler (e.g. to run extra simplifier passes)?

Yes. Plug-ins do precisely that, by manipulating the [CoreToDo] pipeline list.
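
For concreteness, here is a minimal sketch of such a plugin, modelled on the example in the GHC user guide; the module name, pass name, and message are invented for illustration:

    module MyPlugin (plugin) where

    import GhcPlugins

    plugin :: Plugin
    plugin = defaultPlugin { installCoreToDos = install }

    -- The plugin receives the default [CoreToDo] pipeline and may
    -- reorder, drop, or extend it; here we just append one extra pass.
    install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
    install _opts todos = do
      reinitializeGlobals   -- required boilerplate in this era's plugin API
      return (todos ++ [CoreDoPluginPass "Say name" pass])

    -- A do-nothing pass: print a message, return the module unchanged.
    pass :: ModGuts -> CoreM ModGuts
    pass guts = do
      putMsgS "Extra pass running"
      return guts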

| 4e. Are there more phases that could appear here, i.e. ones that were
| omitted with -O?

Could appear where?

| 4f. The "Levels added" pass before the "Float out" pass: my guess is
| that this is preparation for the full laziness transform. So, is full
| laziness performed as part of the "Float out" pass?

The full laziness pass is simply a combination of set-levels (an analysis) followed by float-out (which transforms the program).

| A general note is that I am confused by many Simplifier passes being
| interleaved with other passes. I expected that simplifier phases would
| be grouped into a single pass, as speculated in question 1.

Many passes produce output that can readily be simplified. Rather than require them to perform those simplifications, we delegate it to the simplifier.

| 5. What optimisations *exactly* are performed by the Simplifier? I
| assume most of what's described in chapter 3 of [2]: beta reduction,
| let elimination, case elimination, case floating, constant folding,
| and eta expansion. I'm not sure about floating lets outwards and
| inwards - [1], pg. 7, says these are in a pass separate from the
| simplifier. `-dverbose-core2core` seems to confirm that, since it
| reveals separate "Float out" and "Float inwards" passes.

I don't have an exhaustive list, I'm afraid. The general rule is: if it can be done with local knowledge, the simplifier should do it. Making a full list would be another good exercise. If you start, I could add more. Examples:

- constant folding
- applying RULES
- inlining (a really important one)
- case of case
- case of known constructor
- eta expansion and eta reduction
- combining adjacent casts
- pushing a cast out of the way of an application,
  e.g. (f |> k) a ==> f (a |> k1) |> k2 for suitable k1, k2

and there are probably a lot more. Look for SimplifierTicks; every time the simplifier does something significant it bumps a tick count (or should do so).

| 6. [4], pg. 31, mentions the Deforestation optimisation. Is everything
| described in that "Deforestation" section subsumed by cardinality
| analysis ([5], end of section 2.1 and section 7.1)? If not, when is
| deforestation performed?

Deforestation is simply the application of rewrite RULES; that's done by the simplifier.

| 7. [5], section 6.1 says: "We run the analysis twice, once in the
| middle of the optimisation pipeline and once near the end". When
| exactly in the middle of the pipeline? Between which passes? This does
| not show up with `-dverbose-core2core` (or at least it is not
| explicitly named).

I'm not actually certain that cardinality analysis *is* run twice in HEAD. Maybe it was only in the version for the paper. Ilya can tell us.

| 8. How does the rules rewriting fit into the picture?

It's done by the simplifier.
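
To make one of those local rewrites concrete, here is a tiny, hypothetical source fragment and the effect of case-of-known-constructor on it (a sketch of what the simplifier does, not actual dump output):

    -- Before: the scrutinee is an application of a known constructor.
    f :: Int -> Int
    f y = case Just y of
            Nothing -> 0
            Just x  -> x + 1

    -- After case-of-known-constructor (plus trivial let elimination),
    -- the function is, in effect:
    --     f y = y + 1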
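
And since questions 4b, 6, and 8 all meet in the RULES machinery, here is a small self-contained illustration of a deforestation-style rule staged with a phase-numbered pragma; `myMap` is a made-up stand-in for a list producer/consumer, and the rule is the classic map/map example from [7]:

    module Fuse (myMap) where

    -- Deforestation here is just a rewrite RULE; the simplifier applies
    -- it whenever it spots the left-hand side.
    {-# RULES
    "myMap/myMap"  forall f g xs.  myMap f (myMap g xs) = myMap (f . g) xs
      #-}

    myMap :: (a -> b) -> [a] -> [b]
    myMap _ []     = []
    myMap f (x:xs) = f x : myMap f xs

    -- Phase control: NOINLINE [0] keeps myMap from inlining in phases 2
    -- and 1, so calls to myMap survive long enough for the rule to fire;
    -- from phase 0 onwards it may be inlined as usual.
    {-# NOINLINE [0] myMap #-}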

The wiki page just went live: https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/Core2CorePipeline It's not yet perfect but it should be a good start.
| Roughly, a complete run of the simplifier means "run the simplifier
| repeatedly until nothing further happens". The iterations are the
| successive iterations of this loop. Currently there's a (rather
| arbitrary) limit of four such iterations before we give up and declare
| victory.

A limit or a default value for that limit?
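
(It appears to be a default for a configurable limit: the user guide documents `-fmax-simplifier-iterations=N`, which, being a dynamic flag, should also be settable per module, e.g.:

    {-# OPTIONS_GHC -fmax-simplifier-iterations=10 #-}

)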

To Ilya: if you grep for the "late_dmd_anal" option variable in compiler/simplCore/SimplCore.lhs, you'll see that it triggers a phase close to the end of getCoreToDo's tasks, which contains, in particular, the "CoreDoStrictness" pass. This is the "late" phase. The paper said that the late pass is run to detect single-entry thunks, and the reason why it is run late in the pipeline is that if it were run earlier this information could be invalidated by the transformations. But in the source code I see that this late pass is followed by the simplifier, which can invalidate the information. Also, the documentation for -flate-dmd-anal says: "We found some opportunities for discovering strictness that were not visible earlier; and optimisations like -fspec-constr can create functions with unused arguments which are eliminated by late demand analysis". This says nothing about single-entry thunks. So, is the single-entry thunk optimisation performed by GHC?

Janek