RE: Understanding core2core optimisation pipeline

24 Jul 2014

      | My plan is to gather up
| the answers on a wiki page.

Excellent -- please do that!  My replies are below

Simon

| This mail basically asks just one question: what is the order of
| optimizations pefromed on Core?

It's entirely defined by 
  SimplCore.getCoreToDo :: DynFlags -> [CoreToDo]
The code there should be reasonably self-explanatory.  The type signature is very descriptive.

| 1. What is the difference between a "simplifier iteration" and
| "simplifier phase"?

Roughtly, a complete run of the simplifier means "run the simplifier repeatedly until nothing further happens".  The iterations are the successive iterations of this loop. Currently there's a (rather arbitrary) limit of four such iterations before we give up and declare victory.

| 1a. What is the default maximum iterations count? User documentation does
| not specify that.

Four

| 2. How can I observe the effects of `-ddump-simpl-phases`. I tried
| compiling several different
| programs and this flag seems to have no effect (ie. nothing gets
| printed).

I had to read the source code.  I think you say
	"-ddump-simpl-phases=A,B,C"
to dump the output of phases called A,B,C. But no, it seems that it only affects output of simplifier statistics (see simplifyPgmIO).  I have never used this flag.  Maybe it can go.  Looks strange to me.

| 3. Cardinality anlaysis and inlining: cardinality analysis can determine
| that a let binding is
| used exactly once. Can the inliner re-use this information from the
| cardinality analysis or does
| it recompute it per [3], section 3.1?

Cardinality analysis determines that something is *demanded* once.  The occurrence analyser determines when it *occurs* once.  For example
	if .. then x else x+1
x occurs twice, but is demanded once.  So they are different.  The inliner uses occurrence information.

| 4a. The first phase is "Desugar (after optimization)". What optimizations
| are performed during
| desugaring?

Just a few basic ones; see CoreSubst.simpleOptPgm.  It implements the "Very Simple Optimiser" which is only a page or two of code.  Reading it and writing a Note that enumerates what optimisations it does would be a Good Thing.

| 4b. I'm not sure whether I'm looking at a single iteration of core2core
| transformation or at
| multiple ones. Some passes are performed several times (Float out, Float
| inwards), which suggests
| that there might be many iterations here. On the other hand simplifier
| phases are decreasing
| towards 0, which looks as if it was one core2core iteration. My
| assumption here is that every
| time a new core2core iteration starts the simplifier phases are counted
| anew from 2 towards 0. Is
| that correct?

No. getCoreToDo produces a list of CoreToDos.  Each specifies a stage in the pipeline. One such stage is a run of the simplifier.  Such a run has a "phase" number, which is set in getCoreToDo.  This phase number is used (only) to control INLINE pragmas and RULES (see extensive documentation in the user manual e.g http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#phase-c...)

| 4c. Why are there several 0 phases of the Simplifier? I find it
| confusing.

We need to run the simplifier several times to propagate the effects of (say) strictness analysis or let-floating.  But by that stage the need for staging RULES and INLINE pragmas is over.

Yes. Plug-ins do precisely that, by manipulating the [CoreToDo] pipeline list.

| 4e. Are there more phases that could appear here, ie. they were ommited
| with -O?

Could appear where?

| 4f. "Levels added" pass before the "Float out" pass: my guess is that
| this is preparation for the
| full laziness transform. So, is full laziness performed as part of "Float
| out" pass?

The full laziness pass is simply a combination of 
	set-levels (an analysis)
followed by
	float out (which transforms the program)

Many passes produce output that can readily be simplified.  Rather than require them to perform those simplifications, we delegate it to the simplifier.

| 5. What optimizations *exactly* are performed by the Simplifier? I assume
| that most of what's
| described in chapter 3 of [2]: beta reduction, let elimination, case
| elimination, case floating,
| constant folding and eta expansion. I'm not sure about floating let
| outwards and inwards - [1],
| pg. 7, says these are in a pass separate from the simplifier. `-dverbose-
| core2core` seems to
| confirm that since it reveals separate "Float out" and "Float inwards"
| passes.

I don't have an exhaustive list, I'm afraid.  The general rule is: if it can be done with local knowledge, the simplifier should do it.  Making a full list would be another good exercise.  If you start I could add more.  Examples:
- constant foldings
- applying RULES
- inlining (a really important one)
- case of case
- case of known constructor
- eta expansion and eta reduction
- combining adjacent casts
- pushing a cast out of the way of an application
    e.g.  (f |> k) a  ==>  f (a |> k1) |> k2
          for suitable k1, k2
and there are probably a lot more.  Look for SimplifierTicks; every time the simplifier does something significant it bumps a tick count.  (or should do so.)

| 6. [4], pg. 31, mentions the Deforestation optimisation. Is everything
| described in
| that "Deforestation" section subsumed by cardinality analysis ([5], end
| of section 2.1 and
| section 7.1)? If not then when is deforestation performed?

Deforestation is simply the application of rewrite RULES; that's done by the simplifier.

I'm not actually certain that cardinality analysis *is* run twice in HEAD.  Maybe it was only in the version for the paper.  Ilya can tell us

| 8. How does the rules rewriting fit into the picture? 

It's done by the simplifier

RE: Understanding core2core optimisation pipeline

Simon Peyton Jones