GitLab

Marge Bot pushed to branch wip/marge_bot_batch_merge_job at Glasgow Haskell Compiler / GHC

Commits:

e9623c94

by Simon Peyton Jones at 2026-03-31T21:01:01+01:00

Refactor eta-expansion in Prep

The Prep pass does eta-expansion but I found cases where it was
doing bad things.  So I refactored and simplified it quite a bit.

In the new design
* There is no distinction between `rhs` and `body`; in particular,
  lambdas can now appear anywhere, rather than just as the RHS of
  a let-binding.

* This change led to a significant simplification of Prep, and
  a more straightforward explanation of eta-expansion. See the new
     Note [Eta expansion]

* The consequences is that CoreToStg needs to handle naked lambdas.
  This is very easy; but it does need a unique supply, which forces
  some simple refactoring.  Having a unique supply to hand is probably
  a good thing anyway.

3a33983b

by Simon Peyton Jones at 2026-03-31T21:01:02+01:00

Clarify Note [Interesting dictionary arguments]

Ticket #26831 ended up concluding that the code for
GHC.Core.Opt.Specialise.interestingDict was good, but the
commments were a bit inadequate.

This commit improves the comments slightly.

690acec9

by Simon Peyton Jones at 2026-03-31T21:01:02+01:00

Make inlining a bit more eager for overloaded functions

If we have
   f d = ... (class-op d x y) ...
we should be eager to inline `f`, because that may change the
higher order call (class-op d x y) into a call to a statically
known function.

See the discussion on #26831.

Even though this does a bit /more/ inlining, compile times
decrease by an average of 0.4%.

Compile time changes:

 DsIncompleteRecSel3(normal)     431,786,104  -2.2%
    ManyAlternatives(normal)     670,883,768  -1.6%
    ManyConstructors(normal)   3,758,493,832  -2.6% GOOD
MultilineStringsPerf(normal)      29,900,576  -2.8%
            T14052Type(ghci)   1,047,600,848  -1.2%
              T17836(normal)     392,852,328  -5.2%
              T18478(normal)     442,785,768  -1.4%
             T21839c(normal)     341,536,992 -14.1% GOOD
               T3064(normal)     174,086,152  +5.3%  BAD
               T5631(normal)     506,867,800  +1.0%
      hard_hole_fits(normal)     209,530,736  -1.3%
 info_table_map_perf(normal)  19,523,093,184  -1.2%
          parsing001(normal)     377,810,528  -1.1%
          pmcOrPats(normal)       60,075,264  -0.5%

                                 geo. mean    -0.4%
                                 minimum     -14.1%
                                 maximum      +5.3%

Runtime changes
    haddock.Cabal(normal)   27,351,988,792  -0.7%
     haddock.base(normal)   26,997,212,560  -0.6%
 haddock.compiler(normal)  219,531,332,960  -1.0%

Metric Decrease:
    ManyConstructors
    T17949
    T21839c
    T13035
    TcPlugin_RewritePerf
    hard_hole_fits

Metric Increase:
    T3064

c7ac6d41
by Simon Jakobi at 2026-03-31T19:20:43-04:00
```
Add perf test for #13960

Closes #13960.
```

20 changed files:

compiler/GHC/Builtin/PrimOps.hs
compiler/GHC/Core/Opt/Arity.hs
compiler/GHC/Core/Opt/Specialise.hs
compiler/GHC/Core/Tidy.hs
compiler/GHC/Core/Unfold.hs
compiler/GHC/CoreToStg.hs
compiler/GHC/CoreToStg/Prep.hs
compiler/GHC/Driver/Main.hs
compiler/GHC/Stg/Lint.hs
compiler/GHC/Types/Id.hs
compiler/GHC/Types/Id/Info.hs
testsuite/tests/arityanal/should_compile/Arity01.stderr
testsuite/tests/arityanal/should_compile/Arity05.stderr
testsuite/tests/arityanal/should_compile/Arity08.stderr
testsuite/tests/arityanal/should_compile/Arity11.stderr
testsuite/tests/arityanal/should_compile/Arity14.stderr
+ testsuite/tests/perf/compiler/T13960.hs
testsuite/tests/perf/compiler/all.T
testsuite/tests/simplCore/should_compile/T15205.stderr
testsuite/tests/wasm/should_run/control-flow/LoadCmmGroup.hs

Changes:

compiler/GHC/Builtin/PrimOps.hs

@@ -807,16 +807,23 @@ the former has an additional type binder. Hmmm....
  Note [Eta expanding primops]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-
  STG requires that primop applications be saturated. This makes code generation
  significantly simpler since otherwise we would need to define a calling
  convention for curried applications that can accommodate representation
  polymorphism.
 -To ensure saturation, CorePrep eta expands all primop applications as
 -described in Note [Eta expansion of hasNoBinding things in CorePrep] in
 +To ensure saturation, CorePrep eta expands all primop applications
 +as described in Note [Eta expansion of unsaturated calls] in
  GHC.Core.Prep.
 +Side note: this decision is somewhat in flux: see comments with `hasNoBinding`.
 +The question is: do we generate a trivial wrapper for each primop
 +   (+#) x y = (+#) x y
 +and now we can call that wrapper unsaturated.  But in practice we
 +might never call it because in practice Prep eta-expands all partial
 +applications!
++
++
  Historical Note:
  For a short period around GHC 8.8 we rewrote unsaturated primop applications to

compiler/GHC/Core/Opt/Arity.hs

@@ -2551,9 +2551,6 @@ This reduces clutter, sometimes a lot. See Note [Do not eta-expand PAPs]
  in GHC.Core.Opt.Simplify.Utils, where we are careful not to eta-expand
  a PAP.  If eta-expanding is bad, then eta-reducing is good!
 -Also the code generator likes eta-reduced PAPs; see GHC.CoreToStg.Prep
 -Note [No eta reduction needed in rhsToBody].
+-
  But note that we don't want to eta-reduce
       \x y.  f <expensive> x y
  to

compiler/GHC/Core/Opt/Specialise.hs

@@ -3247,9 +3247,14 @@ case we can clearly specialise. But there are wrinkles:
  (ID6) The Main Plan says that it's worth specialising if the argument is an application
     of a dictionary contructor.  But what if the dictionary has no methods?  Then we
 -   gain nothing by specialising, unless the /superclasses/ are interesting.   A case
 -   in point is constraint tuples (% d1, .., dn %); a constraint N-tuple is a class
 -   with N superclasses and no methods.
 +   gain nothing by specialising, unless the /superclasses/ are interesting.
++
 +   So if there are no methods, we recursively call `interestingDict` on the
 +   superclasses.  Why recurse? If we have
 +         \d1 d2.  f (CTuple d1 d2)
 +   If `d1 and `d2` are uninteresting dictionaries, then so is (CTuple d1 d2).
 +   (Remember: a constraint tuple is just a class with N superclasses and no methods.)
 +   See discussion on #26831.
  (ID7) A unary (single-method) class is currently represented by (meth |> co).  We
     will unwrap the cast (see (ID5)) and then want to reply "yes" if the method

compiler/GHC/Core/Tidy.hs

@@ -165,6 +165,7 @@ computeCbvInfo fun_id rhs
                  map mkMark val_args
      cbv_bndr | any isMarkedCbv cbv_marks
 +               -- isMarkedCbv: see (CBV2) in Note [CBV Function Ids: overview]
               = cbv_marks `seqList` setIdCbvMarks fun_id cbv_marks
                 -- seqList: avoid retaining the original rhs
@@ -176,6 +177,7 @@ computeCbvInfo fun_id rhs
      -- We don't set CBV marks on functions which take unboxed tuples or sums as
      -- arguments.  Doing so would require us to compute the result of unarise
      -- here in order to properly determine argument positions at runtime.
 +    -- See (CBV1) in Note [CBV Function Ids: overview]
      --
      -- In practice this doesn't matter much. Most "interesting" functions will
      -- get a W/W split which will eliminate unboxed tuple arguments, and unboxed

compiler/GHC/Core/Unfold.hs

@@ -779,22 +779,28 @@ litSize _other = 0    -- Must match size of nullary constructors
  classOpSize :: UnfoldingOpts -> Class -> [Id] -> [CoreExpr] -> ExprSize
  -- See (IA1) in Note [Interesting arguments] in GHC.Core.Opt.Simplify.Utils
 -classOpSize opts cls top_args args
 -  | isUnaryClass cls
 -  = sizeZero   -- See (UCM4) in Note [Unary class magic] in GHC.Core.TyCon
 -  | otherwise
 -  = case args of
 -       []                -> sizeZero
 -       (arg1:other_args) -> SizeIs (size other_args) (arg_discount arg1) 0
 +classOpSize _opts _cls _top_args []
 +  = sizeZero   -- A non-applied classop
 +classOpSize opts cls top_args (dict_arg:other_val_args)
 +  = SizeIs size (arg_discount dict_arg) 0
    where
 -    size other_args = 20 + (10 * length other_args)
 +    size | isUnaryClass cls = 0    -- See (UCM4) in Note [Unary class magic] in GHC.Core.TyCon
 +         | otherwise        = 20 + (10 * length other_val_args)
      -- If the class op is scrutinising a lambda bound dictionary then
      -- give it a discount, to encourage the inlining of this function
 -    -- The actual discount is rather arbitrarily chosen
 -    arg_discount (Var dict) | dict `elem` top_args
 -                   = unitBag (dict, unfoldingDictDiscount opts)
 -    arg_discount _ = emptyBag
 +    arg_discount (Cast arg _co)                    = arg_discount arg
 +    arg_discount (Var dict) | dict `elem` top_args = unitBag (dict, dict_discount)
 +    arg_discount _                                 = emptyBag
++
 +    -- If we have (class-op d arg1 .. argn) then it's super-good to inline
 +    -- to expose `d`; not only can we do the dictionary selection
 +    -- (class-op d), but that will likely expose a lambda which we can then
 +    -- apply.  In that case (n > 0), we add `unfoldingFunAppDiscount`.
 +    -- See the discussion on #26831, esp "Delicate inlining".
 +    dict_discount
 +      | null other_val_args = unfoldingDictDiscount opts
 +      | otherwise           = unfoldingDictDiscount opts + unfoldingFunAppDiscount opts
  -- | The size of a function call
  callSize

compiler/GHC/CoreToStg.hs

@@ -39,6 +39,8 @@ import GHC.Types.Basic  ( Arity, TypeOrConstraint(..) )
  import GHC.Types.Literal
  import GHC.Types.ForeignCall
  import GHC.Types.IPE
 +import GHC.Types.Unique.Supply
 +import GHC.Types.Unique
  import GHC.Unit.Module
  import GHC.Platform        ( Platform )
@@ -49,297 +51,309 @@ import GHC.Utils.Outputable
  import GHC.Utils.Monad
  import GHC.Utils.Misc (HasDebugCallStack)
  import GHC.Utils.Panic
 +import GHC.Data.FastString
  import Control.Monad (ap)
 --- Note [Live vs free]
 --- ~~~~~~~~~~~~~~~~~~~
 ---
 --- The two are not the same. Liveness is an operational property rather
 --- than a semantic one. A variable is live at a particular execution
 --- point if it can be referred to directly again. In particular, a dead
 --- variable's stack slot (if it has one):
 ---
 ---           - should be stubbed to avoid space leaks, and
 ---           - may be reused for something else.
 ---
 --- There ought to be a better way to say this. Here are some examples:
 ---
 ---         let v = [q] \[x] -> e
 ---         in
 ---         ...v...  (but no q's)
 ---
 --- Just after the `in', v is live, but q is dead. If the whole of that
 --- let expression was enclosed in a case expression, thus:
 ---
 ---         case (let v = [q] \[x] -> e in ...v...) of
 ---                 alts[...q...]
 ---
 --- (ie `alts' mention `q'), then `q' is live even after the `in'; because
 --- we'll return later to the `alts' and need it.
 ---
 --- Let-no-escapes make this a bit more interesting:
 ---
 ---         let-no-escape v = [q] \ [x] -> e
 ---         in
 ---         ...v...
 ---
 --- Here, `q' is still live at the `in', because `v' is represented not by
 --- a closure but by the current stack state.  In other words, if `v' is
 --- live then so is `q'. Furthermore, if `e' mentions an enclosing
 --- let-no-escaped variable, then its free variables are also live if `v' is.
 +{- Note [Live vs free]
 +~~~~~~~~~~~~~~~~~~~~~~
 +The two are not the same. Liveness is an operational property rather
 +than a semantic one. A variable is live at a particular execution
 +point if it can be referred to directly again. In particular, a dead
 +variable's stack slot (if it has one):
 --- Note [What are these SRTs all about?]
 --- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ---
 --- Consider the Core program,
 ---
 ---     fibs = go 1 1
 ---       where go a b = let c = a + c
 ---                      in c : go b c
 ---     add x = map (\y -> x*y) fibs
 ---
 --- In this case we have a CAF, 'fibs', which is quite large after evaluation and
 --- has only one possible user, 'add'. Consequently, we want to ensure that when
 --- all references to 'add' die we can garbage collect any bit of 'fibs' that we
 --- have evaluated.
 ---
 --- However, how do we know whether there are any references to 'fibs' still
 --- around? Afterall, the only reference to it is buried in the code generated
 --- for 'add'. The answer is that we record the CAFs referred to by a definition
 --- in its info table, namely a part of it known as the Static Reference Table
 --- (SRT).
 ---
 --- Since SRTs are so common, we use a special compact encoding for them in: we
 --- produce one table containing a list of CAFs in a module and then include a
 --- bitmap in each info table describing which entries of this table the closure
 --- references.
 ---
 --- See also: commentary/rts/storage/gc/CAFs on the GHC Wiki.
 +          - should be stubbed to avoid space leaks, and
 +          - may be reused for something else.
 --- Note [What is a non-escaping let]
 --- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ---
 --- NB: Nowadays this is recognized by the occurrence analyser by turning a
 --- "non-escaping let" into a join point. The following is then an operational
 --- account of join points.
 ---
 --- Consider:
 ---
 ---     let x = fvs \ args -> e
 ---     in
 ---         if ... then x else
 ---            if ... then x else ...
 ---
 --- `x' is used twice (so we probably can't unfold it), but when it is
 --- entered, the stack is deeper than it was when the definition of `x'
 --- happened.  Specifically, if instead of allocating a closure for `x',
 --- we saved all `x's fvs on the stack, and remembered the stack depth at
 --- that moment, then whenever we enter `x' we can simply set the stack
 --- pointer(s) to these remembered (compile-time-fixed) values, and jump
 --- to the code for `x'.
 ---
 --- All of this is provided x is:
 ---   1. non-updatable;
 ---   2. guaranteed to be entered before the stack retreats -- ie x is not
 ---      buried in a heap-allocated closure, or passed as an argument to
 ---      something;
 ---   3. all the enters have exactly the right number of arguments,
 ---      no more no less;
 ---   4. all the enters are tail calls; that is, they return to the
 ---      caller enclosing the definition of `x'.
 ---
 --- Under these circumstances we say that `x' is non-escaping.
 ---
 --- An example of when (4) does not hold:
 ---
 ---     let x = ...
 ---     in case x of ...alts...
 ---
 --- Here, `x' is certainly entered only when the stack is deeper than when
 --- `x' is defined, but here it must return to ...alts... So we can't just
 --- adjust the stack down to `x''s recalled points, because that would lost
 --- alts' context.
 ---
 --- Things can get a little more complicated.  Consider:
 ---
 ---     let y = ...
 ---     in let x = fvs \ args -> ...y...
 ---     in ...x...
 ---
 --- Now, if `x' is used in a non-escaping way in ...x..., and `y' is used in a
 --- non-escaping way in ...y..., then `y' is non-escaping.
 ---
 --- `x' can even be recursive!  Eg:
 ---
 ---     letrec x = [y] \ [v] -> if v then x True else ...
 ---     in
 ---         ...(x b)...
 +There ought to be a better way to say this. Here are some examples:
 --- Note [Cost-centre initialization plan]
 --- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ---
 --- Previously `coreToStg` was initializing cost-centre stack fields as `noCCS`,
 --- and the fields were then fixed by a separate pass `stgMassageForProfiling`.
 --- We now initialize these correctly. The initialization works like this:
 ---
 ---   - For non-top level bindings always use `currentCCS`.
 ---
 ---   - For top-level bindings, check if the binding is a CAF
 ---
 ---     - CAF:      If -fcaf-all is enabled, create a new CAF just for this CAF
 ---                 and use it. Note that these new cost centres need to be
 ---                 collected to be able to generate cost centre initialization
 ---                 code, so `coreToTopStgRhs` now returns `CollectedCCs`.
 ---
 ---                 If -fcaf-all is not enabled, use "all CAFs" cost centre.
 ---
 ---     - Non-CAF:  Top-level (static) data is not counted in heap profiles; nor
 ---                 do we set CCCS from it; so we just slam in
 ---                 dontCareCostCentre.
+-
 --- Note [Coercion tokens]
 --- ~~~~~~~~~~~~~~~~~~~~~~
 --- In coreToStgArgs, we drop type arguments completely, but we replace
 --- coercions with a special coercionToken# placeholder. Why? Consider:
 ---
 ---   f :: forall a. Int ~# Bool -> a
 ---   f = /\a. \(co :: Int ~# Bool) -> error "impossible"
 ---
 --- If we erased the coercion argument completely, we’d end up with just
 --- f = error "impossible", but then f `seq` () would be ⊥!
 ---
 --- This is an artificial example, but back in the day we *did* treat
 --- coercion lambdas like type lambdas, and we had bug reports as a
 --- result. So now we treat coercion lambdas like value lambdas, but we
 --- treat coercions themselves as zero-width arguments — coercionToken#
 --- has representation VoidRep — which gets the best of both worlds.
 ---
 --- (For the gory details, see also the (unpublished) paper, “Practical
 --- aspects of evidence-based compilation in System FC.”)
 +        let v = [q] \[x] -> e
 +        in
 +        ...v...  (but no q's)
 --- Note [Saturation of data constructors in STG]
 --- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 --- We guarantee that `StgConApp` is an exactly-saturated application of a data
 --- constructor worker.
 ---
 --- * If the data constructor is /under/-saturated we just fall through to build
 ---   a `StgApp`.  Remember, data constructor workers have a regular top-level definition
 ---   (injected by GHC.CoreToStg.Prep.mkDataConWorkers) so we can partially apply
 ---   that function.
 ---
 --- * If the data constructor is /over/-saturated, which can happen (see #23865) we again
 ---   fall through to `StgApp`.  That will fail horribly at runtime (by applying data
 ---   constructor to an argument) but it should be in dead code, and at least the compiler
 ---   itself won't crash.  (We could inject an error-thunk instead.)
 +Just after the `in', v is live, but q is dead. If the whole of that
 +let expression was enclosed in a case expression, thus:
++
 +        case (let v = [q] \[x] -> e in ...v...) of
 +                alts[...q...]
++
 +(ie `alts' mention `q'), then `q' is live even after the `in'; because
 +we'll return later to the `alts' and need it.
++
 +Let-no-escapes make this a bit more interesting:
++
 +        let-no-escape v = [q] \ [x] -> e
 +        in
 +        ...v...
++
 +Here, `q' is still live at the `in', because `v' is represented not by
 +a closure but by the current stack state.  In other words, if `v' is
 +live then so is `q'. Furthermore, if `e' mentions an enclosing
 +let-no-escaped variable, then its free variables are also live if `v' is.
++
 +Note [What are these SRTs all about?]
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
++
 +Consider the Core program,
++
 +    fibs = go 1 1
 +      where go a b = let c = a + c
 +                     in c : go b c
 +    add x = map (\y -> x*y) fibs
++
 +In this case we have a CAF, 'fibs', which is quite large after evaluation and
 +has only one possible user, 'add'. Consequently, we want to ensure that when
 +all references to 'add' die we can garbage collect any bit of 'fibs' that we
 +have evaluated.
++
 +However, how do we know whether there are any references to 'fibs' still
 +around? Afterall, the only reference to it is buried in the code generated
 +for 'add'. The answer is that we record the CAFs referred to by a definition
 +in its info table, namely a part of it known as the Static Reference Table
 +(SRT).
 +Since SRTs are so common, we use a special compact encoding for them in: we
 +produce one table containing a list of CAFs in a module and then include a
 +bitmap in each info table describing which entries of this table the closure
 +references.
++
 +See also: commentary/rts/storage/gc/CAFs on the GHC Wiki.
++
 +Note [What is a non-escaping let]
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
++
 +NB: Nowadays this is recognized by the occurrence analyser by turning a
 +"non-escaping let" into a join point. The following is then an operational
 +account of join points.
++
 +Consider:
++
 +    let x = fvs \ args -> e
 +    in
 +        if ... then x else
 +           if ... then x else ...
++
 +`x' is used twice (so we probably can't unfold it), but when it is
 +entered, the stack is deeper than it was when the definition of `x'
 +happened.  Specifically, if instead of allocating a closure for `x',
 +we saved all `x's fvs on the stack, and remembered the stack depth at
 +that moment, then whenever we enter `x' we can simply set the stack
 +pointer(s) to these remembered (compile-time-fixed) values, and jump
 +to the code for `x'.
++
 +All of this is provided x is:
 +  1. non-updatable;
 +  2. guaranteed to be entered before the stack retreats -- ie x is not
 +     buried in a heap-allocated closure, or passed as an argument to
 +     something;
 +  3. all the enters have exactly the right number of arguments,
 +     no more no less;
 +  4. all the enters are tail calls; that is, they return to the
 +     caller enclosing the definition of `x'.
++
 +Under these circumstances we say that `x' is non-escaping.
++
 +An example of when (4) does not hold:
++
 +    let x = ...
 +    in case x of ...alts...
++
 +Here, `x' is certainly entered only when the stack is deeper than when
 +`x' is defined, but here it must return to ...alts... So we can't just
 +adjust the stack down to `x''s recalled points, because that would lost
 +alts' context.
++
 +Things can get a little more complicated.  Consider:
++
 +    let y = ...
 +    in let x = fvs \ args -> ...y...
 +    in ...x...
++
 +Now, if `x' is used in a non-escaping way in ...x..., and `y' is used in a
 +non-escaping way in ...y..., then `y' is non-escaping.
++
 +`x' can even be recursive!  Eg:
++
 +    letrec x = [y] \ [v] -> if v then x True else ...
 +    in
 +        ...(x b)...
++
 +Note [Cost-centre initialization plan]
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
++
 +Previously `coreToStg` was initializing cost-centre stack fields as `noCCS`,
 +and the fields were then fixed by a separate pass `stgMassageForProfiling`.
 +We now initialize these correctly. The initialization works like this:
++
 +  - For non-top level bindings always use `currentCCS`.
++
 +  - For top-level bindings, check if the binding is a CAF
++
 +    - CAF:      If -fcaf-all is enabled, create a new CAF just for this CAF
 +                and use it. Note that these new cost centres need to be
 +                collected to be able to generate cost centre initialization
 +                code, so `coreToTopStgRhs` now returns `CollectedCCs`.
++
 +                If -fcaf-all is not enabled, use "all CAFs" cost centre.
++
 +    - Non-CAF:  Top-level (static) data is not counted in heap profiles; nor
 +                do we set CCCS from it; so we just slam in
 +                dontCareCostCentre.
++
 +Note [Coercion tokens]
 +~~~~~~~~~~~~~~~~~~~~~~
 +In coreToStgArgs, we drop type arguments completely, but we replace
 +coercions with a special coercionToken# placeholder. Why? Consider:
++
 +  f :: forall a. Int ~# Bool -> a
 +  f = /\a. \(co :: Int ~# Bool) -> error "impossible"
++
 +If we erased the coercion argument completely, we’d end up with just
 +f = error "impossible", but then f `seq` () would be ⊥!
++
 +This is an artificial example, but back in the day we *did* treat
 +coercion lambdas like type lambdas, and we had bug reports as a
 +result. So now we treat coercion lambdas like value lambdas, but we
 +treat coercions themselves as zero-width arguments — coercionToken#
 +has representation VoidRep — which gets the best of both worlds.
++
 +(For the gory details, see also the (unpublished) paper, “Practical
 +aspects of evidence-based compilation in System FC.”)
++
 +Note [Saturation of data constructors in STG]
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 +We guarantee that `StgConApp` is an exactly-saturated application of a data
 +constructor worker.
++
 +* If the data constructor is /under/-saturated we just fall through to build
 +  a `StgApp`.  Remember, data constructor workers have a regular top-level definition
 +  (injected by GHC.CoreToStg.Prep.mkDataConWorkers) so we can partially apply
 +  that function.
++
 +* If the data constructor is /over/-saturated, which can happen (see #23865) we again
 +  fall through to `StgApp`.  That will fail horribly at runtime (by applying data
 +  constructor to an argument) but it should be in dead code, and at least the compiler
 +  itself won't crash.  (We could inject an error-thunk instead.)
++
 +Note [Naked lambdas in coreToStgExpr]
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 +Consider
 +  f x = case x of
 +           True  -> \y. y+x
 +           False -> blah
 +If `f` is not eta expanded (which would have happened in Prep if it was
 +going to happen at all, the code for f must allocate a closure for the
 +(\y. y+x).  So the STG code we want has
++
 +     True -> let pap = \y. y+x
 +             in pap
++
 +The Lam case of `coreToStgExpr` deals with adding this `StgLet`. It's the
 +main reason we need a unique supply in the monad.
++
 +Historical note: in the past, Prep guaranteed there would be no such naked
 +lambdas, so we didn't need a unique supply at all. But that proved too hard
 +in the end (see Note [Eta expansion and the CorePrep invariants]) so we
 +just deal with it here; it's very easy.
 +-}
  -- --------------------------------------------------------------
  -- Setting variable info: top-level, binds, RHSs
  -- --------------------------------------------------------------
 -coreToStg :: CoreToStgOpts -> Module -> ModLocation -> CoreProgram
 -          -> ([StgTopBinding], InfoTableProvMap, CollectedCCs)
 -coreToStg opts@CoreToStgOpts
 -  { coreToStg_ways = ways
 -  , coreToStg_AutoSccsOnIndividualCafs = opt_AutoSccsOnIndividualCafs
 -  , coreToStg_InfoTableMap = opt_InfoTableMap
 -  , coreToStg_stgDebugOpts = stgDebugOpts
 -  } this_mod ml pgm
 -  = (pgm'', denv, final_ccs)
 +coreToStg :: CoreToStgOpts -> Module -> ModLocation
 +          -> CoreProgram
 +          -> IO ([StgTopBinding], InfoTableProvMap, CollectedCCs)
 +coreToStg opts this_mod ml pgm
 +  = do { us <- mkSplitUniqSupply StgTag
 +       ; let (_, (local_ccs, local_cc_stacks), pgm')
 +                = initCts opts us $
 +                  coreTopBindsToStg opts this_mod emptyCollectedCCs pgm
++
 +             -- See Note [Mapping Info Tables to Source Positions]
 +             (!pgm'', !denv)
 +               | opt_InfoTableMap
 +               = collectDebugInformation stgDebugOpts ml pgm'
 +               | otherwise = (pgm', emptyInfoTableProvMap)
++
 +             final_ccs
 +               | prof && opt_AutoSccsOnIndividualCafs
 +               = (local_ccs,local_cc_stacks)  -- don't need "all CAFs" CC
 +               | prof
 +               = (all_cafs_cc:local_ccs, all_cafs_ccs:local_cc_stacks)
 +               | otherwise
 +               = emptyCollectedCCs
++
 +      ; return (pgm'', denv, final_ccs) }
    where
 -    (_, (local_ccs, local_cc_stacks), pgm')
 -      = coreTopBindsToStg opts this_mod emptyVarEnv emptyCollectedCCs pgm
+-
 -    -- See Note [Mapping Info Tables to Source Positions]
 -    (!pgm'', !denv)
 -      | opt_InfoTableMap
 -      = collectDebugInformation stgDebugOpts ml pgm'
 -      | otherwise = (pgm', emptyInfoTableProvMap)
 +    CoreToStgOpts { coreToStg_ways = ways
 +                  , coreToStg_AutoSccsOnIndividualCafs = opt_AutoSccsOnIndividualCafs
 +                  , coreToStg_InfoTableMap = opt_InfoTableMap
 +                  , coreToStg_stgDebugOpts = stgDebugOpts }
 +       = opts
      prof = hasWay ways WayProf
+-
 -    final_ccs
 -      | prof && opt_AutoSccsOnIndividualCafs
 -      = (local_ccs,local_cc_stacks)  -- don't need "all CAFs" CC
 -      | prof
 -      = (all_cafs_cc:local_ccs, all_cafs_ccs:local_cc_stacks)
 -      | otherwise
 -      = emptyCollectedCCs
+-
      (all_cafs_cc, all_cafs_ccs) = getAllCAFsCC this_mod
  coreTopBindsToStg
      :: CoreToStgOpts
      -> Module
 -    -> IdEnv HowBound           -- environment for the bindings
      -> CollectedCCs
      -> CoreProgram
 -    -> (IdEnv HowBound, CollectedCCs, [StgTopBinding])
 +    -> CtsM (IdEnv HowBound, CollectedCCs, [StgTopBinding])
++
 +coreTopBindsToStg _ _ ccs []
 +  = do { env <- getCtsEnv
 +       ; return (env, ccs, []) }
 -coreTopBindsToStg _      _        env ccs []
 -  = (env, ccs, [])
 -coreTopBindsToStg opts this_mod env ccs (b:bs)
 +coreTopBindsToStg opts this_mod ccs (b:bs)
    | NonRec _ rhs <- b, isTyCoArg rhs
 -  = coreTopBindsToStg opts this_mod env1 ccs1 bs
 +  = coreTopBindsToStg opts this_mod ccs bs
    | otherwise
 -  = (env2, ccs2, b':bs')
 -  where
 -    (env1, ccs1, b' ) = coreTopBindToStg opts this_mod env ccs b
 -    (env2, ccs2, bs') = coreTopBindsToStg opts this_mod env1 ccs1 bs
 +  = do { (env1, ccs1, b' ) <- coreTopBindToStg opts this_mod ccs b
 +       ; (env2, ccs2, bs') <- setCtsEnv env1 $
 +                              coreTopBindsToStg opts this_mod ccs1 bs
 +      ; return (env2, ccs2, b':bs') }
  coreTopBindToStg
          :: CoreToStgOpts
          -> Module
 -        -> IdEnv HowBound
          -> CollectedCCs
          -> CoreBind
 -        -> (IdEnv HowBound, CollectedCCs, StgTopBinding)
 +        -> CtsM (IdEnv HowBound, CollectedCCs, StgTopBinding)
 -coreTopBindToStg _ _ env ccs (NonRec id e)
 +coreTopBindToStg _ _ ccs (NonRec id e)
    | Just str <- exprIsTickedString_maybe e
    -- top-level string literal
    -- See Note [Core top-level string literals] in GHC.Core
 -  = let
 -        env' = extendVarEnv env id how_bound
 -        how_bound = LetBound TopLet 0
 -    in (env', ccs, StgTopStringLit id str)
+-
 -coreTopBindToStg opts@CoreToStgOpts
 -  { coreToStg_platform = platform
 -  } this_mod env ccs (NonRec id rhs)
 -  = let
 -        env'      = extendVarEnv env id how_bound
 -        how_bound = LetBound TopLet $! manifestArity rhs
+-
 -        (ccs', (id', stg_rhs)) =
 -            initCts platform env $
 -              coreToTopStgRhs opts this_mod ccs (id,rhs)
+-
 -        bind = StgTopLifted $ StgNonRec id' stg_rhs
 -    in
 -      -- NB: previously the assertion printed 'rhs' and 'bind'
 -      --     as well as 'id', but that led to a black hole
 -      --     where printing the assertion error tripped the
 -      --     assertion again!
 -    (env', ccs', bind)
+-
 -coreTopBindToStg opts@CoreToStgOpts
 -  { coreToStg_platform = platform
 -  } this_mod env ccs (Rec pairs)
 +  = do { env <- getCtsEnv
 +       ; let env' = extendVarEnv env id how_bound
 +             how_bound = LetBound TopLet 0
 +       ; return (env', ccs, StgTopStringLit id str) }
++
 +coreTopBindToStg opts this_mod ccs (NonRec id rhs)
 +  = do { (ccs', (id', stg_rhs)) <- coreToTopStgRhs opts this_mod ccs (id,rhs)
++
 +       ; env <- getCtsEnv
 +       ; let env'      = extendVarEnv env id how_bound
 +             how_bound = LetBound TopLet $! manifestArity rhs
 +             bind      = StgTopLifted $ StgNonRec id' stg_rhs
 +       ; return (env', ccs', bind) }
++
 +coreTopBindToStg opts this_mod ccs (Rec pairs)
    = assert (not (null pairs)) $
 -    let
 -        extra_env' = [ (b, LetBound TopLet $! manifestArity rhs)
 -                     | (b, rhs) <- pairs ]
 -        env' = extendVarEnvList env extra_env'
+-
 -        -- generate StgTopBindings and CAF cost centres created for CAFs
 -        (ccs', stg_rhss)
 -          = initCts platform env' $ mapAccumLM (coreToTopStgRhs opts this_mod) ccs pairs
 -        bind = StgTopLifted $ StgRec stg_rhss
 -    in
 -    (env', ccs', bind)
 +    do { env <- getCtsEnv
 +       ; let extra_env' = [ (b, LetBound TopLet $! manifestArity rhs)
 +                          | (b, rhs) <- pairs ]
 +             env' = extendVarEnvList env extra_env'
++
 +       -- Generate StgTopBindings and CAF cost centres created for CAFs
 +       ; (ccs', stg_rhss) <- setCtsEnv env' $
 +                             mapAccumLM (coreToTopStgRhs opts this_mod) ccs pairs
 +       ; let bind = StgTopLifted $ StgRec stg_rhss
++
 +       ; return (env', ccs', bind) }
  coreToTopStgRhs
          :: CoreToStgOpts
@@ -420,16 +434,24 @@ coreToStgExpr expr@(App _ _)
        res_ty                  = exprType expr
        (app_head, args, ticks) = myCollectArgs expr res_ty
 -coreToStgExpr expr@(Lam _ _)
 -  = let
 -        (args, body) = myCollectBinders expr
 -    in
 -    case filterStgBinders args of
+-
 -      [] -> coreToStgExpr body
+-
 -      _ -> pprPanic "coretoStgExpr" $
 -        text "Unexpected value lambda:" $$ ppr expr
 +coreToStgExpr expr@(Lam {})
 +  | null val_bndrs
 +  = coreToStgExpr body
 +  | otherwise
 +  = -- See Note [Naked lambdas in coreToStgExpr]
 +    do { body' <- extendVarEnvCts [ (a, LambdaBound) | a <- val_bndrs ] $
 +                  coreToStgExpr body
 +       ; uniq <- getCtsUnique
 +       ; let body_ty = exprType body
 +             fun_ty  = mkLamTypes val_bndrs body_ty
 +                       -- This type is a bit ill-formed but it doesn't matter
 +             rhs = StgRhsClosure noExtFieldSilent currentCCS
 +                                 ReEntrant val_bndrs body' body_ty
 +             tmp_fun = mkSysLocal (fsLit "pap") uniq ManyTy fun_ty
 +       ; return (StgLet noExtFieldSilent (StgNonRec tmp_fun rhs) $
 +                 StgApp tmp_fun []) }
 +  where
 +    (val_bndrs, body) = myCollectBinders NotJoinPoint expr
  coreToStgExpr (Tick tick expr)
    = do
@@ -634,8 +656,13 @@ coreToStgArgs (arg : args) = do         -- Non-type argument
          stg_arg_rep = stgArgRep arg'
          bad_args = not (primRepsCompatible platform arg_rep stg_arg_rep)
 -    massertPpr (length ticks' <= 1) (text "More than one Tick in trivial arg:" <+> ppr arg)
 -    warnPprTraceM bad_args "Dangerous-looking argument. Probable cause: bad unsafeCoerce#" (ppr arg)
 +    -- Yikes!  This assert FAILS in tests T13658, T14779b
 +    -- It has been so for ages, but without the "() <-" it was lazily dropped
 +    -- Hence commenting it out: see #27132
 +    --    massertPpr (length ticks' <= 1) (text "More than one Tick in trivial arg:" <+> ppr arg)
++
 +    () <- warnPprTraceM bad_args
 +            "Dangerous-looking argument. Probable cause: bad unsafeCoerce#" (ppr arg)
      return (arg' : stg_args, ticks' ++ ticks)
@@ -710,12 +737,11 @@ coreToStgRhs (bndr, rhs) = do
  -- coreToStgExpr that can handle value lambdas.
  coreToMkStgRhs :: HasDebugCallStack => Id -> CoreExpr -> CtsM MkStgRhs
  coreToMkStgRhs bndr expr = do
 -  let (args, body) = myCollectBinders expr
 -  let args'        = filterStgBinders args
 -  extendVarEnvCts [ (a, LambdaBound) | a <- args' ] $ do
 +  let (bndrs, body) = myCollectBinders (idJoinPointHood bndr) expr
 +  extendVarEnvCts [ (a, LambdaBound) | a <- bndrs ] $ do
      body' <- coreToStgExpr body
      let mk_rhs = MkStgRhs
 -          { rhs_args = args'
 +          { rhs_args = bndrs
            , rhs_expr = body'
            , rhs_type = exprType body
            , rhs_is_join = isJoinId bndr
@@ -733,7 +759,7 @@ coreToMkStgRhs bndr expr = do
  newtype CtsM a = CtsM
      { unCtsM :: Platform -- Needed for checking for bad coercions in coreToStgArgs
               -> IdEnv HowBound
 -             -> a
 +             -> UniqSM a
+     }
      deriving (Functor)
@@ -769,20 +795,22 @@ data LetInfo
  -- The std monad functions:
 -initCts :: Platform -> IdEnv HowBound -> CtsM a -> a
 -initCts platform env m = unCtsM m platform env
+-
 +initCts :: CoreToStgOpts -> UniqSupply -> CtsM a -> a
 +initCts opts us cts_m
 +  = initUs_ us $
 +    unCtsM cts_m (coreToStg_platform opts) emptyVarEnv
  {-# INLINE thenCts #-}
  {-# INLINE returnCts #-}
  returnCts :: a -> CtsM a
 -returnCts e = CtsM $ \_ _ -> e
 +returnCts e = CtsM $ \_ _ -> return e
  thenCts :: CtsM a -> (a -> CtsM b) -> CtsM b
 -thenCts m k = CtsM $ \platform env
 -  -> unCtsM (k (unCtsM m platform env)) platform env
 +thenCts m k = CtsM $ \platform env ->
 +              do { v <- unCtsM m platform env
 +                 ; unCtsM (k v) platform env }
  instance Applicative CtsM where
      pure = returnCts
@@ -792,17 +820,26 @@ instance Monad CtsM where
      (>>=)  = thenCts
  getPlatform :: CtsM Platform
 -getPlatform = CtsM const
 +getPlatform = CtsM $ \platform _ -> return platform
  -- Functions specific to this monad:
 +setCtsEnv :: IdEnv HowBound -> CtsM a -> CtsM a
 +setCtsEnv env thing = CtsM $ \platform _ -> unCtsM thing platform env
++
 +getCtsEnv :: CtsM (IdEnv HowBound)
 +getCtsEnv = CtsM $ \_ env -> return env
++
 +getCtsUnique :: CtsM Unique
 +getCtsUnique = CtsM $ \_ _ -> getUniqueM
++
  extendVarEnvCts :: [(Id, HowBound)] -> CtsM a -> CtsM a
  extendVarEnvCts ids_w_howbound expr
     =    CtsM $   \platform env
     -> unCtsM expr platform (extendVarEnvList env ids_w_howbound)
  lookupVarCts :: Id -> CtsM HowBound
 -lookupVarCts v = CtsM $ \_ env -> lookupBinding env v
 +lookupVarCts v = CtsM $ \_ env -> return (lookupBinding env v)
  lookupBinding :: IdEnv HowBound -> Id -> HowBound
  lookupBinding env v = case lookupVarEnv env v of
@@ -814,13 +851,26 @@ lookupBinding env v = case lookupVarEnv env v of
  filterStgBinders :: [Var] -> [Var]
  filterStgBinders bndrs = filter isId bndrs
 -myCollectBinders :: Expr Var -> ([Var], Expr Var)
 -myCollectBinders expr
 +myCollectBinders :: JoinPointHood -> Expr Var -> ([Var], Expr Var)
 +-- Collect the binders from a lambda:
 +--   * Dropping type lambdas
 +--   * Stopping at join-point arity
 +myCollectBinders NotJoinPoint expr
    = go [] expr
    where
 -    go bs (Lam b e)          = go (b:bs) e
 -    go bs (Cast e _)         = go bs e
 -    go bs e                  = (reverse bs, e)
 +    go bs (Lam b e) | isRuntimeVar b = go (b:bs) e
 +                    | otherwise      = go bs     e
 +    go bs (Cast e _)                 = go bs e
 +    go bs e                          = (reverse bs, e)
++
 +myCollectBinders (JoinPoint n) expr
 +  = go n [] expr
 +  where
 +    go n bs e | n==0                   = (reverse bs, e)
 +    go n bs (Lam b e) | isRuntimeVar b = go (n-1) (b:bs) e
 +                      | otherwise      = go (n-1) bs     e
 +    go n bs (Cast e _)                 = go n bs e
 +    go _ bs e                          = (reverse bs, e)
  -- | If the argument expression is (potential chain of) 'App', return the head
  -- of the app chain, and collect ticks/args along the chain.

compiler/GHC/CoreToStg/Prep.hs

@@ -144,16 +144,13 @@ Here is the syntax of the Core produced by CorePrep:
      Expressions
         body ::= app
 -             |  let(rec) x = rhs in body     -- Boxed only
 +             |  let(rec) x = body in body     -- Boxed only
               |  case body of pat -> body
 -             |  /\a. body | /\c. body
 +             |  /\a. body | /\c. body | \x. body
               |  body |> co
 -    Right hand sides (only place where value lambdas can occur)
 -       rhs ::= /\a.rhs  |  \x.rhs  |  body
+-
 -We define a synonym for each of these non-terminals.  Functions
 -with the corresponding name produce a result in that syntax.
 +We define a synonym for each of these non-terminals, CpeArg, CpeApp, and
 +CpeBody.  Functions with the corresponding name produce a result in that syntax.
  Note [Cloning in CorePrep]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -218,7 +215,6 @@ So our plan is:
  type CpeArg  = CoreExpr    -- Non-terminal 'arg'
  type CpeApp  = CoreExpr    -- Non-terminal 'app'
  type CpeBody = CoreExpr    -- Non-terminal 'body'
 -type CpeRhs  = CoreExpr    -- Non-terminal 'rhs'
  {-
  ************************************************************************
@@ -261,7 +257,7 @@ corePrepExpr logger config expr = do
      withTiming logger (text "CorePrep [expr]") (\e -> e `seq` ()) $ do
        us <- mkSplitUniqSupply StgTag
        let initialCorePrepEnv = mkInitialCorePrepEnv config
 -      let new_expr = initUs_ us (cpeBodyNF initialCorePrepEnv expr)
 +      let new_expr = initUs_ us (cpeBody initialCorePrepEnv expr)
        putDumpFileMaybe logger Opt_D_dump_prep "CorePrep" FormatCore (ppr new_expr)
        return new_expr
@@ -665,16 +661,16 @@ cpeBind top_lvl env (Rec pairs)
  ---------------
  cpePair :: TopLevelFlag -> RecFlag -> Demand -> Levity
          -> CorePrepEnv -> OutId -> CoreExpr
 -        -> UniqSM (Floats, CpeRhs)
 +        -> UniqSM (Floats, CpeBody)
  -- Used for all bindings
  -- The binder is already cloned, hence an OutId
  cpePair top_lvl is_rec dmd lev env0 bndr rhs
    = assert (isNothing $ joinPointBinding_maybe bndr rhs) $ -- those should use cpeJoinPair
 -    do { (floats1, rhs1) <- cpeRhsE env rhs
 +    do { (floats1, rhs1) <- cpeBodyF env rhs
         -- See if we are allowed to float this stuff out of the RHS
         ; let dec = want_float_from_rhs floats1 rhs1
 -       ; (floats2, rhs2) <- executeFloatDecision env dec floats1 rhs1
 +             (floats2, rhs2) = executeFloatDecision dec floats1 rhs1
         -- Make the arity match up
         ; (floats3, rhs3)
@@ -717,7 +713,7 @@ it seems good for CorePrep to be robust.
  ---------------
  cpeJoinPair :: CorePrepEnv -> JoinId -> CoreExpr
 -            -> UniqSM (JoinId, CpeRhs)
 +            -> UniqSM (JoinId, CpeBody)
  -- Used for all join bindings
  -- No eta-expansion: see Note [Do not eta-expand join points] in GHC.Core.Opt.Simplify.Utils
  cpeJoinPair env bndr rhs
@@ -729,7 +725,7 @@ cpeJoinPair env bndr rhs
         ; (env', bndrs') <- cpCloneBndrs env bndrs
 -       ; body' <- cpeBodyNF env' body -- Will let-bind the body if it starts
 +       ; body' <- cpeBody env' body -- Will let-bind the body if it starts
                                        -- with a lambda
         ; let rhs'  = mkCoreLams bndrs' body'
@@ -757,10 +753,20 @@ for us to mess with the arity because a join point is never exported.
  -}
  -- ---------------------------------------------------------------------------
 ---              CpeRhs: produces a result satisfying CpeRhs
 +--              cpeBodyF: produces a result satisfying CpeBody
  -- ---------------------------------------------------------------------------
 -cpeRhsE :: CorePrepEnv -> CoreExpr -> UniqSM (Floats, CpeRhs)
 +cpeBodyF :: CorePrepEnv -> CoreExpr -> UniqSM (Floats, CpeBody)
 +-- | Convert a 'CoreExpr' so it satisfies 'CpeBody'; also produce
 +-- a list of 'Floats' which are being propagated upwards.  In
 +-- fact, this function is used in only two cases: to
 +-- implement 'cpeBody' (which is what you usually want),
 +-- and in the case when a let-binding is in a case scrutinee--here,
 +-- we can always float out:
 +--
 +--      case (let x = y in z) of ...
 +--      ==> let x = y in case z of ...
 +--
  -- If
  --      e  ===>  (bs, e')
  -- then
@@ -769,32 +775,32 @@ cpeRhsE :: CorePrepEnv -> CoreExpr -> UniqSM (Floats, CpeRhs)
  -- For example
  --      f (g x)   ===>   ([v = g x], f v)
 -cpeRhsE env (Type ty)
 +cpeBodyF env (Type ty)
    = return (emptyFloats, Type (cpSubstTy env ty))
 -cpeRhsE env (Coercion co)
 +cpeBodyF env (Coercion co)
    = return (emptyFloats, Coercion (cpSubstCo env co))
 -cpeRhsE env expr@(Lit lit)
 +cpeBodyF env expr@(Lit lit)
    | LitNumber LitNumBigNat i <- lit
      = cpeBigNatLit env i
    | otherwise = return (emptyFloats, expr)
 -cpeRhsE env expr@(Var {})  = cpeApp env expr
 -cpeRhsE env expr@(App {})  = cpeApp env expr
 +cpeBodyF env expr@(Var {})  = cpeApp env expr
 +cpeBodyF env expr@(App {})  = cpeApp env expr
 -cpeRhsE env (Let bind body)
 +cpeBodyF env (Let bind body)
    = do { (env', bind_floats, maybe_bind') <- cpeBind NotTopLevel env bind
 -       ; (body_floats, body') <- cpeRhsE env' body
 +       ; (body_floats, body') <- cpeBodyF env' body
         ; let expr' = case maybe_bind' of Just bind' -> Let bind' body'
                                           Nothing    -> body'
         ; return (bind_floats `appFloats` body_floats, expr') }
 -cpeRhsE env (Tick tickish expr)
 +cpeBodyF env (Tick tickish expr)
    -- Pull out ticks if they are allowed to be floated.
    | tickishFloatable tickish
 -  = do { (floats, body) <- cpeRhsE env expr
 +  = do { (floats, body) <- cpeBodyF env expr
           -- See [Floating Ticks in CorePrep]
         ; return (FloatTick tickish `consFloat` floats, body) }
    | otherwise
 -  = do { body <- cpeBodyNF env expr
 +  = do { body <- cpeBody env expr
         ; return (emptyFloats, mkTick tickish' body) }
    where
      tickish' | Breakpoint ext bid fvs <- tickish
@@ -803,17 +809,17 @@ cpeRhsE env (Tick tickish expr)
               | otherwise
               = tickish
 -cpeRhsE env (Cast expr co)
 -   = do { (floats, expr') <- cpeRhsE env expr
 +cpeBodyF env (Cast expr co)
 +   = do { (floats, expr') <- cpeBodyF env expr
          ; return (floats, Cast expr' (cpSubstCo env co)) }
 -cpeRhsE env expr@(Lam {})
 +cpeBodyF env expr@(Lam {})
     = do { let (bndrs,body) = collectBinders expr
          ; (env', bndrs') <- cpCloneBndrs env bndrs
 -        ; body' <- cpeBodyNF env' body
 +        ; body' <- cpeBody env' body
          ; return (emptyFloats, mkLams bndrs' body') }
 -cpeRhsE env (Case scrut bndr _ alts@[Alt con [covar] _])
 +cpeBodyF env (Case scrut bndr _ alts@[Alt con [covar] _])
    -- See (U3) in Note [Implementing unsafeCoerce]
    -- We need make the Case float, otherwise we get
    --   let x = case ... of UnsafeRefl co ->
@@ -828,7 +834,7 @@ cpeRhsE env (Case scrut bndr _ alts@[Alt con [covar] _])
    -- Note that `x` is a value here. This is visible in the GHCi debugger tests
    -- (such as `print003`).
    | Just rhs <- isUnsafeEqualityCase scrut bndr alts
 -  = do { (floats_scrut, scrut) <- cpeBody env scrut
 +  = do { (floats_scrut, scrut) <- cpeBodyF env scrut
         ; (env, bndr')  <- cpCloneBndr env bndr
         ; (env, covar') <- cpCloneCoVarBndr env covar
@@ -836,19 +842,19 @@ cpeRhsE env (Case scrut bndr _ alts@[Alt con [covar] _])
                            -- See Note [Cloning CoVars and TyVars]
           -- Up until here this should do exactly the same as the regular code
 -         -- path of `cpeRhsE Case{}`.
 -       ; (floats_rhs, rhs) <- cpeBody env rhs
 +         -- path of `cpeBodyF Case{}`.
 +       ; (floats_rhs, rhs) <- cpeBodyF env rhs
           -- ... but we want to float `floats_rhs` as in (U3) so that rhs' might
           -- become a value
         ; let case_float = UnsafeEqualityCase scrut bndr' con [covar']
           -- NB: It is OK to "evaluate" the proof eagerly.
           --     Usually there's the danger that we float the unsafeCoerce out of
           --     a branching Case alt. Not so here, because the regular code path
 -         --     for `cpeRhsE Case{}` will not float out of alts.
 +         --     for `cpeBodyF Case{}` will not float out of alts.
               floats = snocFloat floats_scrut case_float `appFloats` floats_rhs
         ; return (floats, rhs) }
 -cpeRhsE env (Case scrut bndr _ [Alt (DataAlt dc) [token_out, res] rhs])
 +cpeBodyF env (Case scrut bndr _ [Alt (DataAlt dc) [token_out, res] rhs])
    -- See item (SEQ4) of Note [seq# magic]. We want to match
    --   case seq# @a @RealWorld <ok-to-discard> s of (# s', _ #) -> rhs[s']
    -- and simplify to rhs[s]. Triggers in T15226.
@@ -869,10 +875,10 @@ cpeRhsE env (Case scrut bndr _ [Alt (DataAlt dc) [token_out, res] rhs])
        -- often zaps the OccInfo on case-alternative binders (see Note [DataAlt occ info]
        -- in GHC.Core.Opt.Simplify.Iteration) because the scrutinee is not a
        -- variable, and in that case the zapping doesn't happen; see that Note.
 -  = cpeRhsE (extendCorePrepEnv env token_out token_in') rhs
 +  = cpeBodyF (extendCorePrepEnv env token_out token_in') rhs
 -cpeRhsE env (Case scrut bndr ty alts)
 -  = do { (floats, scrut') <- cpeBody env scrut
 +cpeBodyF env (Case scrut bndr ty alts)
 +  = do { (floats, scrut') <- cpeBodyF env scrut
         ; (env', bndr2) <- cpCloneBndr env bndr
         ; let bndr3 = bndr2 `setIdUnfolding` evaldUnfolding
         ; let alts'
@@ -885,7 +891,7 @@ cpeRhsE env (Case scrut bndr ty alts)
                 , not (altsAreExhaustive alts)
                 = addDefault alts (Just err)
                 | otherwise = alts
 -               where err = mkImpossibleExpr ty "cpeRhsE: missing case alternative"
 +               where err = mkImpossibleExpr ty "cpeBodyF: missing case alternative"
         ; alts'' <- mapM (sat_alt env') alts'
         ; case alts'' of
@@ -896,7 +902,7 @@ cpeRhsE env (Case scrut bndr ty alts)
    where
      sat_alt env (Alt con bs rhs)
         = do { (env2, bs') <- cpCloneBndrs env bs
 -            ; rhs' <- cpeBodyNF env2 rhs
 +            ; rhs' <- cpeBody env2 rhs
              ; return (Alt con bs' rhs') }
  -- ---------------------------------------------------------------------------
@@ -908,74 +914,10 @@ cpeRhsE env (Case scrut bndr ty alts)
  -- let-bound using 'wrapBinds').  Generally you want this, esp.
  -- when you've reached a binding form (e.g., a lambda) and
  -- floating any further would be incorrect.
 -cpeBodyNF :: CorePrepEnv -> CoreExpr -> UniqSM CpeBody
 -cpeBodyNF env expr
 -  = do { (floats, body) <- cpeBody env expr
 -       ; return (wrapBinds floats body) }
+-
 --- | Convert a 'CoreExpr' so it satisfies 'CpeBody'; also produce
 --- a list of 'Floats' which are being propagated upwards.  In
 --- fact, this function is used in only two cases: to
 --- implement 'cpeBodyNF' (which is what you usually want),
 --- and in the case when a let-binding is in a case scrutinee--here,
 --- we can always float out:
 ---
 ---      case (let x = y in z) of ...
 ---      ==> let x = y in case z of ...
 ---
 -cpeBody :: CorePrepEnv -> CoreExpr -> UniqSM (Floats, CpeBody)
 +cpeBody :: CorePrepEnv -> CoreExpr -> UniqSM CpeBody
  cpeBody env expr
 -  = do { (floats1, rhs) <- cpeRhsE env expr
 -       ; (floats2, body) <- rhsToBody env rhs
 -       ; return (floats1 `appFloats` floats2, body) }
+-
 ---------
 -rhsToBody :: CorePrepEnv -> CpeRhs -> UniqSM (Floats, CpeBody)
 --- Remove top level lambdas by let-binding
+-
 -rhsToBody env (Tick t expr)
 -  | tickishHasNoScope t -- only float out of non-scoped annotations
 -  = do { (floats, expr') <- rhsToBody env expr
 -       ; return (floats, mkTick t expr') }
+-
 -rhsToBody env (Cast e co)
 -        -- You can get things like
 -        --      case e of { p -> coerce t (\s -> ...) }
 -  = do { (floats, e') <- rhsToBody env e
 -       ; return (floats, Cast e' co) }
+-
 -rhsToBody env expr@(Lam {})   -- See Note [No eta reduction needed in rhsToBody]
 -  | all isTyVar bndrs           -- Type lambdas are ok
 -  = return (emptyFloats, expr)
 -  | otherwise                   -- Some value lambdas
 -  = do { let rhs = cpeEtaExpand (exprArity expr) expr
 -       ; fn <- newVar env (exprType rhs)
 -       ; let float = Float (NonRec fn rhs) LetBound TopLvlFloatable
 -       ; return (unitFloat float, Var fn) }
 -  where
 -    (bndrs,_) = collectBinders expr
+-
 -rhsToBody _env expr = return (emptyFloats, expr)
+-
+-
 -{- Note [No eta reduction needed in rhsToBody]
 -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 -Historical note.  In the olden days we used to have a Prep-specific
 -eta-reduction step in rhsToBody:
 -  rhsToBody expr@(Lam {})
 -    | Just no_lam_result <- tryEtaReducePrep bndrs body
 -    = return (emptyFloats, no_lam_result)
+-
 -The goal was to reduce
 -        case x of { p -> \xs. map f xs }
 -    ==> case x of { p -> map f }
+-
 -to avoid allocating a lambda.  Of course, we'd allocate a PAP
 -instead, which is hardly better, but that's the way it was.
+-
 -Now we simply don't bother with this. It doesn't seem to be a win,
 -and it's extra work.
 --}
 +  = do { (floats, body) <- cpeBodyF env expr
 +       ; return (wrapBinds floats body) }
  -- ---------------------------------------------------------------------------
  --              CpeApp: produces a result satisfying CpeApp
@@ -1060,8 +1002,8 @@ body of the eta-expansion lambda, resulting in
  which is unproblematic.
  -}
 -cpeApp :: CorePrepEnv -> CoreExpr -> UniqSM (Floats, CpeRhs)
 --- May return a CpeRhs (instead of CpeApp) because of saturating primops
 +cpeApp :: CorePrepEnv -> CoreExpr -> UniqSM (Floats, CpeBody)
 +-- May return a CpeBody (instead of CpeApp) because of saturating primops
  cpeApp top_env expr
    = do { let (terminal, args) = collect_args expr
        --  ; pprTraceM "cpeApp" $ (ppr expr)
@@ -1103,7 +1045,7 @@ cpeApp top_env expr
      cpe_app :: CorePrepEnv
              -> CoreExpr -- The thing we are calling
              -> [ArgInfo]
 -            -> UniqSM (Floats, CpeRhs)
 +            -> UniqSM (Floats, CpeBody)
      cpe_app env (Var f) (AIApp Type{} : AIApp arg : args)
          | f `hasKey` lazyIdKey          -- Replace (lazy a) with a, and
              -- See Note [lazyId magic] in GHC.Types.Id.Make
@@ -1156,7 +1098,7 @@ cpeApp top_env expr
          --          case thing of res { __DEFAULT -> (# token, res#) } },
          -- allocating CaseBound Floats for token and thing as needed
          = do { (floats1, token) <- cpeArg env topDmd token
 -             ; (floats2, thing) <- cpeBody env thing
 +             ; (floats2, thing) <- cpeBodyF env thing
               ; case_bndr <- (`setIdUnfolding` evaldUnfolding) <$> newVar env ty
               ; let tup = mkCoreUnboxedTuple [token, Var case_bndr]
               ; let float = mkCaseFloat case_bndr thing
@@ -1173,9 +1115,10 @@ cpeApp top_env expr
                       then Just $! idArity v_hd
                       else Nothing
                     Nothing -> Nothing
 -          --  ; pprTraceM "cpe_app:stricts:" (ppr v <+> ppr args $$ ppr stricts $$ ppr (idCbvMarks_maybe v))
             ; (app, floats, unsat_ticks) <- rebuild_app env args e2 emptyFloats stricts min_arity
 -           ; mb_saturate hd app floats unsat_ticks depth }
 +           ; case hd of
 +               Nothing    -> do { massert (null unsat_ticks); return (floats, app) }
 +               Just fn_id -> return (floats, maybeSaturate fn_id app depth unsat_ticks) }
          where
            depth = val_args args
            stricts = case idDmdSig v of
@@ -1190,8 +1133,8 @@ cpeApp top_env expr
                  -- partial application might be seq'd
          -- We inlined into something that's not a var and has no args.
 -        -- Bounce it back up to cpeRhsE.
 -    cpe_app env fun [] = cpeRhsE env fun
 +        -- Bounce it back up to cpeBodyF.
 +    cpe_app env fun [] = cpeBodyF env fun
      -- Here we get:
      -- N-variable fun, better let-bind it
@@ -1202,7 +1145,8 @@ cpeApp top_env expr
                            -- If evalDmd says that it's sure to be evaluated,
                            -- we'll end up case-binding it
             ; (app, floats,unsat_ticks) <- rebuild_app env args fun' fun_floats [] Nothing
 -           ; mb_saturate Nothing app floats unsat_ticks (val_args args) }
 +           ; massert (null unsat_ticks)
 +           ; return (floats, app) }
      -- Count the number of value arguments *and* coercions (since we don't eliminate the later in STG)
      val_args :: [ArgInfo] -> Int
@@ -1223,13 +1167,6 @@ cpeApp top_env expr
                    | isTypeArg e = n
                    | otherwise   = n+1
 -    -- Saturate if necessary
 -    mb_saturate head app floats unsat_ticks depth =
 -       case head of
 -         Just fn_id -> do { sat_app <- maybeSaturate fn_id app depth unsat_ticks
 -                          ; return (floats, sat_app) }
 -         _other     -> do { massert (null unsat_ticks)
 -                          ; return (floats, app) }
      -- Deconstruct and rebuild the application, floating any non-atomic
      -- arguments to the outside.  We collect the type of the expression,
@@ -1561,11 +1498,11 @@ Wrinkles:
  cpeArg :: CorePrepEnv -> Demand
         -> CoreArg -> UniqSM (Floats, CpeArg)
  cpeArg env dmd arg
 -  = do { (floats1, arg1) <- cpeRhsE env arg     -- arg1 can be a lambda
 +  = do { (floats1, arg1) <- cpeBodyF env arg     -- arg1 can be a lambda
         ; let arg_ty = exprType arg1
               lev    = typeLevity arg_ty
               dec    = wantFloatLocal NonRecursive dmd lev floats1 arg1
 -       ; (floats2, arg2) <- executeFloatDecision env dec floats1 arg1
 +             (floats2, arg2) = executeFloatDecision dec floats1 arg1
                  -- Else case: arg1 might have lambdas, and we can't
                  --            put them inside a wrapBinds
@@ -1580,7 +1517,12 @@ cpeArg env dmd arg
                         arg3  = cpeEtaExpand arity arg2
                         -- See Note [Eta expansion of arguments in CorePrep]
                   ; let (arg_float, v') = mkNonRecFloat env lev v arg3
 -                 ---; pprTraceM "cpeArg" (ppr arg1 $$ ppr dec $$ ppr arg2)
 +--                 ; pprTraceM "cpeArg" (vcat [ text "arg1" <+> ppr arg1
 +--                                            , text "decision" <+>  ppr dec
 +--                                            , text "arg2" <+> ppr arg2
 +--                                            , text "arity" <+> ppr arity
 +--                                            , text "arg3" <+> ppr arg3
 +--                                            ])
                   ; return (snocFloat floats2 arg_float, varToCoreExpr v') }
+        }
@@ -1617,59 +1559,56 @@ eta_would_wreck_join (Tick _ e)        = eta_would_wreck_join e
  eta_would_wreck_join (Case _ _ _ alts) = any eta_would_wreck_join (rhssOfAlts alts)
  eta_would_wreck_join _                 = False
 -maybeSaturate :: Id -> CpeApp -> Int -> [CoreTickish] -> UniqSM CpeRhs
 +maybeSaturate :: Id -> CpeApp
 +              -> Int  -- Number of value arguments in the application
 +              -> [CoreTickish]
 +              -> CpeBody
  maybeSaturate fn expr n_args unsat_ticks
 -  | hasNoBinding fn        -- There's no binding
 -    -- See Note [Eta expansion of hasNoBinding things in CorePrep]
 -  = return $ wrapLamBody (\body -> foldr mkTick body unsat_ticks) sat_expr
+-
 -  | mark_arity > 0 -- A call-by-value function.
 -                   -- See Note [CBV Function Ids: overview]
 -  , not applied_marks
 -  = assertPpr
 -      ( not (isJoinId fn)) -- See Note [Do not eta-expand join points]
 -      ( ppr fn $$ text "expr:" <+> ppr expr $$ text "n_args:" <+> ppr n_args $$
 -          text "marks:" <+> ppr (idCbvMarks_maybe fn) $$
 -          text "join_arity" <+> ppr (idJoinPointHood fn) $$
 -          text "fn_arity" <+> ppr fn_arity
 -       ) $
 -    -- pprTrace "maybeSat"
 -    --   ( ppr fn $$ text "expr:" <+> ppr expr $$ text "n_args:" <+> ppr n_args $$
 -    --       text "marks:" <+> ppr (idCbvMarks_maybe fn) $$
 -    --       text "join_arity" <+> ppr (isJoinId_maybe fn) $$
 -    --       text "fn_arity" <+> ppr fn_arity $$
 -    --       text "excess_arity" <+> ppr excess_arity $$
 -    --       text "mark_arity" <+> ppr mark_arity
 -    --    ) $
 -    return sat_expr
 +  | isJoinId fn  -- Never eta-expand a call to a join point
 +                 -- See Note [Do not eta-expand join points]
 +  = assertPpr (not must_eta_expand) (ppr expr) $
 +    -- assertPpr: check that all arguments that need to be passed cbv
 +    -- are visible, so the backend can evalaute them if required
 +    expr
++
 +  | must_eta_expand || desirable_to_eta_expand
 +    -- n_args > 0: do not eta-expand a naked variable!
 +  = wrapLamBody (mkTicks unsat_ticks) $
 +    cpeEtaExpand excess_arity expr
    | otherwise
 -  = assert (null unsat_ticks) $
 -    return expr
 +  = expr
++
    where
 -    mark_arity    = idCbvMarkArity fn
 -    fn_arity      = idArity fn
 -    excess_arity  = (max fn_arity mark_arity) - n_args
 -    sat_expr      = cpeEtaExpand excess_arity expr
 -    applied_marks = n_args >= (length . dropWhile (not . isMarkedCbv) .
 -                               reverse . expectJust $ (idCbvMarks_maybe fn))
 -    -- For join points we never eta-expand (See Note [Do not eta-expand join points])
 -    -- so we assert all arguments that need to be passed cbv are visible so that the
 -    -- backend can evalaute them if required..
 +    must_eta_expand
 +      =  (hasNoBinding fn && fn_arity > n_args)
 +            -- hasNoBinding functions must be saturated
 +      || (mark_arity > n_args)
 +            -- CBV functions must be CBV-saturated
++
 +    desirable_to_eta_expand = fn_arity > n_args && n_args > 0
 +       -- n_args > 0: do not eta-expand a naked variable unless we have to
++
 +    mark_arity   = idCbvMarkArity fn
 +    fn_arity     = idArity fn
 +    excess_arity = (max fn_arity mark_arity) - n_args
  {- Note [Eta expansion]
  ~~~~~~~~~~~~~~~~~~~~~~~
 -Eta expand to match the arity claimed by the binder Remember,
 -CorePrep must not change arity
 +Eta expand to match the arity claimed by the binder.
 +Remember, CorePrep must not change arity
  Eta expansion might not have happened already, because it is done by
  the simplifier only when there at least one lambda already.
 -NB1:we could refrain when the RHS is trivial (which can happen
 -    for exported things).  This would reduce the amount of code
 -    generated (a little) and make things a little worse for
 -    code compiled without -O.  The case in point is data constructor
 -    wrappers.
 +We do eta-expansion (via `cpeEtaExpand`) in three places:
++
 +* At let-bindings; in `cpePair`
++
 +* On function arguments: in `cpeArg`
 +  See Note [Eta expansion of arguments in CorePrep]
++
 +* At un-saturated function calls: in `maybeSaturate`
  NB2: we have to be careful that the result of etaExpand doesn't
     invalidate any of the assumptions that CorePrep is attempting
@@ -1677,12 +1616,37 @@ NB2: we have to be careful that the result of etaExpand doesn't
     an SCC note - we're now careful in etaExpand to make sure the
     SCC is pushed inside any new lambdas that are generated.
 -Note [Eta expansion of hasNoBinding things in CorePrep]
 -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 -maybeSaturate deals with eta expanding to saturate things that can't deal
 -with unsaturated applications (identified by 'hasNoBinding', currently
 -foreign calls, unboxed tuple/sum constructors, and representation-polymorphic
 -primitives such as 'coerce' and 'unsafeCoerce#').
 +Note [Eta expansion for let-bindings]
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 +Given f = rhs, we eta-expand `rhs` to match f's arity.
++
 +We could refrain when the RHS is trivial (which can happen for exported things).
 +This would reduce the amount of code generated (a little) and make things a
 +little worse for code compiled without -O.  The case in point is data
 +constructor wrappers.
++
 +Note [Eta expansion of unsaturated calls]
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 +Give a call (f a1..an), where `f` is a known function with arity greater than `n`,
 +there are three reasons we might want to eta-expand:
++
 +* Must eta-expand: if `f` is a `hasNoBinding` function, we must saturate
 +  it, because the function has no (curried) binding to call. Currently
 +  this includes:
 +     - foreign calls,
 +     - unboxed tuple/sum constructors
 +     - representation-polymorphic primitives such as 'coerce' and 'unsafeCoerce#'
 +     - primops (for now anyway; see comments in `hasNoBinding`)
++
 +* Must eta-expand: if `f` has a call-by-value calling convention, we /must/
 +  call it with evaluated arguments.  The back end deals with adding the
 +  necessary evaluation at the call site, but we must first ensure that it is
 +  saturated.
++
 +* May eta-expand: consider
 +     \x -> f x True
 +  where `f` has arity 3.   Then it's much better to eta-expand f so we have
 +     \xy -> f x True y
  Historical Note: Note that eta expansion in CorePrep used to be very fragile
  due to the "prediction" of CAFfyness that we used to make during tidying.  We
@@ -1694,7 +1658,7 @@ Note [Eta expansion and the CorePrep invariants]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  It turns out to be much much easier to do eta expansion
  *after* the main CorePrep stuff.  But that places constraints
 -on the eta expander: given a CpeRhs, it must return a CpeRhs.
 +on the eta expander: given a CpeBody, it must return a CpeBody.
  For example here is what we do not want:
                  f = /\a -> g (h 3)      -- h has arity 2
@@ -1706,6 +1670,26 @@ and now we do NOT want eta expansion to give
  Instead GHC.Core.Opt.Arity.etaExpand gives
                  f = /\a -> \y -> let s = h 3 in g s y
 +Another example:
 +  f x = case x of
 +           A -> \y. e
 +           B -> hnb 3  -- where `hnb` has no binding
 +           C -> z
 +Then we may eta-expand `hnb` to get
 +  f x = case x of
 +           A -> \y. e
 +           B -> \y. hnb 3 y
 +           C -> z
 +Now we come to the binding of `f` itself, and eta-expand that, to give
 +  f x y = case x of
 +            A -> e
 +            B -> hnb 3 y
 +            C -> z y
 +Notice how important it is that the eta-expansion for `f` doesn't
 +generate any crap like
 +            B -> (\y. hnb 3 y) y
 +Fortunately, the eta-expander is careful not to do so.
++
  Note [Eta expansion of arguments in CorePrep]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Suppose `g = \x y. blah` and consider the expression `f (g x)`; we ANFise to
@@ -1798,7 +1782,7 @@ There is a nasty Wrinkle:
        #24471 is a good example, where Prep took 25% of compile time!
  -}
 -cpeEtaExpand :: Arity -> CpeRhs -> CpeRhs
 +cpeEtaExpand :: Arity -> CpeBody -> CpeBody
  cpeEtaExpand arity expr
    | arity == 0 = expr
    | otherwise  = etaExpand arity expr
@@ -2165,9 +2149,6 @@ isEmptyFloats (Floats _ b) = isNilOL b
  getFloats :: Floats -> OrdList FloatingBind
  getFloats = fs_binds
 -unitFloat :: FloatingBind -> Floats
 -unitFloat = snocFloat emptyFloats
+-
  floatInfo :: FloatingBind -> FloatInfo
  floatInfo (Float _ _ info)     = info
  floatInfo UnsafeEqualityCase{} = LazyContextFloatable -- See Note [Floating in CorePrep]
@@ -2255,7 +2236,7 @@ decideFloatInfo FIA{fia_levity=lev, fia_demand=dmd, fia_is_hnf=is_hnf,
    | Lifted   <- lev       = (LetBound, TopLvlFloatable)
        -- And these float freely but can't be speculated, hence LetBound
 -mkCaseFloat :: Id -> CpeRhs -> FloatingBind
 +mkCaseFloat :: Id -> CpeBody -> FloatingBind
  mkCaseFloat bndr scrut
    = -- pprTrace "mkCaseFloat" (ppr bndr <+> ppr (bound,info)
      --                             -- <+> ppr is_lifted <+> ppr is_strict
@@ -2273,7 +2254,7 @@ mkCaseFloat bndr scrut
            -- (ok-for-spec case bindings are unlikely anyway.)
+       }
 -mkNonRecFloat :: CorePrepEnv -> Levity -> Id -> CpeRhs -> (FloatingBind, Id)
 +mkNonRecFloat :: CorePrepEnv -> Levity -> Id -> CpeBody -> (FloatingBind, Id)
  mkNonRecFloat env lev bndr rhs
    = -- pprTrace "mkNonRecFloat" (ppr bndr <+> ppr (bound,info)
      --                             <+> if is_strict then text "strict" else if is_lifted then text "lazy" else text "unlifted"
@@ -2413,24 +2394,18 @@ instance Outputable FloatDecision where
    ppr FloatNone = text "none"
    ppr FloatAll  = text "all"
 -executeFloatDecision :: CorePrepEnv -> FloatDecision -> Floats -> CpeRhs -> UniqSM (Floats, CpeRhs)
 -executeFloatDecision env dec floats rhs
 +executeFloatDecision :: FloatDecision -> Floats -> CpeBody -> (Floats, CpeBody)
 +executeFloatDecision dec floats rhs
    = case dec of
 -      FloatAll                 -> return (floats, rhs)
 -      FloatNone
 -        | isEmptyFloats floats -> return (emptyFloats, rhs)
 -        | otherwise            -> do { (floats', body) <- rhsToBody env rhs
 -                                     ; return (emptyFloats, wrapBinds floats $
 -                                                            wrapBinds floats' body) }
 -            -- FloatNone case: `rhs` might have lambdas, and we can't
 -            -- put them inside a wrapBinds, which expects a `CpeBody`.
 +      FloatAll  -> (floats,      rhs)
 +      FloatNone -> (emptyFloats, wrapBinds floats rhs)
  wantFloatTop :: Floats -> FloatDecision
  wantFloatTop fs
    | fs_info fs `floatsAtLeastAsFarAs` TopLvlFloatable = FloatAll
    | otherwise                                         = FloatNone
 -wantFloatLocal :: RecFlag -> Demand -> Levity -> Floats -> CpeRhs -> FloatDecision
 +wantFloatLocal :: RecFlag -> Demand -> Levity -> Floats -> CpeBody -> FloatDecision
  -- See Note [wantFloatLocal]
  wantFloatLocal is_rec rhs_dmd rhs_lev floats rhs
    |  isEmptyFloats floats -- Well yeah...
@@ -2479,7 +2454,7 @@ zero free variables.)
  In general, the inliner is good at eliminating these let-bindings.  However,
  there is one case where these trivial updatable thunks can arise: when
  we are optimizing away 'lazy' (see Note [lazyId magic], and also
 -'cpeRhsE'.)  Then, we could have started with:
 +'cpeBodyF'.)  Then, we could have started with:
       let x :: ()
           x = lazy @() y
@@ -2783,8 +2758,7 @@ wrapTicks floats expr
  -- ---------------------------------------------------------------------------
  -- | Converts Bignum literals into their final CoreExpr
 -cpeBigNatLit
 -   :: CorePrepEnv -> Integer -> UniqSM (Floats, CpeRhs)
 +cpeBigNatLit :: CorePrepEnv -> Integer -> UniqSM (Floats, CpeBody)
  cpeBigNatLit env i = assert (i >= 0) $ do
    let
      platform = cp_platform (cpe_config env)

compiler/GHC/Driver/Main.hs

@@ -2434,8 +2434,8 @@ myCoreToStg :: Logger -> DynFlags -> [Var]
                    , CollectedCCs -- CAF cost centre info (declared and used)
                    , StgCgInfos )
  myCoreToStg logger dflags ic_inscope for_bytecode this_mod ml prepd_binds = do
 -    let (stg_binds, denv, cost_centre_info)
 -         = {-# SCC "Core2Stg" #-}
 +    (stg_binds, denv, cost_centre_info)
 +       <- {-# SCC "Core2Stg" #-}
             coreToStg (initCoreToStgOpts dflags) this_mod ml prepd_binds
      (stg_binds_with_fvs,stg_cg_info)

compiler/GHC/Stg/Lint.hs

@@ -105,7 +105,7 @@ import GHC.Core             ( AltCon(..) )
  import GHC.Core.Type
  import GHC.Core.Lint        ( lintMessage )
 -import GHC.Types.Basic      ( TopLevelFlag(..), isTopLevel, isMarkedCbv )
 +import GHC.Types.Basic      ( TopLevelFlag(..), isTopLevel )
  import GHC.Types.CostCentre ( isCurrentCCS )
  import GHC.Types.Id
  import GHC.Types.Var.Set
@@ -123,12 +123,9 @@ import GHC.Unit.Module            ( Module )
  import GHC.Data.Bag         ( Bag, emptyBag, isEmptyBag, snocBag, bagToList )
  import Control.Monad
 -import Data.Maybe
 -import GHC.Utils.Misc
  import GHC.Core.Multiplicity (scaledThing)
  import GHC.Settings (Platform)
  import GHC.Core.TyCon (primRepCompatible, primRepsCompatible)
 -import GHC.Utils.Panic.Plain (panic)
  lintStgTopBindings :: forall a . (OutputablePass a, BinderP a ~ Id)
                     => Platform
@@ -174,36 +171,37 @@ lintStgTopBindings platform logger diag_opts opts extra_vars this_mod unarised w
      lint_bind (StgTopStringLit v _) = return [v]
  lintStgConArg :: StgArg -> LintM ()
 -lintStgConArg arg = do
 -  unarised <- lf_unarised <$> getLintFlags
 -  when unarised $ case stgArgRep_maybe arg of
 -    -- Note [Post-unarisation invariants], invariant 4
 -    Just [_] -> pure ()
 -    badRep   -> addErrL $
 -      text "Non-unary constructor arg: " <> ppr arg $$
 -      text "Its PrimReps are: " <> ppr badRep
+-
 -  case arg of
 -    StgLitArg _ -> pure ()
 -    StgVarArg v -> lintStgVar v
 +lintStgConArg arg
 +  = do { lintStgArg arg
++
 +       ; unarised <- lf_unarised <$> getLintFlags
 +       ; when unarised $ case stgArgRep_maybe arg of
 +           -- Note [Post-unarisation invariants], invariant 4
 +           Just [_] -> pure ()
 +           badRep   -> addErrL $
 +             text "Non-unary constructor arg: " <> ppr arg $$
 +             text "Its PrimReps are: " <> ppr badRep }
  lintStgFunArg :: StgArg -> LintM ()
 -lintStgFunArg arg = do
 -  unarised <- lf_unarised <$> getLintFlags
 -  when unarised $ case stgArgRep_maybe arg of
 -    -- Note [Post-unarisation invariants], invariant 3
 -    Just []  -> pure ()
 -    Just [_] -> pure ()
 -    badRep   -> addErrL $
 -      text "Function arg is not unary or void: " <> ppr arg $$
 -      text "Its PrimReps are: " <> ppr badRep
+-
 -  case arg of
 -    StgLitArg _ -> pure ()
 -    StgVarArg v -> lintStgVar v
+-
 -lintStgVar :: Id -> LintM ()
 -lintStgVar id = checkInScope id
 +lintStgFunArg arg
 +  = do { lintStgArg arg
++
 +       ; unarised <- lf_unarised <$> getLintFlags
 +       ; when unarised $ case stgArgRep_maybe arg of
 +           -- Note [Post-unarisation invariants], invariant 3
 +           Just []  -> pure ()
 +           Just [_] -> pure ()
 +           badRep   -> addErrL $
 +             text "Function arg is not unary or void: " <> ppr arg $$
 +             text "Its PrimReps are: " <> ppr badRep }
++
 +lintStgArg :: StgArg -> LintM ()
 +lintStgArg (StgLitArg _) = pure ()
 +lintStgArg (StgVarArg v) = do { lintStgVarOcc v
 +                              ; lintAppCbvMarks v [] }
++
 +lintStgVarOcc :: Id -> LintM ()
 +lintStgVarOcc id = checkInScope id
  lintStgBinds
      :: (OutputablePass a, BinderP a ~ Id)
@@ -275,13 +273,11 @@ lintStgExpr :: (OutputablePass a, BinderP a ~ Id) => GenStgExpr a -> LintM ()
  lintStgExpr (StgLit _) = return ()
 -lintStgExpr e@(StgApp fun args) = do
 -  lintStgVar fun
 -  mapM_ lintStgFunArg args
 -  lintAppCbvMarks e
 -  lintStgAppReps fun args
+-
+-
 +lintStgExpr (StgApp fun args)
 +  = do { lintStgVarOcc fun
 +       ; mapM_ lintStgFunArg args
 +       ; lintAppCbvMarks fun args
 +       ; lintStgAppReps fun args }
  lintStgExpr app@(StgConApp con _n args _arg_tys) = do
      -- unboxed sums should vanish during unarise
@@ -413,22 +409,20 @@ lintStgAppReps fun args = do
    match_args actual_arg_reps fun_arg_tys_reps
 -lintAppCbvMarks :: OutputablePass pass
 -                => GenStgExpr pass -> LintM ()
 -lintAppCbvMarks e@(StgApp fun args) = do
 -  lf <- getLintFlags
 -  when (lf_unarised lf) $ do
 +lintAppCbvMarks :: Id -> [StgArg] -> LintM ()
 +lintAppCbvMarks fun args
 +  | idCbvMarkArity fun > length args
      -- A function which expects a unlifted argument as n'th argument
      -- always needs to be applied to n arguments.
      -- See Note [CBV Function Ids: overview].
 -    let marks = fromMaybe [] $ idCbvMarks_maybe fun
 -    when (length (dropWhileEndLE (not . isMarkedCbv) marks) > length args) $ do
 -      addErrL $ hang (text "Undersatured cbv marked ID in App" <+> ppr e ) 2 $
 -        (text "marks" <> ppr marks $$
 -        text "args" <> ppr args $$
 -        text "arity" <> ppr (idArity fun) $$
 -        text "join_arity" <> ppr (idJoinPointHood fun))
 -lintAppCbvMarks _ = panic "impossible - lintAppCbvMarks"
 +  = addErrL $ hang (text "Undersatured cbv marked ID in App" <+> ppr fun)
 +                 2 (vcat [ text "marks" <> ppr (idCbvMarks_maybe fun)
 +                         , text "args" <> ppr args
 +                         , text "arity" <> ppr (idArity fun)
 +                         , text "join_arity" <> ppr (idJoinPointHood fun) ])
++
 +  | otherwise
 +  = return ()
  {-
  ************************************************************************

compiler/GHC/Types/Id.hs

@@ -852,7 +852,7 @@ idCbvMarks_maybe id = case idDetails id of
    _                    -> Nothing
  -- Id must be called with at least this arity in order to allow arguments to
 --- be passed unlifted.
 +-- be passed unlifted.  Return 0 if there are no CBV marks.
  idCbvMarkArity :: Id -> Arity
  idCbvMarkArity fn = maybe 0 length (idCbvMarks_maybe fn)

compiler/GHC/Types/Id/Info.hs

@@ -210,6 +210,7 @@ data IdDetails
          -- Can also work as a WorkerLikeId if given `CbvMark`s.
          -- See Note [CBV Function Ids: overview]
          -- The [CbvMark] is always empty (and ignored) until after Tidy.
++
    | WorkerLikeId [CbvMark]
          -- ^ An 'Id' for a worker like function, which might expect some arguments to be
          -- passed both evaluated and tagged.
@@ -217,8 +218,10 @@ data IdDetails
          -- aren't used unapplied.
          -- See Note [CBV Function Ids: overview]
          -- See Note [EPT enforcement]
 -        -- The [CbvMark] is always empty (and ignored) until after Tidy for ids from the current
 -        -- module.
 +        -- Invariants:
 +        --   - the [CbvMark] is always empty (and ignored) until after Tidy
 +        --     for ids from the current module
 +        --   - If non-empty, at least is isMarkedCbbv; see (CBV2)
  data RecSelInfo
    = RSI { rsi_def   :: [ConLike]   -- Record selector defined for these
@@ -297,9 +300,7 @@ Here's how it all works:
    to identify strict arguments.  See Note [Call-by-value for worker args] for
    how a worker guarantees to be strict in strict datacon fields.
 -  TODO: We currently don't do this for arguments that are unboxed sums or tuples,
 -  because then we'd have to predict the result of unarisation. But it would be nice to
 -  do so. See `computeCbvInfo`.
 +  See (CBV1) and (CBV2).
  * During CorePrep calls to CBV Ids are eta expanded.
    See `GHC.CoreToStg.Prep.maybeSaturate`.
@@ -319,6 +320,16 @@ Here's how it all works:
  * Imported functions may be CBV, and then there is no point in eta-reducing
    them; we'll just have to eta-expand later; see GHC.Core.Opt.Arity.cantEtaReduceFun.
 +Wrinkles
++
 +(CBV1) We do not set the CBV-marks for a function that takes an unboxed sum or tuple,
 +  as an argument, because then we'd have to predict the result of unarisation.
 +  It would be nice to do so in future. See `computeCbvInfo`.
++
 +(CBV2) We do not set CBV-marks if none of them are `isMarkedCbv`.  Why not?
 +  Because if none are CBV then there is nothing special to do for this function;
 +  in particular, we don't need to saturate its calls.  See `computeCbvInfo`.
++
  *** SPJ really? Andreas? ****
  We only use this for workers and specialized versions of SpecConstr
  But we also check other functions during tidy and potentially turn some of them into

testsuite/tests/arityanal/should_compile/Arity01.stderr

@@ -5,19 +5,19 @@ Result size of Tidy Core = {terms: 71, types: 43, coercions: 0, joins: 0/0}
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F1.f2 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -F1.f2 = GHC.Num.Integer.IS 1#
 +F1.f2 = GHC.Internal.Bignum.Integer.IS 1#
  Rec {
  -- RHS size: {terms: 24, types: 6, coercions: 0, joins: 0/0}
  F1.f1_h1 [Occ=LoopBreaker] :: Integer -> Integer -> Integer -> Integer
  [GblId, Arity=3, Str=<1L><1L><SL>, Unf=OtherCon []]
  F1.f1_h1
 -  = \ (n :: Integer) (x :: Integer) (eta [OS=OneShot] :: Integer) ->
 +  = \ (n :: Integer) (x [OS=OneShot] :: Integer) (eta [OS=OneShot] :: Integer) ->
        case x of x1 { __DEFAULT ->
        case n of y1 { __DEFAULT ->
 -      case GHC.Num.Integer.integerLt# x1 y1 of {
 +      case GHC.Internal.Bignum.Integer.integerLt# x1 y1 of {
          __DEFAULT -> eta;
 -        1# -> F1.f1_h1 y1 (GHC.Num.Integer.integerAdd x1 F1.f2) (GHC.Num.Integer.integerAdd x1 eta)
 +        1# -> F1.f1_h1 y1 (GHC.Internal.Bignum.Integer.integerAdd x1 F1.f2) (GHC.Internal.Bignum.Integer.integerAdd x1 eta)
+       }
+       }
+       }
@@ -26,7 +26,7 @@ end Rec }
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F1.f3 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -F1.f3 = GHC.Num.Integer.IS 5#
 +F1.f3 = GHC.Internal.Bignum.Integer.IS 5#
  -- RHS size: {terms: 4, types: 0, coercions: 0, joins: 0/0}
  f1 :: Integer
@@ -36,27 +36,27 @@ f1 = F1.f1_h1 F1.f3 F1.f2 F1.f3
  -- RHS size: {terms: 14, types: 5, coercions: 0, joins: 0/0}
  g :: Integer -> Integer -> Integer -> Integer -> Integer -> Integer
  [GblId, Arity=5, Str=<1L><SL><SL><SL><SL>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [0 0 0 0 0] 120 0}]
 -g = \ (x1 :: Integer) (x2 :: Integer) (x3 :: Integer) (x4 :: Integer) (x5 :: Integer) -> GHC.Num.Integer.integerAdd (GHC.Num.Integer.integerAdd (GHC.Num.Integer.integerAdd (GHC.Num.Integer.integerAdd x1 x2) x3) x4) x5
 +g = \ (x1 :: Integer) (x2 :: Integer) (x3 :: Integer) (x4 :: Integer) (x5 :: Integer) -> GHC.Internal.Bignum.Integer.integerAdd (GHC.Internal.Bignum.Integer.integerAdd (GHC.Internal.Bignum.Integer.integerAdd (GHC.Internal.Bignum.Integer.integerAdd x1 x2) x3) x4) x5
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F1.s1 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -F1.s1 = GHC.Num.Integer.IS 3#
 +F1.s1 = GHC.Internal.Bignum.Integer.IS 3#
  -- RHS size: {terms: 8, types: 7, coercions: 0, joins: 0/0}
  s :: forall {t1} {t2}. Num t1 => (t1 -> t2) -> t2
 -[GblId, Arity=2, Str=<MP(A,A,A,A,A,A,1C(1,L))><1C(1,L)>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [30 60] 50 0}]
 +[GblId, Arity=2, Str=<MP(A,A,A,A,A,A,1C(1,L))><1C(1,L)>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [90 60] 50 0}]
  s = \ (@t) (@t1) ($dNum :: Num t) (f :: t -> t1) -> f (fromInteger @t $dNum F1.s1)
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F1.h1 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -F1.h1 = GHC.Num.Integer.IS 24#
 +F1.h1 = GHC.Internal.Bignum.Integer.IS 24#
  -- RHS size: {terms: 4, types: 1, coercions: 0, joins: 0/0}
  h :: Integer -> Integer
  [GblId, Arity=1, Str=<SL>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 30 0}]
 -h = \ (x5 :: Integer) -> GHC.Num.Integer.integerAdd F1.h1 x5
 +h = \ (x5 :: Integer) -> GHC.Internal.Bignum.Integer.integerAdd F1.h1 x5

testsuite/tests/arityanal/should_compile/Arity05.stderr

@@ -5,27 +5,27 @@ Result size of Tidy Core = {terms: 42, types: 44, coercions: 0, joins: 0/0}
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F5.f5g1 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -F5.f5g1 = GHC.Num.Integer.IS 1#
 +F5.f5g1 = GHC.Internal.Bignum.Integer.IS 1#
  -- RHS size: {terms: 12, types: 9, coercions: 0, joins: 0/0}
  f5g :: forall {a} {t}. Num a => (t -> a) -> t -> a
 -[GblId, Arity=3, Str=<SP(1C(1,C(1,L)),A,A,A,A,A,MC(1,L))><MC(1,L)><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [60 60 0] 90 0}]
 +[GblId, Arity=3, Str=<SP(1C(1,C(1,L)),A,A,A,A,A,MC(1,L))><MC(1,L)><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [180 60 0] 90 0}]
  f5g = \ (@a) (@t) ($dNum :: Num a) (h :: t -> a) (z :: t) -> + @a $dNum (h z) (fromInteger @a $dNum F5.f5g1)
  -- RHS size: {terms: 17, types: 12, coercions: 0, joins: 0/0}
  f5h :: forall {a} {t}. Num a => (t -> a) -> t -> (t -> a) -> a
 -[GblId, Arity=4, Str=<SP(SC(S,C(1,L)),A,A,A,A,A,MC(1,L))><MC(1,L)><L><MC(1,L)>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [90 60 0 60] 150 0}]
 +[GblId, Arity=4, Str=<SP(SC(S,C(1,L)),A,A,A,A,A,MC(1,L))><MC(1,L)><L><MC(1,L)>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [270 60 0 60] 150 0}]
  f5h = \ (@a) (@t) ($dNum :: Num a) (f :: t -> a) (x :: t) (g :: t -> a) -> + @a $dNum (f x) (+ @a $dNum (g x) (fromInteger @a $dNum F5.f5g1))
  -- RHS size: {terms: 4, types: 1, coercions: 0, joins: 0/0}
  f5y :: Integer -> Integer
  [GblId, Arity=1, Str=<1L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 30 0}]
 -f5y = \ (y :: Integer) -> GHC.Num.Integer.integerAdd y F5.f5g1
 +f5y = \ (y :: Integer) -> GHC.Internal.Bignum.Integer.integerAdd y F5.f5g1
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  f5 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -f5 = GHC.Num.Integer.IS 3#
 +f5 = GHC.Internal.Bignum.Integer.IS 3#

testsuite/tests/arityanal/should_compile/Arity08.stderr

@@ -4,7 +4,7 @@ Result size of Tidy Core = {terms: 24, types: 18, coercions: 0, joins: 0/0}
  -- RHS size: {terms: 20, types: 10, coercions: 0, joins: 0/0}
  f8f :: forall {p}. Num p => Bool -> p -> p -> p
 -[GblId, Arity=4, Str=<LP(SC(S,C(1,L)),A,MC(1,C(1,L)),A,A,A,A)><1L><L><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [90 30 0 0] 140 0}]
 +[GblId, Arity=4, Str=<LP(SC(S,C(1,L)),A,MC(1,C(1,L)),A,A,A,A)><1L><L><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [270 30 0 0] 140 0}]
  f8f
    = \ (@p) ($dNum :: Num p) (b :: Bool) (x :: p) (y :: p) ->
        case b of {
@@ -15,7 +15,7 @@ f8f
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  f8 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -f8 = GHC.Num.Integer.IS 2#
 +f8 = GHC.Internal.Bignum.Integer.IS 2#

testsuite/tests/arityanal/should_compile/Arity11.stderr

@@ -5,57 +5,23 @@ Result size of Tidy Core = {terms: 136, types: 75, coercions: 0, joins: 2/7}
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F11.fib3 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -F11.fib3 = GHC.Num.Integer.IS 1#
 +F11.fib3 = GHC.Internal.Bignum.Integer.IS 1#
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F11.fib2 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -F11.fib2 = GHC.Num.Integer.IS 2#
+-
 -Rec {
 --- RHS size: {terms: 38, types: 13, coercions: 0, joins: 2/2}
 -F11.f11_fib [Occ=LoopBreaker] :: Integer -> Integer
 -[GblId, Arity=1, Str=<SL>, Unf=OtherCon []]
 -F11.f11_fib
 -  = \ (ds :: Integer) ->
 -      join {
 -        $j [Dmd=ML] :: Integer
 -        [LclId[JoinId(0)(Nothing)]]
 -        $j
 -          = join {
 -              $j1 [Dmd=ML] :: Integer
 -              [LclId[JoinId(0)(Nothing)]]
 -              $j1 = GHC.Num.Integer.integerAdd (F11.f11_fib (GHC.Num.Integer.integerSub ds F11.fib3)) (F11.f11_fib (GHC.Num.Integer.integerSub ds F11.fib2)) } in
 -            case ds of {
 -              GHC.Num.Integer.IS x1 ->
 -                case x1 of {
 -                  __DEFAULT -> jump $j1;
 -                  1# -> F11.fib3
 -                };
 -              GHC.Num.Integer.IP x1 -> jump $j1;
 -              GHC.Num.Integer.IN x1 -> jump $j1
 -            } } in
 -      case ds of {
 -        GHC.Num.Integer.IS x1 ->
 -          case x1 of {
 -            __DEFAULT -> jump $j;
 -            0# -> F11.fib3
 -          };
 -        GHC.Num.Integer.IP x1 -> jump $j;
 -        GHC.Num.Integer.IN x1 -> jump $j
 -      }
 -end Rec }
 +F11.fib2 = GHC.Internal.Bignum.Integer.IS 2#
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F11.fib1 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -F11.fib1 = GHC.Num.Integer.IS 0#
 +F11.fib1 = GHC.Internal.Bignum.Integer.IS 0#
  -- RHS size: {terms: 54, types: 27, coercions: 0, joins: 0/5}
 -fib :: forall {t} {a}. (Eq t, Num t, Num a) => t -> a
 -[GblId, Arity=4, Str=<SP(SC(S,C(1,L)),A)><LP(A,LC(L,C(1,L)),A,A,A,A,L)><LP(LC(S,C(1,L)),A,A,A,A,A,MC(1,L))><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [60 150 60 0] 480 0}]
 +fib :: forall {t1} {t2}. (Eq t1, Num t1, Num t2) => t1 -> t2
 +[GblId, Arity=4, Str=<SP(SC(S,C(1,L)),A)><LP(A,LC(L,C(1,L)),A,A,A,A,L)><LP(LC(S,C(1,L)),A,A,A,A,A,MC(1,L))><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [180 450 180 0] 480 0}]
  fib
 -  = \ (@t) (@a) ($dEq :: Eq t) ($dNum :: Num t) ($dNum1 :: Num a) (eta :: t) ->
 +  = \ (@t) (@t1) ($dEq :: Eq t) ($dNum :: Num t) ($dNum1 :: Num t1) (eta :: t) ->
        let {
          lvl :: t
          [LclId]
@@ -65,32 +31,66 @@ fib
          [LclId]
          lvl1 = fromInteger @t $dNum F11.fib2 } in
        let {
 -        lvl2 :: a
 +        lvl2 :: t1
          [LclId]
 -        lvl2 = fromInteger @a $dNum1 F11.fib3 } in
 +        lvl2 = fromInteger @t1 $dNum1 F11.fib3 } in
        let {
          lvl3 :: t
          [LclId]
          lvl3 = fromInteger @t $dNum F11.fib1 } in
        letrec {
 -        fib4 [Occ=LoopBreaker, Dmd=SC(S,L)] :: t -> a
 +        fib4 [Occ=LoopBreaker, Dmd=SC(S,L)] :: t -> t1
          [LclId, Arity=1, Str=<L>, Unf=OtherCon []]
          fib4
            = \ (ds :: t) ->
                case == @t $dEq ds lvl3 of {
                  False ->
                    case == @t $dEq ds lvl of {
 -                    False -> + @a $dNum1 (fib4 (- @t $dNum ds lvl)) (fib4 (- @t $dNum ds lvl1));
 +                    False -> + @t1 $dNum1 (fib4 (- @t $dNum ds lvl)) (fib4 (- @t $dNum ds lvl1));
                      True -> lvl2
                    };
                  True -> lvl2
                }; } in
        fib4 eta
 +Rec {
 +-- RHS size: {terms: 38, types: 13, coercions: 0, joins: 2/2}
 +F11.f11_fib [Occ=LoopBreaker] :: Integer -> Integer
 +[GblId, Arity=1, Str=<SL>, Unf=OtherCon []]
 +F11.f11_fib
 +  = \ (ds :: Integer) ->
 +      join {
 +        $j [Dmd=ML] :: Integer
 +        [LclId[JoinId(0)(Nothing)]]
 +        $j
 +          = join {
 +              $j1 [Dmd=ML] :: Integer
 +              [LclId[JoinId(0)(Nothing)]]
 +              $j1 = GHC.Internal.Bignum.Integer.integerAdd (F11.f11_fib (GHC.Internal.Bignum.Integer.integerSub ds F11.fib3)) (F11.f11_fib (GHC.Internal.Bignum.Integer.integerSub ds F11.fib2)) } in
 +            case ds of {
 +              GHC.Internal.Bignum.Integer.IS x ->
 +                case x of {
 +                  __DEFAULT -> jump $j1;
 +                  1# -> F11.fib3
 +                };
 +              GHC.Internal.Bignum.Integer.IP x -> jump $j1;
 +              GHC.Internal.Bignum.Integer.IN x -> jump $j1
 +            } } in
 +      case ds of {
 +        GHC.Internal.Bignum.Integer.IS x ->
 +          case x of {
 +            __DEFAULT -> jump $j;
 +            0# -> F11.fib3
 +          };
 +        GHC.Internal.Bignum.Integer.IP x -> jump $j;
 +        GHC.Internal.Bignum.Integer.IN x -> jump $j
 +      }
 +end Rec }
++
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F11.f3 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -F11.f3 = GHC.Num.Integer.IS 1000#
 +F11.f3 = GHC.Internal.Bignum.Integer.IS 1000#
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F11.f11_x :: Integer
@@ -100,7 +100,7 @@ F11.f11_x = F11.f11_fib F11.f3
  -- RHS size: {terms: 4, types: 1, coercions: 0, joins: 0/0}
  F11.f11f1 :: Integer -> Integer
  [GblId, Arity=1, Str=<SL>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 30 0}]
 -F11.f11f1 = \ (y :: Integer) -> GHC.Num.Integer.integerAdd F11.f11_x y
 +F11.f11f1 = \ (y :: Integer) -> GHC.Internal.Bignum.Integer.integerAdd F11.f11_x y
  -- RHS size: {terms: 3, types: 2, coercions: 0, joins: 0/0}
  f11f :: forall {p}. p -> Integer -> Integer
@@ -110,22 +110,22 @@ f11f = \ (@p) _ [Occ=Dead] -> F11.f11f1
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F11.f5 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -F11.f5 = GHC.Num.Integer.IS 6#
 +F11.f5 = GHC.Internal.Bignum.Integer.IS 6#
  -- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
  F11.f4 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False, WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 30 0}]
 -F11.f4 = GHC.Num.Integer.integerAdd F11.f11_x F11.f5
 +F11.f4 = GHC.Internal.Bignum.Integer.integerAdd F11.f11_x F11.f5
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F11.f2 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -F11.f2 = GHC.Num.Integer.IS 8#
 +F11.f2 = GHC.Internal.Bignum.Integer.IS 8#
  -- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
  F11.f1 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False, WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 30 0}]
 -F11.f1 = GHC.Num.Integer.integerAdd F11.f11_x F11.f2
 +F11.f1 = GHC.Internal.Bignum.Integer.integerAdd F11.f11_x F11.f2
  -- RHS size: {terms: 3, types: 2, coercions: 0, joins: 0/0}
  f11 :: (Integer, Integer)
@@ -133,7 +133,4 @@ f11 :: (Integer, Integer)
  f11 = (F11.f4, F11.f1)
 ------- Local rules for imported ids --------
 -"SPEC fib @Integer @Integer" forall ($dEq :: Eq Integer) ($dNum :: Num Integer) ($dNum1 :: Num Integer). fib @Integer @Integer $dEq $dNum $dNum1 = F11.f11_fib
+-

testsuite/tests/arityanal/should_compile/Arity14.stderr

@@ -3,18 +3,18 @@
  Result size of Tidy Core = {terms: 44, types: 38, coercions: 0, joins: 0/3}
  -- RHS size: {terms: 3, types: 2, coercions: 0, joins: 0/0}
 -F14.f1 :: forall {t}. t -> t
 +F14.f1 :: forall t. t -> t
  [GblId, Arity=1, Str=<1L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)}]
  F14.f1 = \ (@t) (y :: t) -> y
  -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
  F14.f2 :: Integer
  [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 -F14.f2 = GHC.Num.Integer.IS 1#
 +F14.f2 = GHC.Internal.Bignum.Integer.IS 1#
  -- RHS size: {terms: 36, types: 23, coercions: 0, joins: 0/3}
  f14 :: forall {t}. (Ord t, Num t) => t -> t -> t -> t
 -[GblId, Arity=4, Str=<SP(A,A,SC(S,C(1,L)),A,A,A,A,A)><LP(LC(L,C(1,L)),A,A,A,A,A,MC(1,L))><L><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [30 90 0 0] 310 0}]
 +[GblId, Arity=4, Str=<SP(A,A,SC(S,C(1,L)),A,A,A,A,A)><LP(LC(L,C(1,L)),A,A,A,A,A,MC(1,L))><L><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [90 270 0 0] 310 0}]
  f14
    = \ (@t) ($dOrd :: Ord t) ($dNum :: Num t) (eta :: t) (eta1 :: t) ->
        let {
@@ -25,7 +25,7 @@ f14
          f3 [Occ=LoopBreaker, Dmd=SC(S,C(1,L))] :: t -> t -> t -> t
          [LclId, Arity=2, Str=<L><L>, Unf=OtherCon []]
          f3
 -          = \ (n :: t) (x :: t) ->
 +          = \ (n :: t) (x [OS=OneShot] :: t) ->
                case < @t $dOrd x n of {
                  False -> F14.f1 @t;
                  True ->

testsuite/tests/perf/compiler/T13960.hs

 +{-# LANGUAGE OverloadedStrings #-}
++
 +-- GHC used to run out of simplifier ticks due to inlining the internals of
 +-- `toStrict . toLazyByteString`.
 +module T13960 (breaks) where
++
 +import Data.ByteString (ByteString)
 +import Data.ByteString.Builder (Builder, stringUtf8, toLazyByteString)
 +import Data.ByteString.Lazy (toStrict)
 +import Data.String (IsString(..))
++
 +newtype Query = Query ByteString
++
 +toByteString :: Builder -> ByteString
 +toByteString x = toStrict (toLazyByteString x)
++
 +instance IsString Query where
 +  fromString = Query . toByteString . stringUtf8
++
 +breaks :: [(Query, Query)]
 +breaks =
 +  [ ("query001a", "query001b")
 +  , ("query002a", "query002b")
 +  , ("query003a", "query003b")
 +  , ("query004a", "query004b")
 +  , ("query005a", "query005b")
 +  , ("query006a", "query006b")
 +  , ("query007a", "query007b")
 +  , ("query008a", "query008b")
 +  , ("query009a", "query009b")
 +  , ("query010a", "query010b")
 +  , ("query011a", "query011b")
 +  , ("query012a", "query012b")
 +  , ("query013a", "query013b")
 +  , ("query014a", "query014b")
 +  , ("query015a", "query015b")
 +  , ("query016a", "query016b")
 +  , ("query017a", "query017b")
 +  , ("query018a", "query018b")
 +  , ("query019a", "query019b")
 +  , ("query020a", "query020b")
 +  , ("query021a", "query021b")
 +  , ("query022a", "query022b")
 +  , ("query023a", "query023b")
 +  , ("query024a", "query024b")
 +  , ("query025a", "query025b")
 +  , ("query026a", "query026b")
 +  , ("query027a", "query027b")
 +  , ("query028a", "query028b")
 +  , ("query029a", "query029b")
 +  , ("query030a", "query030b")
 +  , ("query031a", "query031b")
 +  , ("query032a", "query032b")
 +  , ("query033a", "query033b")
 +  , ("query034a", "query034b")
 +  , ("query035a", "query035b")
 +  , ("query036a", "query036b")
 +  , ("query037a", "query037b")
 +  , ("query038a", "query038b")
 +  , ("query039a", "query039b")
 +  , ("query040a", "query040b")
 +  , ("query041a", "query041b")
 +  , ("query042a", "query042b")
 +  , ("query043a", "query043b")
 +  , ("query044a", "query044b")
 +  , ("query045a", "query045b")
 +  , ("query046a", "query046b")
 +  , ("query047a", "query047b")
 +  , ("query048a", "query048b")
 +  , ("query049a", "query049b")
 +  , ("query050a", "query050b")
 +  ]

testsuite/tests/perf/compiler/all.T

@@ -686,6 +686,12 @@ test ('T13820',
        ],
        compile,
        ['-v0'])
 +test ('T13960',
 +      [ collect_compiler_stats('peak_megabytes_allocated', 20),
 +        collect_compiler_stats('bytes allocated', 2),
 +      ],
 +      compile,
 +      ['-O'])
  test ('T14766',
        [ collect_compiler_stats('bytes allocated',2),
          pre_cmd('python3 genT14766.py > T14766.hs'),

testsuite/tests/simplCore/should_compile/T15205.stderr

@@ -10,7 +10,7 @@ f :: forall a b. C a b => a -> b
   Str=<1P(A,1C(1,C(1,L)))><L>,
   Unf=Unf{Src=<vanilla>, TopLvl=True,
           Value=True, ConLike=True, WorkFree=True, Expandable=True,
 -         Guidance=IF_ARGS [30 0] 40 0}]
 +         Guidance=IF_ARGS [90 0] 40 0}]
  f = \ (@a) (@b) ($dC :: C a b) (x :: a) -> op @a @b $dC x x

testsuite/tests/wasm/should_run/control-flow/LoadCmmGroup.hs

@@ -91,12 +91,17 @@ stgify :: ModSummary -> ModGuts -> Ghc [StgTopBinding]
  stgify summary guts = do
      hsc_env <- getSession
      let dflags = hsc_dflags hsc_env
 -    prepd_binds <- liftIO $ do
 +    liftIO $ do
        cp_cfg <- initCorePrepConfig hsc_env
 -      corePrepPgm (hsc_logger hsc_env) cp_cfg (initCorePrepPgmConfig dflags (interactiveInScope $ hsc_IC hsc_env)) this_mod core_binds
 -    return $ fstOf3 $ coreToStg (initCoreToStgOpts dflags) (ms_mod summary) (ms_location summary) prepd_binds
 -  where this_mod = mg_module guts
 -        core_binds = mg_binds guts
 +      prepd_binds <- corePrepPgm (hsc_logger hsc_env) cp_cfg
 +                       (initCorePrepPgmConfig dflags (interactiveInScope $ hsc_IC hsc_env))
 +                       this_mod core_binds
 +      (binds, _, _) <- coreToStg (initCoreToStgOpts dflags) (ms_mod summary)
 +                                 (ms_location summary) prepd_binds
 +      return binds
 +  where
 +    this_mod = mg_module guts
 +    core_binds = mg_binds guts
  slurpCmm :: HscEnv -> FilePath -> IO (CmmGroup)
  slurpCmm hsc_env filename = runHsc hsc_env $ do