GitLab

Marge Bot pushed to branch wip/marge_bot_batch_merge_job at Glasgow Haskell Compiler / GHC

Commits:

352d5462

by Marc Scholten at 2025-11-22T10:33:03-05:00

Fix haddock test runner to handle UTF-8 output

xhtml 3000.4.0.0 now produces UTF-8 output instead of escaping non-ASCII characters.

When using --test-accept it previously wrote files in the wrong encoding
because they have not been decoded properly when reading the files.

48a3ed57

by Simon Peyton Jones at 2025-11-25T15:33:54+00:00

Add a fast-path for args=[] to occAnalApp

In the common case of having not arguments, occAnalApp
was doing redundant work.

951e5ed9

by Simon Peyton Jones at 2025-11-25T15:33:54+00:00

Fix a performance hole in the occurrence analyser

As #26425 showed, the clever stuff in

   Note [Occurrence analysis for join points]

does a lot of duplication of usage details.  This patch
improved matters with a little fancy footwork.  It is
described in the new (W4) of the same Note.

Compile-time allocations go down slightly. Here are the changes
of +/- 0.5% or more:

    T13253(normal)    329,369,244    326,395,544  -0.9%
T13253-spj(normal)     66,410,496     66,095,864  -0.5%
    T15630(normal)    129,797,200    128,663,136  -0.9%
   T15630a(normal)    129,212,408    128,027,560  -0.9%
    T16577(normal)  6,756,706,896  6,723,028,512  -0.5%
    T18282(normal)    128,462,070    125,808,584  -2.1% GOOD
   T18698a(normal)    208,418,305    202,037,336  -3.1% GOOD
    T18730(optasm)    136,981,756    136,208,136  -0.6%
    T18923(normal)     58,103,088     57,745,840  -0.6%
    T19695(normal)  1,386,306,272  1,365,609,416  -1.5%
    T26425(normal)  3,344,402,957  2,457,811,664 -26.5% GOOD
     T6048(optasm)     79,763,816     79,212,760  -0.7%
     T9020(optasm)    225,278,408    223,682,440  -0.7%
     T9961(normal)    303,810,717    300,729,168  -1.0% GOOD

         geo. mean                                -0.5%
         minimum                                 -26.5%
         maximum                                  +0.4%

Metric Decrease:
    T18282
    T18698a
    T26425
    T9961

f1959dfc

by Simon Peyton Jones at 2025-11-26T11:58:07+00:00

Remove a quadratic-cost assertion check in mkCoreApp

See the new Note [Assertion checking in mkCoreApp]

8cd2d857

by Simon Hengel at 2025-11-27T06:29:33-05:00

Fix typo in docs/users_guide/exts/type_families.rst

03618af8
by Simon Hengel at 2025-11-27T06:29:34-05:00
```
Fix broken RankNTypes example in user's guide
```

0d1dcf6c

by Simon Peyton Jones at 2025-11-27T06:29:34-05:00

Switch off specialisation in ExactPrint

In !15057 (where we re-introduced -fpolymoprhic-specialisation) we found
that ExactPrint's compile time blew up by a factor of 5.  It turned out
to be caused by bazillions of specialisations of `markAnnotated`.

Since ExactPrint isn't perf-critical, it does not seem worth taking
the performance hit, so this patch switches off specialisation in
this one module.

6ebe13e9

by Simon Peyton Jones at 2025-11-27T06:29:34-05:00

Switch -fpolymorphic-specialisation on by default

This patch addresses #23559.

Now that !10479 has landed and #26329 is fixed, we can switch on
polymorphic specialisation by default, addressing a bunch of other
tickets listed in #23559.

Metric changes:
  * CoOpt_Singleton: +4% compiler allocations: we just get more
    specialisations

  * info_table_map_perf: -20% decrease in compiler allocations.
    This is caused by using -fno-specialise in ExactPrint.hs
    Without that change we get a 4x blow-up in compile time;
    see !15058 for details

Metric Decrease:
    info_table_map_perf
Metric Increase:
    CoOpt_Singletons

41d5abd8

by Matthew Pickering at 2025-11-27T06:29:35-05:00

rts: Fix a deadlock with eventlog flush interval and RTS shutdown

The ghc_ticker thread attempts to flush at the eventlog tick interval, this requires
waiting to take all capabilities.

At the same time, the main thread is shutting down, the schedule is
stopped and then we wait for the ticker thread to finish.

Therefore we are deadlocked.

The solution is to use `newBoundTask/exitMyTask`, so that flushing can
cooperate with the scheduler shutdown.

Fixes #26573

38d7e185

by sheaf at 2025-11-27T06:29:50-05:00

SimpleOpt: don't subst in pushCoercionIntoLambda

It was noticed in #26589 that the change in 15b311be was incorrect:
the simple optimiser carries two different substitution-like pieces of
information: 'soe_subst' (from InVar to OutExpr) and 'soe_inl'
(from InId to InExpr). It is thus incorrect to have 'pushCoercionIntoLambda'
apply the substitution from 'soe_subst' while discarding 'soe_inl'
entirely, which is what was done in 15b311be.

Instead, we change back pushCoercionIntoLambda to take an InScopeSet,
and optimise the lambda before calling 'pushCoercionIntoLambda' to avoid
mixing InExpr with OutExpr, or mixing two InExpr with different
environments. We can then call 'soeZapSubst' without problems.

Fixes #26588 #26589

26 changed files:

compiler/GHC/Core/Make.hs
compiler/GHC/Core/Opt/Arity.hs
compiler/GHC/Core/Opt/OccurAnal.hs
compiler/GHC/Core/SimpleOpt.hs
compiler/GHC/Driver/DynFlags.hs
compiler/GHC/Driver/Flags.hs
compiler/GHC/HsToCore/Binds.hs
compiler/GHC/HsToCore/Expr.hs
compiler/GHC/HsToCore/Match.hs
compiler/GHC/Tc/Gen/App.hs
compiler/GHC/Types/Basic.hs
compiler/GHC/Types/Unique/FM.hs
compiler/GHC/Types/Unique/Set.hs
compiler/GHC/Types/Var/Env.hs
docs/users_guide/exts/rank_polymorphism.rst
docs/users_guide/exts/type_families.rst
docs/users_guide/using-optimisation.rst
rts/eventlog/EventLog.c
testsuite/tests/rts/all.T
+ testsuite/tests/simplCore/should_compile/T26588.hs
+ testsuite/tests/simplCore/should_compile/T26589.hs
testsuite/tests/simplCore/should_compile/T8331.stderr
testsuite/tests/simplCore/should_compile/all.T
testsuite/tests/unboxedsums/UbxSumUnpackedSize.hs
utils/check-exact/ExactPrint.hs
utils/haddock/haddock-test/src/Test/Haddock.hs

Changes:

compiler/GHC/Core/Make.hs

@@ -151,37 +151,28 @@ mkCoreConWrapApps con args = mkCoreApps (Var (dataConWrapId con)) args
  -- | Construct an expression which represents the application of a number of
  -- expressions to another. The leftmost expression in the list is applied first
 -mkCoreApps :: CoreExpr -- ^ function
 +-- See Note [Assertion checking in mkCoreApp]
 +mkCoreApps :: CoreExpr   -- ^ function
             -> [CoreExpr] -- ^ arguments
             -> CoreExpr
 -mkCoreApps fun args
 -  = fst $
 -    foldl' (mkCoreAppTyped doc_string) (fun, fun_ty) args
 -  where
 -    doc_string = ppr fun_ty $$ ppr fun $$ ppr args
 -    fun_ty = exprType fun
 +mkCoreApps fun args = foldl' mkCoreApp fun args
  -- | Construct an expression which represents the application of one expression
  -- to the other
 -mkCoreApp :: SDoc
 -          -> CoreExpr -- ^ function
 +-- See Note [Assertion checking in mkCoreApp]
 +mkCoreApp :: CoreExpr -- ^ function
            -> CoreExpr -- ^ argument
            -> CoreExpr
 -mkCoreApp s fun arg
 -  = fst $ mkCoreAppTyped s (fun, exprType fun) arg
+-
 --- | Construct an expression which represents the application of one expression
 --- paired with its type to an argument. The result is paired with its type. This
 --- function is not exported and used in the definition of 'mkCoreApp' and
 --- 'mkCoreApps'.
 -mkCoreAppTyped :: SDoc -> (CoreExpr, Type) -> CoreExpr -> (CoreExpr, Type)
 -mkCoreAppTyped _ (fun, fun_ty) (Type ty)
 -  = (App fun (Type ty), piResultTy fun_ty ty)
 -mkCoreAppTyped _ (fun, fun_ty) (Coercion co)
 -  = (App fun (Coercion co), funResultTy fun_ty)
 -mkCoreAppTyped d (fun, fun_ty) arg
 -  = assertPpr (isFunTy fun_ty) (ppr fun $$ ppr arg $$ d)
 -    (App fun arg, funResultTy fun_ty)
 +mkCoreApp fun arg = App fun arg
++
 +{- Note [Assertion checking in mkCoreApp]
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 +At one time we had an assertion to check that the function and argument type match up,
 +but that turned out to take 90% of all compile time (!) when compiling test
 +`unboxedsums/UbxSumUnpackedSize.hs`. The reason was an unboxed sum constructor with
 +hundreds of foralls.   It's most straightforward just to remove the assert, and
 +rely on Lint to discover any mis-constructed terms.
 +-}
  {- *********************************************************************
  *                                                                      *

compiler/GHC/Core/Opt/Arity.hs

@@ -2993,12 +2993,12 @@ pushCoValArg co
      Pair tyL tyR = coercionKind co
  pushCoercionIntoLambda
 -    :: HasDebugCallStack => Subst -> InVar -> InExpr -> OutCoercionR -> Maybe (OutVar, OutExpr)
 +    :: HasDebugCallStack => InScopeSet -> Var -> CoreExpr -> CoercionR -> Maybe (Var, CoreExpr)
  -- This implements the Push rule from the paper on coercions
  --    (\x. e) |> co
  -- ===>
  --    (\x'. e |> co')
 -pushCoercionIntoLambda subst x e co
 +pushCoercionIntoLambda in_scope x e co
      | assert (not (isTyVar x) && not (isCoVar x)) True
      , Pair s1s2 t1t2 <- coercionKind co
      , Just {}              <- splitFunTy_maybe s1s2
@@ -3011,9 +3011,9 @@ pushCoercionIntoLambda subst x e co
            -- Should we optimize the coercions here?
            -- Otherwise they might not match too well
            x' = x `setIdType` t1 `setIdMult` w1
 -          in_scope' = substInScopeSet subst `extendInScopeSet` x'
 +          in_scope' = in_scope `extendInScopeSet` x'
            subst' =
 -            extendIdSubst (setInScope subst in_scope')
 +            extendIdSubst (setInScope emptySubst in_scope')
+               x
                (mkCast (Var x') (mkSymCo co1))
              -- We substitute x' for x, except we need to preserve types.

compiler/GHC/Core/Opt/OccurAnal.hs

@@ -66,7 +66,6 @@ import GHC.Builtin.Names( runRWKey )
  import GHC.Unit.Module( Module )
  import Data.List (mapAccumL)
 -import Data.List.NonEmpty (NonEmpty (..))
  {-
  ************************************************************************
@@ -660,18 +659,35 @@ through A, so it should have ManyOcc.  Bear this case in mind!
  * In occ_env, the new (occ_join_points :: IdEnv OccInfoEnv) maps
    each in-scope non-recursive join point, such as `j` above, to
    a "zeroed form" of its RHS's usage details. The "zeroed form"
 +    * has only occ_nested_lets in its domain  (see (W4) below)
      * deletes ManyOccs
      * maps a OneOcc to OneOcc{ occ_n_br = 0 }
 -  In our example, occ_join_points will be extended with
 +  In our example, assuming `v` is locally-let-bound, occ_join_points will
 +  be extended with
        [j :-> [v :-> OneOcc{occ_n_br=0}]]
 -  See addJoinPoint.
 +  See `addJoinPoint` and (W4) below.
  * At an occurrence of a join point, we do everything as normal, but add in the
    UsageDetails from the occ_join_points.  See mkOneOcc.
 -* Crucially, at the NonRec binding of the join point, in `occAnalBind`, we use
 -  `orUDs`, not `andUDs` to combine the usage from the RHS with the usage from
 -  the body.
 +* Crucially, at the NonRec binding of a join point `j`, in `occAnalBind`,
 +  we use `combineJoinPointUDs`, not `andUDs` to combine the usage from the
 +  RHS with the usage from the body.  `combineJoinPointUDs` behaves like this:
++
 +   * For all variables than `occ_nested_lets`, use `andUDs`, just like for
 +     any normal let-binding.
++
 +   * But for a variable `v` in `occ_nested_lets`, use `orUDs`:
 +     - If `v` occurs `ManyOcc` in the join-point RHS, the variable won't be in
 +       `occ_join_points`; but we'll get `ManyOcc` anyway.
 +     - If `v` occurs `OneOcc` in the join-point RHS, the variable will be in
 +       `occ_join_points` and we'll thereby get a `OneOcc{occ_n_br=0}` from
 +       each of j's tail calls.  We can `or` that with the `OncOcc{occ_n_br=n}`
 +       from j's RHS.
++
 +  The only reason for `occ_nested_lets` is to reduce the size of the info
 +  duplicate at each tail call; see (W4). It would sound to put *all* variables
 +  into `occ_nested_lets`.
  Here are the consequences
@@ -682,13 +698,14 @@ Here are the consequences
    There are two lexical occurrences of `v`!
    (NB: `orUDs` adds occ_n_br together, so occ_n_br=1 is impossible, too.)
 -* In the tricky (P3) we'll get an `andUDs` of
 -    * OneOcc{occ_n_br=0} from the occurrences of `j`)
 +* In the tricky (P3), when analysing `case (f v) of ...`, we'll get
 +  an `andUDs` of
 +    * OneOcc{occ_n_br=0} from the occurrences of `j`
      * OneOcc{occ_n_br=1} from the (f v)
    These are `andUDs` together in `addOccInfo`, and hence
    `v` gets ManyOccs, just as it should.  Clever!
 -There are a couple of tricky wrinkles
 +There are, of course, some tricky wrinkles
  (W1) Consider this example which shadows `j`:
            join j = rhs in
@@ -718,6 +735,8 @@ There are a couple of tricky wrinkles
       * In `postprcess_uds`, we add the chucked-out join points to the
         returned UsageDetails, with `andUDs`.
 +Wrinkles (W1) and (W2) are very similar to Note [Binder swap] (BS3).
++
  (W3) Consider this example, which shadows `j`, but this time in an argument
                join j = rhs
                in f (case x of { K j -> ...; ... })
@@ -732,12 +751,36 @@ There are a couple of tricky wrinkles
       NB: this is just about efficiency: it is always safe /not/ to zap the
       occ_join_points.
 -(W4) What if the join point binding has a stable unfolding, or RULES?
 -     They are just alternative right-hand sides, and at each call site we
 -     will use only one of them. So again, we can use `orUDs` to combine
 -     usage info from all these alternatives RHSs.
+-
 -Wrinkles (W1) and (W2) are very similar to Note [Binder swap] (BS3).
 +(W4) Other things being equal, we want keep the OccInfoEnv stored in
 +  `occ_join_points` as small as possible, because it is /duplicated/ at
 +  /every occurrence/ of the join point.  We really only want to include
 +  OccInfo for
 +       * Local, non-recursive let-bound Ids
 +       * that occur just once in the RHS of the join point
 +  particularly including
 +       * thunks (that's the original point) and
 +       * join points (so that the trick works recursively).
 +  We call these the "tracked Ids of j".
++
 +  Including lambda binders is pointless, and slows down the occurrence analyser.
++
 +  e.g.    \x. let y = x+1 in
 +              join j v = ..x..y..(f z z)..
 +              in ...
 +  In the `occ_join_points` binding for `j`, we want to track `y`, but
 +  not `x` (lambda bound) nor `z` (occurs many times).
++
 +  To exploit this:
 +     * `occ_nested_lets` tracks which Ids are
 +              nested (not-top-level), non-recursive lets
 +     * `addJoinPoint` only populates j's entry with occ-info for the "tracked Ids"
 +       of `j`; that is, that are (a) in occ_nested_lets and (b) have OneOcc.
 +     * `combineJoinPointUDs` uses
 +          orLocalOcc  for local-let Ids
 +          andLocalOcc for non-local-let Ids
++
 +  This fancy footwork can matter in extreme cases: it gave a 25% reduction in
 +  total compiler allocation in #26425..
  Note [Finding join points]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -759,45 +802,45 @@ rest of 'OccInfo' until it goes on the binder.
  Note [Join arity prediction based on joinRhsArity]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 -In general, the join arity from tail occurrences of a join point (O) may be
 -higher or lower than the manifest join arity of the join body (M). E.g.,
 +In general, the join arity from tail occurrences of a join point (OAr) may be
 +higher or lower than the manifest join arity of the join body (MAr). E.g.,
 -  -- M > O:
 -  let f x y = x + y              -- M = 2
 -  in if b then f 1 else f 2      -- O = 1
 +  -- MAr > Oar:
 +  let f x y = x + y              -- MAr = 2
 +  in if b then f 1 else f 2      -- OAr = 1
    ==> { Contify for join arity 1 }
    join f x = \y -> x + y
    in if b then jump f 1 else jump f 2
 -  -- M < O
 -  let f = id                     -- M = 0
 -  in if ... then f 12 else f 13  -- O = 1
 +  -- MAr < Oar
 +  let f = id                     -- MAr = 0
 +  in if ... then f 12 else f 13  -- OAr = 1
    ==> { Contify for join arity 1, eta-expand f }
    join f x = id x
    in if b then jump f 12 else jump f 13
 -But for *recursive* let, it is crucial that both arities match up, consider
 +But for *recursive* let, it is crucial MAr=OAr.  Consider:
    letrec f x y = if ... then f x else True
    in f 42
 -Here, M=2 but O=1. If we settled for a joinrec arity of 1, the recursive jump
 +Here, MAr=2 but OAr=1. If we settled for a joinrec arity of 1, the recursive jump
  would not happen in a tail context! Contification is invalid here.
 -So indeed it is crucial to demand that M=O.
 +So indeed it is crucial to demand that MAr=OAr.
 -(Side note: Actually, we could be more specific: Let O1 be the join arity of
 -occurrences from the letrec RHS and O2 the join arity from the let body. Then
 -we need M=O1 and M<=O2 and could simply eta-expand the RHS to match O2 later.
 -M=O is the specific case where we don't want to eta-expand. Neither the join
 +(Side note: Actually, we could be more specific: Let OAr1 be the join arity of
 +occurrences from the letrec RHS and OAr2 the join arity from the let body. Then
 +we need MAr=OAr1 and MAr<=OAr2 and could simply eta-expand the RHS to match OAr2 later.
 +MAr=OAr is the specific case where we don't want to eta-expand. Neither the join
  points paper nor GHC does this at the moment.)
  We can capitalise on this observation and conclude that *if* f could become a
 -joinrec (without eta-expansion), it will have join arity M.
 -Now, M is just the result of 'joinRhsArity', a rather simple, local analysis.
 +joinrec (without eta-expansion), it will have join arity MAr.
 +Now, MAr is just the result of 'joinRhsArity', a rather simple, local analysis.
  It is also the join arity inside the 'TailUsageDetails' returned by
  'occAnalLamTail', so we can predict join arity without doing any fixed-point
  iteration or really doing any deep traversal of let body or RHS at all.
 -We check for M in the 'adjustTailUsage' call inside 'tagRecBinders'.
 +We check for MAr in the 'adjustTailUsage' call inside 'tagRecBinders'.
  All this is quite apparent if you look at the contification transformation in
  Fig. 5 of "Compiling without Continuations" (which does not account for
@@ -807,14 +850,14 @@ eta-expansion at all, mind you). The letrec case looks like this
      ... and a bunch of conditions establishing that f only occurs
          in app heads of join arity (len as + len xs) inside us and es ...
 -The syntactic form `/\as.\xs. L[us]` forces M=O iff `f` occurs in `us`. However,
 +The syntactic form `/\as.\xs. L[us]` forces MAr=OAr iff `f` occurs in `us`. However,
  for non-recursive functions, this is the definition of contification from the
  paper:
    let f = /\as.\xs.u in L[es]     ... conditions ...
 -Note that u could be a lambda itself, as we have seen. No relationship between M
 -and O to exploit here.
 +Note that u could be a lambda itself, as we have seen. No relationship between MAr
 +and OAr to exploit here.
  Note [Join points and unfoldings/rules]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -954,6 +997,22 @@ of both functions, serving as a specification:
       Cyclic Recursive case:   'tagRecBinders'
       Acyclic Recursive case:  'adjustNonRecRhs'
       Non-recursive case:      'adjustNonRecRhs'
++
 +Note [Unfoldings and RULES]
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~
 +For let-bindings we treat (stable) unfoldings and RULES as "alternative right hand
 +sides".  That is, it's as if we had
 +  f = case <hiatus> of
 +         1 -> <the-rhs>
 +         2 -> <the-stable-unfolding>
 +         3 -> <rhs of rule1>
 +         4 -> <rhs of rule2>
 +So we combine all these with `orUDs` (#26567).  But actually it makes
 +very little difference whether we use `andUDs` or `orUDs` because of
 +Note [Occurrences in stable unfoldings and RULES]: occurrences in an unfolding
 +or RULE are treated as ManyOcc anyway.
++
 +But NB that tail-call info is preserved so that we don't thereby lose join points.
  -}
  ------------------------------------------------------------------
@@ -991,24 +1050,24 @@ occAnalBind !env lvl ire (NonRec bndr rhs) thing_inside combine
    | mb_join@(JoinPoint {}) <- idJoinPointHood bndr
    = -- Analyse the RHS and /then/ the body
      let -- Analyse the rhs first, generating rhs_uds
 -        !(rhs_uds_s, bndr', rhs') = occAnalNonRecRhs env lvl ire mb_join bndr rhs
 -        rhs_uds = foldl1' orUDs rhs_uds_s   -- NB: orUDs.  See (W4) of
 -                                           -- Note [Occurrence analysis for join points]
 +        !(rhs_uds, bndr', rhs') = occAnalNonRecRhs env lvl ire mb_join bndr rhs
          -- Now analyse the body, adding the join point
          -- into the environment with addJoinPoint
 -        !(WUD body_uds (occ, body)) = occAnalNonRecBody env bndr' $ \env ->
 +        env_body = addLocalLet env lvl bndr
 +        !(WUD body_uds (occ, body)) = occAnalNonRecBody env_body bndr' $ \env ->
                                        thing_inside (addJoinPoint env bndr' rhs_uds)
      in
      if isDeadOcc occ     -- Drop dead code; see Note [Dead code]
      then WUD body_uds body
 -    else WUD (rhs_uds `orUDs` body_uds)    -- Note `orUDs`
 +    else WUD (combineJoinPointUDs env rhs_uds body_uds)    -- Note `orUDs`
               (combine [NonRec (fst (tagNonRecBinder lvl occ bndr')) rhs']
                        body)
    -- The normal case, including newly-discovered join points
    -- Analyse the body and /then/ the RHS
 -  | WUD body_uds (occ,body) <- occAnalNonRecBody env bndr thing_inside
 +  | let env_body = addLocalLet env lvl bndr
 +  , WUD body_uds (occ,body) <- occAnalNonRecBody env_body bndr thing_inside
    = if isDeadOcc occ   -- Drop dead code; see Note [Dead code]
      then WUD body_uds body
      else let
@@ -1017,8 +1076,8 @@ occAnalBind !env lvl ire (NonRec bndr rhs) thing_inside combine
          -- => join arity O of Note [Join arity prediction based on joinRhsArity]
          (tagged_bndr, mb_join) = tagNonRecBinder lvl occ bndr
 -        !(rhs_uds_s, final_bndr, rhs') = occAnalNonRecRhs env lvl ire mb_join tagged_bndr rhs
 -    in WUD (foldr andUDs body_uds rhs_uds_s)      -- Note `andUDs`
 +        !(rhs_uds, final_bndr, rhs') = occAnalNonRecRhs env lvl ire mb_join tagged_bndr rhs
 +    in WUD (rhs_uds `andUDs` body_uds)      -- Note `andUDs`
             (combine [NonRec final_bndr rhs'] body)
  -----------------
@@ -1033,15 +1092,21 @@ occAnalNonRecBody env bndr thing_inside
  -----------------
  occAnalNonRecRhs :: OccEnv -> TopLevelFlag -> ImpRuleEdges
 -                -> JoinPointHood -> Id -> CoreExpr
 -                 -> (NonEmpty UsageDetails, Id, CoreExpr)
 +                 -> JoinPointHood -> Id -> CoreExpr
 +                 -> (UsageDetails, Id, CoreExpr)
  occAnalNonRecRhs !env lvl imp_rule_edges mb_join bndr rhs
    | null rules, null imp_rule_infos
    =  -- Fast path for common case of no rules. This is only worth
       -- 0.1% perf on average, but it's also only a line or two of code
 -    ( adj_rhs_uds :| adj_unf_uds : [], final_bndr_no_rules, final_rhs )
 +    ( adj_rhs_uds `orUDs` adj_unf_uds
 +    , final_bndr_no_rules, final_rhs )
++
    | otherwise
 -  = ( adj_rhs_uds :| adj_unf_uds : adj_rule_uds, final_bndr_with_rules, final_rhs )
 +  = ( foldl' orUDs (adj_rhs_uds `orUDs` adj_unf_uds) adj_rule_uds
 +    , final_bndr_with_rules, final_rhs )
++
 +    -- orUDs: Combine the RHS, (stable) unfolding, and RULES with orUDs
 +    --        See Note [Unfoldings and RULES]
    where
      --------- Right hand side ---------
      -- For join points, set occ_encl to OccVanilla, via setTailCtxt.  If we have
@@ -1054,7 +1119,7 @@ occAnalNonRecRhs !env lvl imp_rule_edges mb_join bndr rhs
      rhs_ctxt = mkNonRecRhsCtxt lvl bndr unf
      -- See Note [Join arity prediction based on joinRhsArity]
 -    -- Match join arity O from mb_join_arity with manifest join arity M as
 +    -- Match join arity OAr from mb_join_arity with manifest join arity MAr as
      -- returned by of occAnalLamTail. It's totally OK for them to mismatch;
      -- hence adjust the UDs from the RHS
@@ -1764,7 +1829,7 @@ makeNode !env imp_rule_edges bndr_set (bndr, rhs)
                                 -- here because that is what we are setting!
      WTUD unf_tuds unf' = occAnalUnfolding rhs_env unf
      adj_unf_uds = adjustTailArity (JoinPoint rhs_ja) unf_tuds
 -      -- `rhs_ja` is `joinRhsArity rhs` and is the prediction for source M
 +      -- `rhs_ja` is `joinRhsArity rhs` and is the prediction for source MAr
        -- of Note [Join arity prediction based on joinRhsArity]
      --------- IMP-RULES --------
@@ -1775,7 +1840,7 @@ makeNode !env imp_rule_edges bndr_set (bndr, rhs)
      --------- All rules --------
      -- See Note [Join points and unfoldings/rules]
 -    -- `rhs_ja` is `joinRhsArity rhs'` and is the prediction for source M
 +    -- `rhs_ja` is `joinRhsArity rhs'` and is the prediction for source MAr
      -- of Note [Join arity prediction based on joinRhsArity]
      rules_w_uds :: [(CoreRule, UsageDetails, UsageDetails)]
      rules_w_uds = [ (r,l,adjustTailArity (JoinPoint rhs_ja) rhs_wuds)
@@ -2177,7 +2242,9 @@ occAnalLamTail :: OccEnv -> CoreExpr -> WithTailUsageDetails CoreExpr
  -- See Note [Adjusting right-hand sides]
  occAnalLamTail env expr
    = let !(WUD usage expr') = occ_anal_lam_tail env expr
 -    in WTUD (TUD (joinRhsArity expr) usage) expr'
 +    in WTUD (TUD (joinRhsArity expr') usage) expr'
 +       -- If expr looks like (\x. let dead = e in \y. blah), where `dead` is dead
 +       -- then joinRhsArity expr' might exceed joinRhsArity expr
  occ_anal_lam_tail :: OccEnv -> CoreExpr -> WithUsageDetails CoreExpr
  -- Does not markInsideLam etc for the outmost batch of lambdas
@@ -2281,7 +2348,7 @@ occAnalUnfolding !env unf
                WTUD (TUD rhs_ja uds) rhs' = occAnalLamTail env rhs
                unf' = unf { uf_tmpl = rhs' }
              in WTUD (TUD rhs_ja (markAllMany uds)) unf'
 -              -- markAllMany: see Note [Occurrences in stable unfoldings]
 +              -- markAllMany: see Note [Occurrences in stable unfoldings and RULES]
          | otherwise -> WTUD (TUD 0 emptyDetails) unf
                -- For non-Stable unfoldings we leave them undisturbed, but
@@ -2319,12 +2386,13 @@ occAnalRule env rule@(Rule { ru_bndrs = bndrs, ru_args = args, ru_rhs = rhs })
                            -- Note [Rules are extra RHSs]
                            -- Note [Rule dependency info]
      rhs_uds' = markAllMany rhs_uds
 +               -- markAllMany: Note [Occurrences in stable unfoldings and RULES]
      rhs_ja = length args -- See Note [Join points and unfoldings/rules]
  occAnalRule _ other_rule = (other_rule, emptyDetails, TUD 0 emptyDetails)
 -{- Note [Occurrences in stable unfoldings]
 -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 +{- Note [Occurrences in stable unfoldings and RULES]
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Consider
      f p = BIG
      {-# INLINE g #-}
@@ -2338,7 +2406,7 @@ preinlineUnconditionally here!
  The INLINE pragma says "inline exactly this RHS"; perhaps the
  programmer wants to expose that 'not', say. If we inline f that will make
 -the Stable unfoldign big, and that wasn't what the programmer wanted.
 +the Stable unfolding big, and that wasn't what the programmer wanted.
  Another way to think about it: if we inlined g as-is into multiple
  call sites, now there's be multiple calls to f.
@@ -2347,6 +2415,8 @@ Bottom line: treat all occurrences in a stable unfolding as "Many".
  We still leave tail call information intact, though, as to not spoil
  potential join points.
 +The same goes for RULES.
++
  Note [Unfoldings and rules]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Generally unfoldings and rules are already occurrence-analysed, so we
@@ -2598,7 +2668,7 @@ occAnalArgs :: OccEnv -> CoreExpr -> [CoreExpr]
              -> WithUsageDetails CoreExpr
  -- The `fun` argument is just an accumulating parameter,
  -- the base for building the application we return
 -occAnalArgs !env fun args !one_shots
 +occAnalArgs env fun args one_shots
    = go emptyDetails fun args one_shots
    where
      env_args = setNonTailCtxt encl env
@@ -2657,8 +2727,19 @@ Constructors are rather like lambdas in this way.
  occAnalApp :: OccEnv
             -> (Expr CoreBndr, [Arg CoreBndr], [CoreTickish])
             -> WithUsageDetails (Expr CoreBndr)
 --- Naked variables (not applied) end up here too
 -occAnalApp !env (Var fun, args, ticks)
 +occAnalApp !env (Var fun_id, [], ticks)
 +  = -- Naked variables (not applied) end up here too, and it's worth giving
 +    -- this common case special treatment, because there is so much less to do.
 +    -- This is just a specialised copy of the (Var fun_id) case below
 +    WUD fun_uds (mkTicks ticks fun')
 +  where
 +    !(fun', fun_id')  = lookupBndrSwap env fun_id
 +    !fun_uds = mkOneOcc env fun_id' int_cxt 0
 +    !int_cxt = case occ_encl env of
 +                   OccScrut -> IsInteresting
 +                   _other   -> NotInteresting
++
 +occAnalApp env (Var fun, args, ticks)
    -- Account for join arity of runRW# continuation
    -- See Note [Simplification of runRW#]
    --
@@ -2863,7 +2944,11 @@ data OccEnv
               -- Invariant: no Id maps to an empty OccInfoEnv
               -- See Note [Occurrence analysis for join points]
             , occ_join_points :: !JoinPointInfo
 -    }
++
 +           , occ_nested_lets :: IdSet    -- Non-top-level, non-rec-bound lets
 +                -- I tried making this field strict, but doing so increased
 +                -- compile-time allocation very slightly: 0.1% on average
 +           }
  type JoinPointInfo = IdEnv OccInfoEnv
@@ -2914,7 +2999,8 @@ initOccEnv
             , occ_join_points = emptyVarEnv
             , occ_bs_env = emptyVarEnv
 -           , occ_bs_rng = emptyVarSet }
 +           , occ_bs_rng = emptyVarSet
 +           , occ_nested_lets = emptyVarSet }
  noBinderSwaps :: OccEnv -> Bool
  noBinderSwaps (OccEnv { occ_bs_env = bs_env }) = isEmptyVarEnv bs_env
@@ -3154,23 +3240,26 @@ postprocess_uds bndrs bad_joins uds
        | uniq `elemVarEnvByKey` env = plusVarEnv_C andLocalOcc env join_env
        | otherwise                  = env
 +addLocalLet :: OccEnv -> TopLevelFlag -> Id -> OccEnv
 +addLocalLet env@(OccEnv { occ_nested_lets = ids }) top_lvl id
 +  | isTopLevel top_lvl = env
 +  | otherwise          = env { occ_nested_lets = ids `extendVarSet` id }
++
  addJoinPoint :: OccEnv -> Id -> UsageDetails -> OccEnv
 -addJoinPoint env bndr rhs_uds
 +addJoinPoint env@(OccEnv { occ_join_points = join_points, occ_nested_lets = nested_lets })
 +             join_bndr (UD { ud_env = rhs_occs })
    | isEmptyVarEnv zeroed_form
    = env
    | otherwise
 -  = env { occ_join_points = extendVarEnv (occ_join_points env) bndr zeroed_form }
 +  = env { occ_join_points = extendVarEnv join_points join_bndr zeroed_form }
    where
 -    zeroed_form = mkZeroedForm rhs_uds
 +    zeroed_form = mapMaybeUniqSetToUFM do_one nested_lets
 +     -- See Note [Occurrence analysis for join points] for "zeroed form"
 -mkZeroedForm :: UsageDetails -> OccInfoEnv
 --- See Note [Occurrence analysis for join points] for "zeroed form"
 -mkZeroedForm (UD { ud_env = rhs_occs })
 -  = mapMaybeUFM do_one rhs_occs
 -  where
 -    do_one :: LocalOcc -> Maybe LocalOcc
 -    do_one (ManyOccL {})    = Nothing
 -    do_one occ@(OneOccL {}) = Just (occ { lo_n_br = 0 })
 +    do_one :: Var -> Maybe LocalOcc
 +    do_one bndr = case lookupVarEnv rhs_occs bndr of
 +                    Just occ@(OneOccL {}) -> Just (occ { lo_n_br = 0 })
 +                    _                     -> Nothing
  --------------------
  transClosureFV :: VarEnv VarSet -> VarEnv VarSet
@@ -3628,7 +3717,14 @@ data LocalOcc  -- See Note [LocalOcc]
                     -- Combining (AlwaysTailCalled 2) and (AlwaysTailCalled 3)
                     -- gives NoTailCallInfo
                , lo_int_cxt :: !InterestingCxt }
++
      | ManyOccL !TailCallInfo
 +       -- Why do we need TailCallInfo on ManyOccL?
 +       -- Answer 1: recursive bindings are entered many times:
 +       --    rec { j x = ...j x'... } in j y
 +       -- See the uses of `andUDs` in `tagRecBinders`
 +       -- Answer 2: occurrences in stable unfoldings are many-ified
 +       --           See Note [Occurrences in stable unfoldings and RULES]
  instance Outputable LocalOcc where
    ppr (OneOccL { lo_n_br = n, lo_tail = tci })
@@ -3651,10 +3747,13 @@ data UsageDetails
  instance Outputable UsageDetails where
    ppr ud@(UD { ud_env = env, ud_z_tail = z_tail })
 -    = text "UD" <+> (braces $ fsep $ punctuate comma $
 -      [ ppr uq <+> text ":->" <+> ppr (lookupOccInfoByUnique ud uq)
 -      | (uq, _) <- nonDetStrictFoldVarEnv_Directly do_one [] env ])
 -      $$ nest 2 (text "ud_z_tail" <+> ppr z_tail)
 +    = text "UD" <> (braces (vcat
 +         [ -- `final` shows the result of a proper lookupOccInfo, returning OccInfo
 +           --         after accounting for `ud_z_tail` etc.
 +           text "final =" <+> (fsep $ punctuate comma $
 +                 [ ppr uq <+> text ":->" <+> ppr (lookupOccInfoByUnique ud uq)
 +                 | (uq, _) <- nonDetStrictFoldVarEnv_Directly do_one [] env ])
 +         , text "ud_z_tail" <+> ppr z_tail ] ))
      where
        do_one :: Unique -> LocalOcc -> [(Unique,LocalOcc)] -> [(Unique,LocalOcc)]
        do_one uniq occ occs = (uniq, occ) : occs
@@ -3663,7 +3762,7 @@ instance Outputable UsageDetails where
  -- | TailUsageDetails captures the result of applying 'occAnalLamTail'
  --   to a function `\xyz.body`. The TailUsageDetails pairs together
  --   * the number of lambdas (including type lambdas: a JoinArity)
 ---   * UsageDetails for the `body` of the lambda, unadjusted by `adjustTailUsage`.
 +--   * UsageDetails for the `body` of the lambda, /unadjusted/ by `adjustTailUsage`.
  -- If the binding turns out to be a join point with the indicated join
  -- arity, this unadjusted usage details is just what we need; otherwise we
  -- need to discard tail calls. That's what `adjustTailUsage` does.
@@ -3681,8 +3780,17 @@ data WithTailUsageDetails a = WTUD !TailUsageDetails !a
  andUDs:: UsageDetails -> UsageDetails -> UsageDetails
  orUDs :: UsageDetails -> UsageDetails -> UsageDetails
 -andUDs = combineUsageDetailsWith andLocalOcc
 -orUDs  = combineUsageDetailsWith orLocalOcc
 +andUDs = combineUsageDetailsWith (\_uniq -> andLocalOcc)
 +orUDs  = combineUsageDetailsWith (\_uniq -> orLocalOcc)
++
 +combineJoinPointUDs :: OccEnv -> UsageDetails -> UsageDetails -> UsageDetails
 +-- See (W4) in Note [Occurrence analysis for join points]
 +combineJoinPointUDs (OccEnv { occ_nested_lets = nested_lets }) uds1 uds2
 +  = combineUsageDetailsWith combine uds1 uds2
 +  where
 +    combine uniq occ1 occ2
 +      | uniq `elemVarSetByKey` nested_lets = orLocalOcc  occ1 occ2
 +      | otherwise                          = andLocalOcc occ1 occ2
  mkOneOcc :: OccEnv -> Id -> InterestingCxt -> JoinArity -> UsageDetails
  mkOneOcc !env id int_cxt arity
@@ -3699,7 +3807,8 @@ mkOneOcc !env id int_cxt arity
    = mkSimpleDetails (unitVarEnv id occ)
    where
 -    occ = OneOccL { lo_n_br = 1, lo_int_cxt = int_cxt
 +    occ = OneOccL { lo_n_br = 1
 +                  , lo_int_cxt = int_cxt
                    , lo_tail = AlwaysTailCalled arity }
  -- Add several occurrences, assumed not to be tail calls
@@ -3786,7 +3895,7 @@ restrictFreeVars bndrs fvs = restrictUniqSetToUFM bndrs fvs
  -------------------
  -- Auxiliary functions for UsageDetails implementation
 -combineUsageDetailsWith :: (LocalOcc -> LocalOcc -> LocalOcc)
 +combineUsageDetailsWith :: (Unique -> LocalOcc -> LocalOcc -> LocalOcc)
                          -> UsageDetails -> UsageDetails -> UsageDetails
  {-# INLINE combineUsageDetailsWith #-}
  combineUsageDetailsWith plus_occ_info
@@ -3796,9 +3905,9 @@ combineUsageDetailsWith plus_occ_info
    | isEmptyVarEnv env2 = uds1
    | otherwise
    -- See Note [Strictness in the occurrence analyser]
 -  -- Using strictPlusVarEnv here speeds up the test T26425 by about 10% by avoiding
 -  -- intermediate thunks.
 -  = UD { ud_env       = strictPlusVarEnv_C plus_occ_info env1 env2
 +  -- Using strictPlusVarEnv here speeds up the test T26425
 +  -- by about 10% by avoiding intermediate thunks.
 +  = UD { ud_env       = strictPlusVarEnv_C_Directly plus_occ_info env1 env2
         , ud_z_many    = strictPlusVarEnv z_many1   z_many2
         , ud_z_in_lam  = plusVarEnv z_in_lam1 z_in_lam2
         , ud_z_tail    = strictPlusVarEnv z_tail1   z_tail2 }
@@ -3842,8 +3951,6 @@ lookupOccInfoByUnique (UD { ud_env       = env
          | uniq `elemVarEnvByKey` z_tail = NoTailCallInfo
          | otherwise                     = ti
+-
+-
  -------------------
  -- See Note [Adjusting right-hand sides]
@@ -3853,21 +3960,22 @@ adjustNonRecRhs :: JoinPointHood
  -- ^ This function concentrates shared logic between occAnalNonRecBind and the
  -- AcyclicSCC case of occAnalRec.
  -- It returns the adjusted rhs UsageDetails combined with the body usage
 -adjustNonRecRhs mb_join_arity rhs_wuds@(WTUD _ rhs)
 -  = WUD (adjustTailUsage mb_join_arity rhs_wuds) rhs
+-
 +adjustNonRecRhs mb_join_arity (WTUD (TUD rhs_ja uds) rhs)
 +  = WUD (adjustTailUsage exact_join rhs uds) rhs
 +  where
 +    exact_join = mb_join_arity == JoinPoint rhs_ja
 -adjustTailUsage :: JoinPointHood
 -                -> WithTailUsageDetails CoreExpr    -- Rhs usage, AFTER occAnalLamTail
 +adjustTailUsage :: Bool        -- True <=> Exactly-matching join point; don't do markNonTail
 +                -> CoreExpr    -- Rhs usage, AFTER occAnalLamTail
 +                -> UsageDetails
                  -> UsageDetails
 -adjustTailUsage mb_join_arity (WTUD (TUD rhs_ja uds) rhs)
 +adjustTailUsage exact_join rhs uds
    = -- c.f. occAnal (Lam {})
      markAllInsideLamIf (not one_shot) $
      markAllNonTailIf (not exact_join) $
      uds
    where
      one_shot   = isOneShotFun rhs
 -    exact_join = mb_join_arity == JoinPoint rhs_ja
  adjustTailArity :: JoinPointHood -> TailUsageDetails -> UsageDetails
  adjustTailArity mb_rhs_ja (TUD ja usage)
@@ -3914,8 +4022,9 @@ tagNonRecBinder lvl occ bndr
  tagRecBinders :: TopLevelFlag           -- At top level?
                -> UsageDetails           -- Of body of let ONLY
                -> [NodeDetails]
 -              -> WithUsageDetails       -- Adjusted details for whole scope,
 -                                        -- with binders removed
 +              -> WithUsageDetails       -- Adjusted details for whole scope
 +                                        -- still including the binders;
 +                                        -- (they are removed by `addInScope`)
                    [IdWithOccInfo]       -- Tagged binders
  -- Substantially more complicated than non-recursive case. Need to adjust RHS
  -- details *before* tagging binders (because the tags depend on the RHSes).
@@ -3925,32 +4034,21 @@ tagRecBinders lvl body_uds details_s
       -- 1. See Note [Join arity prediction based on joinRhsArity]
       --    Determine possible join-point-hood of whole group, by testing for
 -     --    manifest join arity M.
 -     --    This (re-)asserts that makeNode had made tuds for that same arity M!
 +     --    manifest join arity MAr.
 +     --    This (re-)asserts that makeNode had made tuds for that same arity MAr!
       unadj_uds = foldr (andUDs . test_manifest_arity) body_uds details_s
 -     test_manifest_arity ND{nd_rhs = WTUD tuds rhs}
 -       = adjustTailArity (JoinPoint (joinRhsArity rhs)) tuds
 +     test_manifest_arity ND{nd_rhs = WTUD (TUD rhs_ja uds) rhs}
 +       = assertPpr (rhs_ja == joinRhsArity rhs) (ppr rhs_ja $$ ppr uds $$ ppr rhs) $
 +         uds
 +     will_be_joins :: Bool
       will_be_joins = decideRecJoinPointHood lvl unadj_uds bndrs
 -     mb_join_arity :: Id -> JoinPointHood
 -     -- mb_join_arity: See Note [Join arity prediction based on joinRhsArity]
 -     -- This is the source O
 -     mb_join_arity bndr
 -         -- Can't use willBeJoinId_maybe here because we haven't tagged
 -         -- the binder yet (the tag depends on these adjustments!)
 -       | will_be_joins
 -       , AlwaysTailCalled arity <- lookupTailCallInfo unadj_uds bndr
 -       = JoinPoint arity
 -       | otherwise
 -       = assert (not will_be_joins) -- Should be AlwaysTailCalled if
 -         NotJoinPoint               -- we are making join points!
+-
       -- 2. Adjust usage details of each RHS, taking into account the
       --    join-point-hood decision
 -     rhs_udss' = [ adjustTailUsage (mb_join_arity bndr) rhs_wuds
 +     rhs_udss' = [ adjustTailUsage will_be_joins rhs rhs_uds
                       -- Matching occAnalLamTail in makeNode
 -                 | ND { nd_bndr = bndr, nd_rhs = rhs_wuds } <- details_s ]
 +                 | ND { nd_rhs = WTUD (TUD _ rhs_uds) rhs } <- details_s ]
       -- 3. Compute final usage details from adjusted RHS details
       adj_uds = foldr andUDs body_uds rhs_udss'
@@ -3969,9 +4067,9 @@ setBinderOcc occ_info bndr
    | otherwise                  = setIdOccInfo bndr occ_info
  -- | Decide whether some bindings should be made into join points or not, based
 --- on its occurrences. This is
 +-- on its occurrences.
  -- Returns `False` if they can't be join points. Note that it's an
 +-- all-or-nothing decision: if multiple binders are given, they are
  -- assumed to be mutually recursive.
  --
  -- It must, however, be a final decision. If we say `True` for 'f',

compiler/GHC/Core/SimpleOpt.hs

@@ -393,12 +393,19 @@ simple_app env e0@(Lam {}) as0@(_:_)
        = wrapLet mb_pr $ do_beta env'' body as
        where (env', b') = subst_opt_bndr env b
 -    do_beta env e@(Lam b body) as@(CastIt co:rest)
 -      -- See Note [Desugaring unlifted newtypes]
 +    -- See Note [Eliminate casts in function position]
 +    do_beta env e@(Lam b _) as@(CastIt out_co:rest)
        | isNonCoVarId b
 -      , Just (b', body') <- pushCoercionIntoLambda (soe_subst env) b body co
 +      -- Optimise the inner lambda to make it an 'OutExpr', which makes it
 +      -- possible to call 'pushCoercionIntoLambda' with the 'OutCoercion' 'co'.
 +      -- This is kind of horrible, as for nested casted lambdas with a big body,
 +      -- we will repeatedly optimise the body (once for each binder). However,
 +      -- we need to do this to avoid mixing 'InExpr' and 'OutExpr', or two
 +      -- 'InExpr' with different environments (getting this wrong caused #26588 & #26589.)
 +      , Lam out_b out_body <- simple_app env e []
 +      , Just (b', body') <- pushCoercionIntoLambda (soeInScope env) out_b out_body out_co
        = do_beta (soeZapSubst env) (Lam b' body') rest
 -        -- soeZapSubst: pushCoercionIntoLambda applies the substitution
 +        -- soeZapSubst: we've already optimised everything (the lambda and 'rest') by now.
        | otherwise
        = rebuild_app env (simple_opt_expr env e) as
@@ -511,7 +518,31 @@ TL;DR: To avoid the rest of the compiler pipeline seeing these bad lambas, we
  rely on the simple optimiser to both inline the newtype unfolding and
  subsequently deal with the resulting lambdas (either beta-reducing them
  altogether or pushing coercions into them so that they satisfy the
 -representation-polymorphism invariants).
 +representation-polymorphism invariants). See Note [Eliminate casts in function position].
++
 +[Alternative approach] (GHC ticket #26608)
++
 +  We could instead, in the typechecker, emit a special form (a new constructor
 +  of XXExprGhcTc) for instantiations of representation-polymorphic unlifted
 +  newtypes (whether applied to a value argument or not):
++
 +    UnliftedNT :: DataCon -> [Type] -> Coercion -> XXExprGhcTc
++
 +  where "UnliftedNT nt_con [ty1, ...] co" represents the expression:
++
 +    ( nt_con @ty1 ... ) |> co
++
 +  The desugarer would then turn these AST nodes into appropriate Core, doing
 +  what the simple optimiser does today:
 +    - inline the compulsory unfolding of the newtype constructor
 +    - apply it to its type arguments and beta reduce
 +    - push the coercion into the resulting lambda
++
 +  This would have several advantages:
 +    - the desugarer would never produce "invalid" Core that needs to be
 +      tidied up by the simple optimiser,
 +    - the ugly and inefficient implementation described in
 +      Note [Eliminate casts in function position] could be removed.
  Wrinkle [Unlifted newtypes with wrappers]
@@ -717,50 +748,49 @@ rhss here.
  Note [Eliminate casts in function position]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 -Consider the following program:
 +Due to the current implementation strategy for representation-polymorphic
 +unlifted newtypes, as described in Note [Desugaring unlifted newtypes], we rely
 +on the simple optimiser to push coercions into lambdas, such as in the following
 +example:
    type R :: Type -> RuntimeRep
 -  type family R a where { R Float = FloatRep; R Double = DoubleRep }
 -  type F :: forall (a :: Type) -> TYPE (R a)
 -  type family F a where { F Float = Float#  ; F Double = Double# }
 +  type family R a where { R Int = IntRep }
 +  type F :: forall a -> TYPE (R a)
 +  type family F a where { F Int = Int# }
 -  type N :: forall (a :: Type) -> TYPE (R a)
    newtype N a = MkN (F a)
 -As MkN is a newtype, its unfolding is a lambda which wraps its argument
 -in a cast:
+-
 -  MkN :: forall (a :: Type). F a -> N a
 -  MkN = /\a \(x::F a). x |> co_ax
 -    -- recall that F a :: TYPE (R a)
+-
 -This is a representation-polymorphic lambda, in which the binder has an unknown
 -representation (R a). We can't compile such a lambda on its own, but we can
 -compile instantiations, such as `MkN @Float` or `MkN @Double`.
 +Now, an instantiated occurrence of 'MkN', such as 'MkN @Int' (whether applied
 +to a value argument or not) will lead, after inlining the compulsory unfolding
 +of 'MkN', to a lambda fo the form:
 -Our strategy to avoid running afoul of the representation-polymorphism
 -invariants of Note [Representation polymorphism invariants] in GHC.Core is thus:
 +  ( \ ( x :: F Int ) -> body ) |> co
 -  1. Give the newtype a compulsory unfolding (it has no binding, as we can't
 -     define lambdas with representation-polymorphic value binders in source Haskell).
 -  2. Rely on the optimiser to beta-reduce away any representation-polymorphic
 -     value binders.
 +    where
 +      co :: ( F Int -> res ) ~# ( Int# -> res )
 -For example, consider the application
 +The problem is that we now have a lambda abstraction whose binder does not have a
 +fixed RuntimeRep in the sense of Note [Fixed RuntimeRep] in GHC.Tc.Utils.Concrete.
 -    MkN @Float 34.0#
 +However, if we use 'pushCoercionIntoLambda', we end up with:
 -After inlining MkN we'll get
 +  ( \ ( x' :: Int# ) -> body' )
 -   ((/\a \(x:F a). x |> co_ax) @Float) |> co 34#
 +which satisfies the representation-polymorphism invariants of
 +Note [Representation polymorphism invariants] in GHC.Core.
 -where co :: (F Float -> N Float) ~ (Float# ~ N Float)
 +In conclusion:
 -But to actually beta-reduce that lambda, we need to push the 'co'
 -inside the `\x` with pushCoercionIntoLambda.  Hence the extra
 -equation for Cast-of-Lam in simple_app.
 +  1. The simple optimiser must push casts into lambdas.
 +  2. It must also deal with a situation such as (MkN @Int) |> co, where we first
 +     inline the compulsory unfolding of N. This means the simple optimiser must
 +     "peel off" the casts and optimise the inner expression first, to determine
 +     whether it is a lambda abstraction or not.
 -This is regrettably delicate.
 +This is regrettably delicate. If we could make sure the typechecker/desugarer
 +did not produce these bad lambdas in the first place (as described in
 +[Alternative approach] in Note [Desugaring unlifted newtypes]), we could
 +get rid of this ugly logic.
  Note [Preserve join-binding arity]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1673,7 +1703,7 @@ exprIsLambda_maybe ise@(ISE in_scope_set _) (Cast casted_e co)
      -- this implies that x is not in scope in gamma (makes this code simpler)
      , not (isTyVar x) && not (isCoVar x)
      , assert (not $ x `elemVarSet` tyCoVarsOfCo co) True
 -    , Just (x',e') <- pushCoercionIntoLambda (mkEmptySubst in_scope_set) x e co
 +    , Just (x',e') <- pushCoercionIntoLambda in_scope_set x e co
      , let res = Just (x',e',ts)
      = --pprTrace "exprIsLambda_maybe:Cast" (vcat [ppr casted_e,ppr co,ppr res)])
        res

compiler/GHC/Driver/DynFlags.hs

@@ -1268,6 +1268,7 @@ optLevelFlags -- see Note [Documenting optimisation flags]
      , ([1,2],   Opt_CfgBlocklayout)      -- Experimental
      , ([1,2],   Opt_Specialise)
 +    , ([1,2],   Opt_PolymorphicSpecialisation)  -- Now on by default (#23559)
      , ([1,2],   Opt_CrossModuleSpecialise)
      , ([1,2],   Opt_InlineGenerics)
      , ([1,2],   Opt_Strictness)

compiler/GHC/Driver/Flags.hs

@@ -909,6 +909,7 @@ optimisationFlags = EnumSet.fromList
     , Opt_SpecialiseAggressively
     , Opt_CrossModuleSpecialise
     , Opt_StaticArgumentTransformation
 +   , Opt_PolymorphicSpecialisation
     , Opt_CSE
     , Opt_StgCSE
     , Opt_StgLiftLams

compiler/GHC/HsToCore/Binds.hs

@@ -1620,7 +1620,7 @@ ds_hs_wrapper hs_wrap
                                do { x <- newSysLocalDs (mkScaled (subMultCoRKind w_co) t)
                                   ; go c1 $ \w1 ->
                                     go c2 $ \w2 ->
 -                                   let app f a = mkCoreApp (text "dsHsWrapper") f a
 +                                   let app f a = mkCoreApp f a
                                         arg     = w1 (Var x)
                                     in k (\e -> (Lam x (w2 (app e arg)))) }

compiler/GHC/HsToCore/Expr.hs

@@ -877,8 +877,7 @@ dsHsConLike (PatSynCon ps)
    | Just (builder_name, _, add_void) <- patSynBuilder ps
    = do { builder_id <- dsLookupGlobalId builder_name
         ; return (if add_void
 -                 then mkCoreApp (text "dsConLike" <+> ppr ps)
 -                                (Var builder_id) unboxedUnitExpr
 +                 then mkCoreApp (Var builder_id) unboxedUnitExpr
                   else Var builder_id) }
    | otherwise
    = pprPanic "dsConLike" (ppr ps)

compiler/GHC/HsToCore/Match.hs

@@ -301,7 +301,7 @@ matchView (var :| vars) ty eqns@(eqn1 :| _)
           -- compile the view expressions
          ; viewExpr' <- dsExpr viewExpr
          ; return (mkViewMatchResult var'
 -                    (mkCoreApp (text "matchView") viewExpr' (Var var))
 +                    (mkCoreApp viewExpr' (Var var))
                      match_result) }
  -- decompose the first pattern and leave the rest alone

compiler/GHC/Tc/Gen/App.hs

@@ -749,13 +749,13 @@ tcInstFun do_ql inst_final (tc_fun, fun_ctxt) fun_sigma rn_args
      go1 _pos acc fun_ty []
         | XExpr (ConLikeTc (RealDataCon dc)) <- tc_fun
         , isNewDataCon dc
 -       , [Scaled _ arg_ty] <- dataConOrigArgTys dc
 +       , [Scaled _ orig_arg_ty] <- dataConOrigArgTys dc
         , n_val_args == 0
         -- If we're dealing with an unsaturated representation-polymorphic
         -- UnliftedNewype, then perform a representation-polymorphism check.
         -- See Note [Representation-polymorphism checks for unsaturated unlifted newtypes]
         -- in GHC.Tc.Utils.Concrete.
 -       , not $ typeHasFixedRuntimeRep arg_ty
 +       , not $ typeHasFixedRuntimeRep orig_arg_ty
         = do { (wrap_co, arg_ty, res_ty) <-
                    matchActualFunTy (FRRRepPolyUnliftedNewtype dc)
                      (Just $ HsExprTcThing tc_fun)

compiler/GHC/Types/Basic.hs

@@ -1333,7 +1333,7 @@ zapFragileOcc occ         = zapOccTailCallInfo occ
  instance Outputable OccInfo where
    -- only used for debugging; never parsed.  KSW 1999-07
 -  ppr (ManyOccs tails)     = pprShortTailCallInfo tails
 +  ppr (ManyOccs tails)     = text "Many" <> parens (pprShortTailCallInfo tails)
    ppr IAmDead              = text "Dead"
    ppr (IAmALoopBreaker rule_only tails)
          = text "LoopBreaker" <> pp_ro <> pprShortTailCallInfo tails

compiler/GHC/Types/Unique/FM.hs

@@ -53,7 +53,7 @@ module GHC.Types.Unique.FM (
          plusUFM,
          strictPlusUFM,
          plusUFM_C,
 -        strictPlusUFM_C,
 +        strictPlusUFM_C, strictPlusUFM_C_Directly,
          plusUFM_CD,
          plusUFM_CD2,
          mergeUFM,
@@ -281,6 +281,9 @@ plusUFM_C f (UFM x) (UFM y) = UFM (M.unionWith f x y)
  strictPlusUFM_C :: (elt -> elt -> elt) -> UniqFM key elt -> UniqFM key elt -> UniqFM key elt
  strictPlusUFM_C f (UFM x) (UFM y) = UFM (MS.unionWith f x y)
 +strictPlusUFM_C_Directly :: (Unique -> elt -> elt -> elt) -> UniqFM key elt -> UniqFM key elt -> UniqFM key elt
 +strictPlusUFM_C_Directly f (UFM x) (UFM y) = UFM (MS.unionWithKey (f . mkUniqueGrimily) x y)
++
  -- | `plusUFM_CD f m1 d1 m2 d2` merges the maps using `f` as the
  -- combinding function and `d1` resp. `d2` as the default value if
  -- there is no entry in `m1` reps. `m2`. The domain is the union of

compiler/GHC/Types/Unique/Set.hs

@@ -40,6 +40,7 @@ module GHC.Types.Unique.Set (
          lookupUniqSet_Directly,
          partitionUniqSet,
          mapUniqSet,
 +        mapUniqSetToUFM, mapMaybeUniqSetToUFM,
          unsafeUFMToUniqSet,
          nonDetEltsUniqSet,
          nonDetKeysUniqSet,
@@ -211,6 +212,14 @@ mapUniqSet f = mkUniqSet . map f . nonDetEltsUniqSet
  mapMaybeUniqSet_sameUnique :: (a -> Maybe b) -> UniqSet a -> UniqSet b
  mapMaybeUniqSet_sameUnique f (UniqSet a) = UniqSet $ mapMaybeUFM_sameUnique f a
 +mapUniqSetToUFM :: (a -> b) -> UniqSet a -> UniqFM a b
 +-- Same keys, new values
 +mapUniqSetToUFM f (UniqSet ufm) = mapUFM f ufm
++
 +mapMaybeUniqSetToUFM :: (a -> Maybe b) -> UniqSet a -> UniqFM a b
 +-- Same keys, new values
 +mapMaybeUniqSetToUFM f (UniqSet ufm) = mapMaybeUFM f ufm
++
  -- Two 'UniqSet's are considered equal if they contain the same
  -- uniques.
  instance Eq (UniqSet a) where

compiler/GHC/Types/Var/Env.hs

@@ -12,7 +12,8 @@ module GHC.Types.Var.Env (
          elemVarEnv, disjointVarEnv, anyVarEnv,
          extendVarEnv, extendVarEnv_C, extendVarEnv_Acc,
          extendVarEnvList,
 -        strictPlusVarEnv, plusVarEnv, plusVarEnv_C, strictPlusVarEnv_C,
 +        strictPlusVarEnv, plusVarEnv, plusVarEnv_C,
 +        strictPlusVarEnv_C, strictPlusVarEnv_C_Directly,
          plusVarEnv_CD, plusMaybeVarEnv_C,
          plusVarEnvList, alterVarEnv,
          delVarEnvList, delVarEnv,
@@ -525,6 +526,7 @@ delVarEnv         :: VarEnv a -> Var -> VarEnv a
  minusVarEnv       :: VarEnv a -> VarEnv b -> VarEnv a
  plusVarEnv_C      :: (a -> a -> a) -> VarEnv a -> VarEnv a -> VarEnv a
  strictPlusVarEnv_C :: (a -> a -> a) -> VarEnv a -> VarEnv a -> VarEnv a
 +strictPlusVarEnv_C_Directly :: (Unique -> a -> a -> a) -> VarEnv a -> VarEnv a -> VarEnv a
  plusVarEnv_CD     :: (a -> a -> a) -> VarEnv a -> a -> VarEnv a -> a -> VarEnv a
  plusMaybeVarEnv_C :: (a -> a -> Maybe a) -> VarEnv a -> VarEnv a -> VarEnv a
  mapVarEnv         :: (a -> b) -> VarEnv a -> VarEnv b
@@ -552,6 +554,7 @@ extendVarEnv_Acc = addToUFM_Acc
  extendVarEnvList = addListToUFM
  plusVarEnv_C     = plusUFM_C
  strictPlusVarEnv_C = strictPlusUFM_C
 +strictPlusVarEnv_C_Directly = strictPlusUFM_C_Directly
  plusVarEnv_CD    = plusUFM_CD
  plusMaybeVarEnv_C = plusMaybeUFM_C
  delVarEnvList    = delListFromUFM

docs/users_guide/exts/rank_polymorphism.rst

@@ -195,7 +195,7 @@ For example: ::
    g3c :: Int -> forall x y. y -> x -> x
    f4 :: (Int -> forall a. (Eq a, Show a) => a -> a) -> Bool
 -  g4 ::  Int -> forall x. (Show x, Eq x) => x -> x) -> Bool
 +  g4 ::  Int -> forall x. (Show x, Eq x) => x -> x
  Then the application ``f3 g3a`` is well-typed, because ``g3a`` has a type that matches the type
  expected by ``f3``.  But ``f3 g3b`` is not well typed, because the foralls are in different places.

docs/users_guide/exts/type_families.rst

@@ -680,7 +680,7 @@ thus: ::
  When doing so, we (optionally) may drop the "``family``" keyword.
  The type parameters must all be type variables, of course, and some (but
 -not necessarily all) of then can be the class parameters. Each class
 +not necessarily all) of them can be the class parameters. Each class
  parameter may only be used at most once per associated type, but some
  may be omitted and they may be in an order other than in the class head.
  Hence, the following contrived example is admissible: ::

docs/users_guide/using-optimisation.rst

@@ -1325,10 +1325,7 @@ as such you shouldn't need to set any of them explicitly. A flag
      :reverse: -fno-polymorphic-specialisation
      :category:
 -    :default: off
+-
 -    Warning, this feature is highly experimental and may lead to incorrect runtime
 -    results. Use at your own risk (:ghc-ticket:`23469`, :ghc-ticket:`23109`, :ghc-ticket:`21229`, :ghc-ticket:`23445`).
 +    :default: on
      Enable specialisation of function calls to known dictionaries with free type variables.
      The created specialisation will abstract over the type variables free in the dictionary.

rts/eventlog/EventLog.c

@@ -491,13 +491,7 @@ endEventLogging(void)
      eventlog_enabled = false;
 -    // Flush all events remaining in the buffers.
 -    //
 -    // N.B. Don't flush if shutting down: this was done in
 -    // finishCapEventLogging and the capabilities have already been freed.
 -    if (getSchedState() != SCHED_SHUTTING_DOWN) {
 -        flushEventLog(NULL);
 -    }
 +    flushEventLog(NULL);
      ACQUIRE_LOCK(&eventBufMutex);
@@ -1626,15 +1620,24 @@ void flushEventLog(Capability **cap USED_IF_THREADS)
          return;
+     }
 +    // N.B. Don't flush if shutting down: this was done in
 +    // finishCapEventLogging and the capabilities have already been freed.
 +    // This can also race against the shutdown if the flush is triggered by the
 +    // ticker thread. (#26573)
 +    if (getSchedState() == SCHED_SHUTTING_DOWN) {
 +      return;
 +    }
++
      ACQUIRE_LOCK(&eventBufMutex);
      printAndClearEventBuf(&eventBuf);
      RELEASE_LOCK(&eventBufMutex);
  #if defined(THREADED_RTS)
 -    Task *task = getMyTask();
 +    Task *task = newBoundTask();
      stopAllCapabilitiesWith(cap, task, SYNC_FLUSH_EVENT_LOG);
      flushAllCapsEventsBufs();
      releaseAllCapabilities(getNumCapabilities(), cap ? *cap : NULL, task);
 +    exitMyTask();
  #else
      flushLocalEventsBuf(getCapability(0));
  #endif

testsuite/tests/rts/all.T

@@ -2,6 +2,11 @@ test('testblockalloc',
       [c_src, only_ways(['normal','threaded1']), extra_run_opts('+RTS -I0')],
       compile_and_run, [''])
 +test('numeric_version_eventlog_flush',
 +     [ignore_stdout, req_ghc_with_threaded_rts],
 +     run_command,
 +     ['{compiler} --numeric-version +RTS -l --eventlog-flush-interval=1 -RTS'])
++
  test('testmblockalloc',
       [c_src, only_ways(['normal','threaded1']), extra_run_opts('+RTS -I0 -xr0.125T'),
        when(arch('wasm32'), skip)], # MBlocks can't be freed on wasm32, see Note [Megablock allocator on wasm] in rts

testsuite/tests/simplCore/should_compile/T26588.hs

 +module T26588 ( getOptionSettingFromText ) where
++
 +import           Control.Applicative ( Const(..) )
 +import           Data.Map (Map)
 +import qualified Data.Map.Strict as Map
++
 +------------------------------------------------------------------------
 +-- ConfigState
++
 +data ConfigLeaf
 +data ConfigTrie = ConfigTrie !(Maybe ConfigLeaf) !ConfigMap
++
 +type ConfigMap = Map Int ConfigTrie
++
 +freshLeaf :: [Int] -> ConfigLeaf -> ConfigTrie
 +freshLeaf [] l     = ConfigTrie (Just l) mempty
 +freshLeaf (a:as) l = ConfigTrie Nothing (Map.singleton a (freshLeaf as l))
++
 +adjustConfigTrie :: Functor t => [Int] -> (Maybe ConfigLeaf -> t (Maybe ConfigLeaf)) -> Maybe (ConfigTrie) -> t (Maybe ConfigTrie)
 +adjustConfigTrie     as f Nothing                 = fmap (freshLeaf as) <$> f Nothing
 +adjustConfigTrie (a:as) f (Just (ConfigTrie x m)) = Just . ConfigTrie x <$> adjustConfigMap a as f m
 +adjustConfigTrie     [] f (Just (ConfigTrie x m)) = g <$> f x
 +  where g Nothing | Map.null m = Nothing
 +        g x' = Just (ConfigTrie x' m)
++
 +adjustConfigMap :: Functor t => Int -> [Int] -> (Maybe ConfigLeaf -> t (Maybe ConfigLeaf)) -> ConfigMap -> t ConfigMap
 +adjustConfigMap a as f = Map.alterF (adjustConfigTrie as f) a
++
 +getOptionSettingFromText :: Int -> [Int] -> ConfigMap -> IO ()
 +getOptionSettingFromText p ps = getConst . adjustConfigMap p ps f
 +  where
 +    f _ = Const (return ())

testsuite/tests/simplCore/should_compile/T26589.hs

 +module T26589 ( executeTest ) where
++
 +-- base
 +import Data.Coerce ( coerce )
 +import Data.Foldable ( foldMap )
++
 +--------------------------------------------------------------------------------
++
 +newtype Traversal f = Traversal { getTraversal :: f () }
++
 +instance Applicative f => Semigroup (Traversal f) where
 +  Traversal f1 <> Traversal f2 = Traversal $ f1 *> f2
 +instance Applicative f => Monoid (Traversal f) where
 +  mempty = Traversal $ pure ()
++
 +newtype Seq a = Seq (FingerTree (Elem a))
 +newtype Elem a = Elem { getElem :: a }
++
 +data FingerTree a
 +    = EmptyT
 +    | Deep !a (FingerTree a) !a
++
 +executeTest :: Seq () -> IO ()
 +executeTest fins = destroyResources
 +  where
 +    destroyResources :: IO ()
 +    destroyResources =
 +      getTraversal $
 +        flip foldMap1 fins $ \ _ ->
 +          Traversal $ return ()
++
 +foldMap1 :: forall m a. Monoid m => (a -> m) -> Seq a -> m
 +foldMap1 = coerce (foldMap2 :: (Elem a -> m) -> FingerTree (Elem a) -> m)
++
 +foldMap2 :: Monoid m => (Elem a -> m) -> FingerTree (Elem a) -> m
 +foldMap2 _ EmptyT = mempty
 +foldMap2 f' (Deep pr' m' sf') = f' pr' <> foldMapTree f' m' <> f' sf'
 +      where
 +        foldMapTree :: Monoid m => (a -> m) -> FingerTree a -> m
 +        foldMapTree _ EmptyT = mempty
 +        foldMapTree f (Deep pr m sf) =
 +            f pr <>
 +            foldMapTree f m <>
 +            f sf

testsuite/tests/simplCore/should_compile/T8331.stderr

  ==================== Tidy Core rules ====================
 +"SPEC $c*> @(ST s) @_"
 +    forall (@s) (@r) ($dApplicative :: Applicative (ST s)).
 +      $fApplicativeReaderT_$c*> @(ST s) @r $dApplicative
 +      = ($fApplicativeReaderT2 @s @r)
 +        `cast` (forall (a ::~ <*>_N) (b ::~ <*>_N).
 +                <ReaderT r (ST s) a>_R
 +                ->_R <ReaderT r (ST s) b>_R
 +                ->_R <r>_R ->_R Sym (N:ST <s>_N <b>_R)
 +                     ; Sym (N:ReaderT <*>_N <r>_R <ST s>_R <b>_N)
 +                :: Coercible
 +                     (forall a b.
 +                      ReaderT r (ST s) a -> ReaderT r (ST s) b -> r -> STRep s b)
 +                     (forall a b.
 +                      ReaderT r (ST s) a -> ReaderT r (ST s) b -> ReaderT r (ST s) b))
 +"SPEC $c<$ @(ST s) @_"
 +    forall (@s) (@r) ($dFunctor :: Functor (ST s)).
 +      $fFunctorReaderT_$c<$ @(ST s) @r $dFunctor
 +      = ($fApplicativeReaderT6 @s @r)
 +        `cast` (forall (a ::~ <*>_N) (b ::~ <*>_N).
 +                <a>_R
 +                ->_R <ReaderT r (ST s) b>_R
 +                ->_R <r>_R ->_R Sym (N:ST <s>_N <a>_R)
 +                     ; Sym (N:ReaderT <*>_N <r>_R <ST s>_R <a>_N)
 +                :: Coercible
 +                     (forall a b. a -> ReaderT r (ST s) b -> r -> STRep s a)
 +                     (forall a b. a -> ReaderT r (ST s) b -> ReaderT r (ST s) a))
 +"SPEC $c<* @(ST s) @_"
 +    forall (@s) (@r) ($dApplicative :: Applicative (ST s)).
 +      $fApplicativeReaderT_$c<* @(ST s) @r $dApplicative
 +      = ($fApplicativeReaderT1 @s @r)
 +        `cast` (forall (a ::~ <*>_N) (b ::~ <*>_N).
 +                <ReaderT r (ST s) a>_R
 +                ->_R <ReaderT r (ST s) b>_R
 +                ->_R <r>_R ->_R Sym (N:ST <s>_N <a>_R)
 +                     ; Sym (N:ReaderT <*>_N <r>_R <ST s>_R <a>_N)
 +                :: Coercible
 +                     (forall a b.
 +                      ReaderT r (ST s) a -> ReaderT r (ST s) b -> r -> STRep s a)
 +                     (forall a b.
 +                      ReaderT r (ST s) a -> ReaderT r (ST s) b -> ReaderT r (ST s) a))
 +"SPEC $c<*> @(ST s) @_"
 +    forall (@s) (@r) ($dApplicative :: Applicative (ST s)).
 +      $fApplicativeReaderT9 @(ST s) @r $dApplicative
 +      = ($fApplicativeReaderT4 @s @r)
 +        `cast` (forall (a ::~ <*>_N) (b ::~ <*>_N).
 +                <ReaderT r (ST s) (a -> b)>_R
 +                ->_R <ReaderT r (ST s) a>_R
 +                ->_R <r>_R
 +                ->_R Sym (N:ST <s>_N <b>_R)
 +                :: Coercible
 +                     (forall a b.
 +                      ReaderT r (ST s) (a -> b) -> ReaderT r (ST s) a -> r -> STRep s b)
 +                     (forall a b.
 +                      ReaderT r (ST s) (a -> b) -> ReaderT r (ST s) a -> r -> ST s b))
 +"SPEC $c>> @(ST s) @_"
 +    forall (@s) (@r) ($dMonad :: Monad (ST s)).
 +      $fMonadReaderT1 @(ST s) @r $dMonad
 +      = $fMonadAbstractIOSTReaderT_$s$c>> @s @r
 +"SPEC $c>>= @(ST s) @_"
 +    forall (@s) (@r) ($dMonad :: Monad (ST s)).
 +      $fMonadReaderT2 @(ST s) @r $dMonad
 +      = ($fMonadAbstractIOSTReaderT2 @s @r)
 +        `cast` (forall (a ::~ <*>_N) (b ::~ <*>_N).
 +                <ReaderT r (ST s) a>_R
 +                ->_R <a -> ReaderT r (ST s) b>_R
 +                ->_R <r>_R
 +                ->_R Sym (N:ST <s>_N <b>_R)
 +                :: Coercible
 +                     (forall a b.
 +                      ReaderT r (ST s) a -> (a -> ReaderT r (ST s) b) -> r -> STRep s b)
 +                     (forall a b.
 +                      ReaderT r (ST s) a -> (a -> ReaderT r (ST s) b) -> r -> ST s b))
 +"SPEC $cfmap @(ST s) @_"
 +    forall (@s) (@r) ($dFunctor :: Functor (ST s)).
 +      $fFunctorReaderT_$cfmap @(ST s) @r $dFunctor
 +      = ($fApplicativeReaderT7 @s @r)
 +        `cast` (forall (a ::~ <*>_N) (b ::~ <*>_N).
 +                <a -> b>_R
 +                ->_R <ReaderT r (ST s) a>_R
 +                ->_R <r>_R ->_R Sym (N:ST <s>_N <b>_R)
 +                     ; Sym (N:ReaderT <*>_N <r>_R <ST s>_R <b>_N)
 +                :: Coercible
 +                     (forall a b. (a -> b) -> ReaderT r (ST s) a -> r -> STRep s b)
 +                     (forall a b. (a -> b) -> ReaderT r (ST s) a -> ReaderT r (ST s) b))
 +"SPEC $cliftA2 @(ST s) @_"
 +    forall (@s) (@r) ($dApplicative :: Applicative (ST s)).
 +      $fApplicativeReaderT_$cliftA2 @(ST s) @r $dApplicative
 +      = ($fApplicativeReaderT3 @s @r)
 +        `cast` (forall (a ::~ <*>_N) (b ::~ <*>_N) (c ::~ <*>_N).
 +                <a -> b -> c>_R
 +                ->_R <ReaderT r (ST s) a>_R
 +                ->_R <ReaderT r (ST s) b>_R
 +                ->_R <r>_R ->_R Sym (N:ST <s>_N <c>_R)
 +                     ; Sym (N:ReaderT <*>_N <r>_R <ST s>_R <c>_N)
 +                :: Coercible
 +                     (forall a b c.
 +                      (a -> b -> c)
 +                      -> ReaderT r (ST s) a -> ReaderT r (ST s) b -> r -> STRep s c)
 +                     (forall a b c.
 +                      (a -> b -> c)
 +                      -> ReaderT r (ST s) a -> ReaderT r (ST s) b -> ReaderT r (ST s) c))
 +"SPEC $cp1Applicative @(ST s) @_"
 +    forall (@s) (@r) ($dApplicative :: Applicative (ST s)).
 +      $fApplicativeReaderT_$cp1Applicative @(ST s) @r $dApplicative
 +      = $fApplicativeReaderT_$s$fFunctorReaderT @s @r
 +"SPEC $cp1Monad @(ST s) @_"
 +    forall (@s) (@r) ($dMonad :: Monad (ST s)).
 +      $fMonadReaderT_$cp1Monad @(ST s) @r $dMonad
 +      = $fApplicativeReaderT_$s$fApplicativeReaderT @s @r
 +"SPEC $cpure @(ST s) @_"
 +    forall (@s) (@r) ($dApplicative :: Applicative (ST s)).
 +      $fApplicativeReaderT_$cpure @(ST s) @r $dApplicative
 +      = ($fApplicativeReaderT5 @s @r)
 +        `cast` (forall (a ::~ <*>_N).
 +                <a>_R
 +                ->_R <r>_R ->_R Sym (N:ST <s>_N <a>_R)
 +                     ; Sym (N:ReaderT <*>_N <r>_R <ST s>_R <a>_N)
 +                :: Coercible
 +                     (forall a. a -> r -> STRep s a)
 +                     (forall a. a -> ReaderT r (ST s) a))
 +"SPEC $creturn @(ST s) @_"
 +    forall (@s) (@r) ($dMonad :: Monad (ST s)).
 +      $fMonadReaderT_$creturn @(ST s) @r $dMonad
 +      = ($fApplicativeReaderT5 @s @r)
 +        `cast` (forall (a ::~ <*>_N).
 +                <a>_R
 +                ->_R <r>_R ->_R Sym (N:ST <s>_N <a>_R)
 +                     ; Sym (N:ReaderT <*>_N <r>_R <ST s>_R <a>_N)
 +                :: Coercible
 +                     (forall a. a -> r -> STRep s a)
 +                     (forall a. a -> ReaderT r (ST s) a))
 +"SPEC $fApplicativeReaderT @(ST s) @_"
 +    forall (@s) (@r) ($dApplicative :: Applicative (ST s)).
 +      $fApplicativeReaderT @(ST s) @r $dApplicative
 +      = $fApplicativeReaderT_$s$fApplicativeReaderT @s @r
 +"SPEC $fFunctorReaderT @(ST s) @_"
 +    forall (@s) (@r) ($dFunctor :: Functor (ST s)).
 +      $fFunctorReaderT @(ST s) @r $dFunctor
 +      = $fApplicativeReaderT_$s$fFunctorReaderT @s @r
 +"SPEC $fMonadReaderT @(ST s) @_"
 +    forall (@s) (@r) ($dMonad :: Monad (ST s)).
 +      $fMonadReaderT @(ST s) @r $dMonad
 +      = $fMonadAbstractIOSTReaderT_$s$fMonadReaderT @s @r
  "USPEC useAbstractMonad @(ReaderT Int (ST s))"
      forall (@s)
             ($dMonadAbstractIOST :: MonadAbstractIOST (ReaderT Int (ST s))).

testsuite/tests/simplCore/should_compile/all.T

@@ -544,6 +544,9 @@ test('T25883b', normal, compile_grep_core, [''])
  test('T25883c', normal, compile_grep_core, [''])
  test('T25883d', [extra_files(['T25883d_import.hs'])], multimod_compile_filter, ['T25883d', '-O -ddump-simpl -dno-typeable-binds -dsuppress-all -dsuppress-uniques', r'grep -e "y ="'])
 +test('T26588', normal, compile, ['-package containers -O'])
 +test('T26589', normal, compile, ['-O'])
++
  test('T25976', [grep_errmsg('Dead Code')], compile, ['-O -ddump-simpl -dsuppress-uniques -dno-typeable-binds'])
  test('T25965', normal, compile, ['-O'])

testsuite/tests/unboxedsums/UbxSumUnpackedSize.hs

@@ -133,6 +133,124 @@ data U_E1 = U_E1 {-# UNPACK #-} !E1
                   {-# UNPACK #-} !Int8
      deriving (Show)
 +{- In `data U_E`, the {-# UNPACK #-} !E1 gives rise to a pretty clumsy expression
 +   for the wrapper for U_E1. Here is what it looks like when ther are only 16
 +   data constructors in E1, and we have just
 +       data U_E1 = U_E1 {-# UNPACK #-} !E1
 +   Blimey!
++
 +Main.$WU_E1
 +  = \ (conrep_t1N4 [Occ=Once1!] :: Main.E1) ->
 +      case case conrep_t1N4 of {
 +             Main.E1_1 ->
 +               GHC.Internal.Types.(# _| | | | | | | | | | | | | | | #)
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 GHC.Internal.Types.(##);
 +             Main.E1_2 ->
 +               GHC.Internal.Types.(# |_| | | | | | | | | | | | | | #)
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 GHC.Internal.Types.(##);
 +             Main.E1_3 ->
 +               GHC.Internal.Types.(# | |_| | | | | | | | | | | | | #)
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @GHC.Internal.Types.ZeroBitRep
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 @(# #)
 +                 GHC.Internal.Types.(##);
++
 +       ... etc ....
 +-}
++
  data U_E2 = U_E2 {-# UNPACK #-} !E2
                   {-# UNPACK #-} !Int8
                   {-# UNPACK #-} !Int8

utils/check-exact/ExactPrint.hs

@@ -19,6 +19,13 @@
  {-# LANGUAGE UndecidableInstances  #-} -- For the (StmtLR GhcPs GhcPs (LocatedA (body GhcPs))) ExactPrint instance
  {-# OPTIONS_GHC -Wno-incomplete-uni-patterns -Wno-incomplete-record-updates #-}
 +-- We switch off specialisation in this module. Otherwise we get lots of functions
 +-- specialised on lots of (GHC syntax tree) data types.  Compilation time allocation
 +-- (at least with -fpolymorphic-specialisation; see !15058) blows up from 17G to 108G.
 +-- Bad! ExactPrint is not a performance-critical module so it's not worth taking the
 +-- largely-fruitless hit in compile time.
 +{-# OPTIONS_GHC -fno-specialise #-}
++
  module ExactPrint
+   (
      ExactPrint(..)

utils/haddock/haddock-test/src/Test/Haddock.hs

@@ -8,7 +8,6 @@ module Test.Haddock
    ) where
  import Control.Monad
 -import qualified Data.ByteString.Char8 as BS
  import qualified Data.Map.Strict as Map
  import Data.Foldable (for_)
  import Data.Maybe
@@ -211,7 +210,7 @@ checkFile cfg file = do
      ccfg = cfgCheckConfig cfg
      dcfg = cfgDirConfig cfg
 --- We use ByteString here to ensure that no lazy I/O is performed.
 +-- We use readFile' here to ensure that no lazy I/O is performed.
  -- This way to ensure that the reference file isn't held open in
  -- case after `diffFile` (which is problematic if we need to rewrite
  -- the reference file in `maybeAcceptFile`)
@@ -219,8 +218,8 @@ checkFile cfg file = do
  -- | Read the reference artifact for a test
  readRef :: Config c -> FilePath -> IO (Maybe c)
  readRef cfg file =
 -  ccfgRead ccfg . BS.unpack
 -    <$> BS.readFile (refFile dcfg file)
 +  ccfgRead ccfg
 +    <$> readFile' (refFile dcfg file)
    where
      ccfg = cfgCheckConfig cfg
      dcfg = cfgDirConfig cfg
@@ -228,8 +227,8 @@ readRef cfg file =
  -- | Read (and clean) the test output artifact for a test
  readOut :: Config c -> (DirConfig -> FilePath) -> FilePath -> IO c
  readOut cfg dcfgDir file = do
 -  res <- fmap (ccfgClean ccfg file) . ccfgRead ccfg . BS.unpack
 -    <$> BS.readFile outFile
 +  res <- fmap (ccfgClean ccfg file) . ccfgRead ccfg
 +    <$> readFile' outFile
    case res of
      Just out -> return out
      Nothing -> error $ "Failed to parse output file: " ++ outFile