[Git][ghc/ghc][wip/T23109a] 10 commits: Fix lexing "\^\" (#25937)

1 May 2025

Simon Peyton Jones pushed to branch wip/T23109a at Glasgow Haskell Compiler / GHC


Commits:
6467d61e by Brandon Chinn at 2025-04-29T18:36:03-04:00
Fix lexing "\^\" (#25937)

This broke in the refactor in !13128, where the old code parsed escape
codes and collapsed string gaps at the same time, but the new code
collapsed gaps first, then resolved escape codes. The new code used a
naive heuristic to skip escaped backslashes, but didn't account for
"\^\".

- - - - -
99868a86 by Jens Petersen at 2025-04-29T18:36:44-04:00
hadrian: default selftest to disabled
- - - - -
aba2a4a5 by Zubin Duggal at 2025-04-30T06:35:59-04:00
get-win32-tarballs.py: List tarball files to be downloaded if we cannot find them

Fixes #25929

- - - - -
d99a617b by Ben Gamari at 2025-04-30T06:36:40-04:00
Move Data ModuleName instance to Language.Haskell.Syntax.Module.Name

Fixes #25968.

- - - - -
c28c68ed by Simon Peyton Jones at 2025-05-01T16:04:46+01:00
Improve the Simplifier

While working on #23109, I made two improvements to the Simplifier

* I found that the Simplifier was sometimes iterating more than it should.
  I fixed this by improving postInlineUnconditionally.

* I refactored tryCastWorkerWrapper.  It is now clearer, and does less
  repeated work.  This allowed me to call it from makeTrivial, which again
  does a bit more in one pass, elminating a potential extra Simplifier
  iteration

More care in postInline

Don't inline data con apps so vigorously

needs more docs

Wibbles

Be more careful in mkDupableContWithDmds

to not create a binding that will immediately be inlined

Do post-inline used-once bindings

This makes cacheprof not regress and seems generally a good plan

More eperiments

* Don't inline toplevel things so much
* Don't float constants so vigorously in the first float-out

Comments only

Refator GHC.Core.Opt.SetLevels.notWorthFloating

I refactored `notWorthFloating` while I was doing something else.

I don't think there's a change in behaviour, but if so it's very much
a corner case.

Always float bottoming expressions to the top

...regardless of floatConsts

Comments only

Wibble SetLevels

Try getting rid of this early-phase business

Don't float PAPs to top level

...and treat case alternatives as strict contexts

Wibble to postInlineUnconditionally

Small wibbles

Don't make error calls interesting.
Literals say True too isSaturatedConApp

Import wibble

Tiny change to saves_alloc

Float lambdas (and PAPs) out of lambdas to top level
This improves spectral/cse

But the old comment was
        -- is_con_app: don't float PAPs to the top; they may well end
        -- up getting eta-expanded and re-inlined
        -- E.g.   f = \x -> (++) ys
        -- If we float, then eta-expand we get
        --      lvl = (++) ys
        --      f = \x \zs -> lvl zs
        -- and now we'll inline lvl.  Silly.

Let's see what CI says

- - - - -
48a86d23 by Simon Peyton Jones at 2025-05-01T16:04:46+01:00
Specialise the (higher order) showSignedFloat

- - - - -
6ac35704 by Simon Peyton Jones at 2025-05-01T16:04:46+01:00
Eta reduce augment and its rules

... to match foldr.  I found this reduced some simplifer iterations

Fix `augment`!

- - - - -
d76c1929 by Simon Peyton Jones at 2025-05-01T16:04:46+01:00
Subtle change to OccurAnal

See (JP2) in Note [Occurrence analysis for join points]

- - - - -
fed7616e by Simon Peyton Jones at 2025-05-01T16:04:46+01:00
Accept some error message changes

* break011, break024: GHCi debugger output.  Not quite so good but
  @alt-romes says it's fine.  Very delicate tests, depend on fluky
  inlining.

* inline-check: an improvement!  After this patch we do one fewer
  iterations of the Simplifier.

- - - - -
fbdf66f1 by Simon Peyton Jones at 2025-05-01T16:04:46+01:00
Accept diff

* T18793: good: code is simpler and better

- - - - -


26 changed files:

- compiler/GHC/Core/Opt/Exitify.hs
- compiler/GHC/Core/Opt/OccurAnal.hs
- compiler/GHC/Core/Opt/Pipeline.hs
- compiler/GHC/Core/Opt/SetLevels.hs
- compiler/GHC/Core/Opt/Simplify/Env.hs
- compiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Core/Utils.hs
- compiler/GHC/Parser/Errors/Types.hs
- compiler/GHC/Parser/String.hs
- compiler/GHC/Unit/Types.hs
- compiler/Language/Haskell/Syntax/Module/Name.hs
- hadrian/README.md
- hadrian/hadrian.cabal
- libraries/ghc-internal/src/GHC/Internal/Base.hs
- libraries/ghc-internal/src/GHC/Internal/Float.hs
- m4/fp_setup_windows_toolchain.m4
- testsuite/tests/arityanal/should_compile/T18793.stderr
- testsuite/tests/driver/inline-check.stderr
- testsuite/tests/ghci.debugger/scripts/break011.stdout
- testsuite/tests/ghci.debugger/scripts/break024.stdout
- + testsuite/tests/parser/should_run/T25937.hs
- + testsuite/tests/parser/should_run/T25937.stdout
- testsuite/tests/parser/should_run/all.T
- + testsuite/tests/parser/should_run/parser_unit_tests.hs
- testsuite/tests/simplCore/should_run/simplrun009.hs


Changes:

=====================================
compiler/GHC/Core/Opt/Exitify.hs
=====================================
@@ -1,9 +1,7 @@
 module GHC.Core.Opt.Exitify ( exitifyProgram ) where
 
-{-
-Note [Exitification]
-~~~~~~~~~~~~~~~~~~~~
-
+{- Note [Exitification]
+~~~~~~~~~~~~~~~~~~~~~~~
 This module implements Exitification. The goal is to pull as much code out of
 recursive functions as possible, as the simplifier is better at inlining into
 call-sites that are not in recursive functions.
@@ -33,6 +31,9 @@ Example result:
   in …
 
 Now `t` is no longer in a recursive function, and good things happen!
+
+There is also a tricky connectionn with occurrence analysis:
+see (JP2) in Note [Occurrence analysis for join points] in GHC.Core.Opt.OccurAnal.
 -}
 
 import GHC.Prelude


=====================================
compiler/GHC/Core/Opt/OccurAnal.hs
=====================================
@@ -55,7 +55,7 @@ import GHC.Types.Tickish
 import GHC.Types.Var.Set
 import GHC.Types.Var.Env
 import GHC.Types.Var
-import GHC.Types.Demand ( argOneShots, argsOneShots, isDeadEndSig )
+import GHC.Types.Demand ( argOneShots, argsOneShots {- , isDeadEndSig -} )
 
 import GHC.Utils.Outputable
 import GHC.Utils.Panic
@@ -619,7 +619,7 @@ notice that `v` occurs at most once in any case branch; the occurrence analyser
 spots this and returns a OneOcc{ occ_n_br = 3 } for `v`.  Then the code in
 GHC.Core.Opt.Simplify.Utils.postInlineUnconditionally inlines `v` at its three
 use sites, and discards the let-binding.  That way, we avoid allocating `v` in
-the A,B,C branches (though we still compute it of course), and branch D
+*the A,B,C branches (though we still compute it of course), and branch D
 doesn't involve <small thunk> at all.  This sometimes makes a Really Big
 Difference.
 
@@ -687,9 +687,39 @@ Here are the consequences
   These are `andUDs` together in `addOccInfo`, and hence
   `v` gets ManyOccs, just as it should.  Clever!
 
-There are a couple of tricky wrinkles
+There are some tricky wrinkles
 
-(W1) Consider this example which shadows `j`:
+(JP1) What if the join point binding has a stable unfolding, or RULES?
+     They are just alternative right-hand sides, and at each call site we
+     will use only one of them. So again, we can use `orUDs` to combine
+     usage info from all these alternatives RHSs.
+
+(JP2) Consider this (test T21128):
+         joinrec jr x ys = case ys of
+                             []     -> v+1
+                             (x:xs) -> jr x xs
+   The exitification pass will carefully float out the exit path, thus:
+         join j x = v + 1 in
+         joinrec jr x ys = case ys of
+                             []     -> j x
+                             (x:xs) -> jr x xs
+   The idea is that now `v` may occur once, not under a lambda, and so may
+   be inlined.  See Note [Exitification] in GHC.Core.Opt.Exitify.
+
+   BUT if we "virtually inline" `j` at its occurrence side in `jr`, it'll
+   look as if `v` occurs under a lambda. Boo!  That defeats the entire
+   purpose of exitification!
+
+   Fortunately it is easy to fix.  In `lookupOccInfo` we can see if `n_br=0`.
+   If so, all the ocurrences of this Id came from non-recursive join points
+   (via the mechanism above) and so can't be involved in a loop.  So we do
+   not need to mark them as IsInsideLam.
+
+   This is a pretty subtle point!
+
+There are some other wrinkles to do with shadowing:
+
+(SW1) Consider this example which shadows `j`:
           join j = rhs in
           in case x of { K j -> ..j..; ... }
      Clearly when we come to the pattern `K j` we must drop the `j`
@@ -697,7 +727,7 @@ There are a couple of tricky wrinkles
 
      This is done by `drop_shadowed_joins` in `addInScope`.
 
-(W2) Consider this example which shadows `v`:
+(SW2) Consider this example which shadows `v`:
           join j = ...v...
           in case x of { K v -> ..j..; ... }
 
@@ -717,13 +747,13 @@ There are a couple of tricky wrinkles
      * In `postprcess_uds`, we add the chucked-out join points to the
        returned UsageDetails, with `andUDs`.
 
-(W3) Consider this example, which shadows `j`, but this time in an argument
+(SW3) Consider this example, which shadows `j`, but this time in an argument
               join j = rhs
               in f (case x of { K j -> ...; ... })
      We can zap the entire occ_join_points when looking at the argument,
      because `j` can't posibly occur -- it's a join point!  And the smaller
      occ_join_points is, the better.  Smaller to look up in mkOneOcc, and
-     more important, less looking-up when checking (W2).
+     more important, less looking-up when checking (SW2).
 
      This is done in setNonTailCtxt.  It's important /not/ to do this for
      join-point RHS's because of course `j` can occur there!
@@ -731,12 +761,7 @@ There are a couple of tricky wrinkles
      NB: this is just about efficiency: it is always safe /not/ to zap the
      occ_join_points.
 
-(W4) What if the join point binding has a stable unfolding, or RULES?
-     They are just alternative right-hand sides, and at each call site we
-     will use only one of them. So again, we can use `orUDs` to combine
-     usage info from all these alternatives RHSs.
-
-Wrinkles (W1) and (W2) are very similar to Note [Binder swap] (BS3).
+Wrinkles (SW1) and (SW2) are very similar to Note [Binder swap] (BS3).
 
 Note [Finding join points]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -985,7 +1010,7 @@ occAnalBind !env lvl ire (NonRec bndr rhs) thing_inside combine
   = -- Analyse the RHS and /then/ the body
     let -- Analyse the rhs first, generating rhs_uds
         !(rhs_uds_s, bndr', rhs') = occAnalNonRecRhs env lvl ire mb_join bndr rhs
-        rhs_uds = foldr1 orUDs rhs_uds_s   -- NB: orUDs.  See (W4) of
+        rhs_uds = foldr1 orUDs rhs_uds_s   -- NB: orUDs.  See (JP1) of
                                            -- Note [Occurrence analysis for join points]
 
         -- Now analyse the body, adding the join point
@@ -1096,14 +1121,14 @@ mkNonRecRhsCtxt lvl bndr unf
     certainly_inline -- See Note [Cascading inlines]
       = -- mkNonRecRhsCtxt is only used for non-join points, so occAnalBind
         -- has set the OccInfo for this binder before calling occAnalNonRecRhs
+        -- Distressing delicacy ... has to line up with preInlineUnconditionally
         case idOccInfo bndr of
           OneOcc { occ_in_lam = NotInsideLam, occ_n_br = 1 }
-            -> active && not stable_unf && not top_bottoming
+            -> active && not (isTopLevel lvl) && not stable_unf
           _ -> False
 
     active     = isAlwaysActive (idInlineActivation bndr)
     stable_unf = isStableUnfolding unf
-    top_bottoming = isTopLevel lvl && isDeadEndId bndr
 
 -----------------
 occAnalRecBind :: OccEnv -> TopLevelFlag -> ImpRuleEdges -> [(Var,CoreExpr)]
@@ -2580,8 +2605,9 @@ occAnalArgs !env fun args !one_shots
 
     -- Make bottoming functions interesting
     -- See Note [Bottoming function calls]
-    encl | Var f <- fun, isDeadEndSig (idDmdSig f) = OccScrut
-         | otherwise                               = OccVanilla
+--    encl | Var f <- fun, isDeadEndSig (idDmdSig f) = OccScrut
+--         | otherwise                               = OccVanilla
+    encl = OccVanilla
 
     go uds fun [] _ = WUD uds fun
     go uds fun (arg:args) one_shots
@@ -2606,7 +2632,8 @@ Consider
    let x = (a,b) in
    case p of
       A -> ...(error x)..
-      B -> ...(ertor x)...
+      B -> ...(error x)...
+      C -> ..blah...
 
 postInlineUnconditionally may duplicate x's binding, but sometimes it
 does so only if the use site IsInteresting.  Pushing allocation into error
@@ -2616,6 +2643,9 @@ setting occ_encl = OccScrut for such calls.
 The slightly-artificial test T21128 is a good example.  It's probably
 not a huge deal.
 
+ToDo!!!  Fix comment.   Now postinlineUnconditionally ignores intersting-ness for
+non-top-level things.
+
 Note [Arguments of let-bound constructors]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Consider
@@ -2979,7 +3009,7 @@ mkRhsOccEnv env@(OccEnv { occ_one_shots = ctxt_one_shots, occ_join_points = ctxt
 
 zapJoinPointInfo :: JoinPointInfo -> JoinPointInfo
 -- (zapJoinPointInfo jp_info) basically just returns emptyVarEnv (hence zapped).

--- See (W3) of Note [Occurrence analysis for join points]
+-- See (SW3) of Note [Occurrence analysis for join points]
 --
 -- Zapping improves efficiency, slightly, if you accidentally introduce a bug,
 -- in which you zap [jx :-> uds] and then find an occurrence of jx anyway, you
@@ -3086,7 +3116,7 @@ preprocess_env env@(OccEnv { occ_join_points = join_points
       = env { occ_bs_env = swap_env `minusUFM` bndr_fm }
 
     drop_shadowed_joins :: OccEnv -> OccEnv
-    -- See Note [Occurrence analysis for join points] wrinkle2 (W1) and (W2)
+    -- See Note [Occurrence analysis for join points] wrinkles (SW1) and (SW2)
     drop_shadowed_joins env = env { occ_join_points = emptyVarEnv }
 
     -- bad_joins is true if it would be wrong to push occ_join_points inwards
@@ -3111,7 +3141,7 @@ postprocess_uds bndrs bad_joins uds
     add_bad_joins :: UsageDetails -> UsageDetails
     -- Add usage info for occ_join_points that we cannot push inwards
     -- because of shadowing
-    -- See Note [Occurrence analysis for join points] wrinkle (W2)
+    -- See Note [Occurrence analysis for join points] wrinkle (SW2)
     add_bad_joins uds
        | isEmptyVarEnv bad_joins = uds
        | otherwise               = modifyUDEnv extend_with_bad_joins uds
@@ -3792,8 +3822,11 @@ lookupOccInfoByUnique (UD { ud_env       = env
                     , occ_int_cxt = int_cxt
                     , occ_tail    = mk_tail_info tail_info }
          where
-           in_lam | uniq `elemVarEnvByKey` z_in_lam = IsInsideLam
-                  | otherwise                       = NotInsideLam
+           in_lam | uniq `elemVarEnvByKey` z_in_lam
+                  , n_br > 0     -- n_br>0: see (JP2) in
+                  = IsInsideLam  -- Note [Occurrence analysis for join points]
+                  | otherwise
+                  = NotInsideLam
 
       Just (ManyOccL tail_info) -> ManyOccs { occ_tail = mk_tail_info tail_info }
   where
@@ -3801,8 +3834,6 @@ lookupOccInfoByUnique (UD { ud_env       = env
         | uniq `elemVarEnvByKey` z_tail = NoTailCallInfo
         | otherwise                     = ti
 
-
-
 -------------------
 -- See Note [Adjusting right-hand sides]
 


=====================================
compiler/GHC/Core/Opt/Pipeline.hs
=====================================
@@ -217,7 +217,7 @@ getCoreToDo dflags hpt_rule_base extra_vars
         if full_laziness then
            CoreDoFloatOutwards $ FloatOutSwitches
                 { floatOutLambdas     = Just 0
-                , floatOutConstants   = True
+                , floatOutConstants   = False  -- Initially
                 , floatOutOverSatApps = False
                 , floatToTopLevelOnly = False
                 , floatJoinsToTop     = False  -- Initially, don't float join points at all
@@ -284,7 +284,7 @@ getCoreToDo dflags hpt_rule_base extra_vars
         --        f_el22 (f_el21 r_midblock)
         runWhen full_laziness $ CoreDoFloatOutwards $ FloatOutSwitches
                { floatOutLambdas     = floatLamArgs dflags
-               , floatOutConstants   = True
+               , floatOutConstants   = True  -- For SpecConstr and CSE
                , floatOutOverSatApps = True
                , floatToTopLevelOnly = False
                , floatJoinsToTop     = True },


=====================================
compiler/GHC/Core/Opt/SetLevels.hs
=====================================
@@ -406,7 +406,7 @@ lvlApp env orig_expr ((_,AnnVar fn), args)
   , arity < n_val_args
   , Nothing <- isClassOpId_maybe fn
   =  do { rargs' <- mapM (lvlNonTailMFE env False) rargs
-        ; lapp'  <- lvlNonTailMFE env False lapp
+        ; lapp'  <- lvlNonTailMFE env True lapp
         ; return (foldl' App lapp' rargs') }
 
   | otherwise
@@ -482,14 +482,14 @@ Consider this:
   f :: T Int -> blah
   f x vs = case x of { MkT y ->
              let f vs = ...(case y of I# w -> e)...f..
-             in f vs
+             in f vs }
 
 Here we can float the (case y ...) out, because y is sure
 to be evaluated, to give
   f x vs = case x of { MkT y ->
-           case y of I# w ->
+           case y of { I# w ->
              let f vs = ...(e)...f..
-             in f vs
+             in f vs }}
 
 That saves unboxing it every time round the loop.  It's important in
 some DPH stuff where we really want to avoid that repeated unboxing in
@@ -614,7 +614,8 @@ lvlMFE env strict_ctxt e@(_, AnnCase {})
   = lvlExpr env e     -- See Note [Case MFEs]
 
 lvlMFE env strict_ctxt ann_expr
-  | not float_me
+  |  notWorthFloating expr abs_vars
+  || not float_me
   || floatTopLvlOnly env && not (isTopLvl dest_lvl)
          -- Only floating to the top level is allowed.
   || hasFreeJoin env fvs   -- If there is a free join, don't float
@@ -623,9 +624,6 @@ lvlMFE env strict_ctxt ann_expr
          -- We can't let-bind an expression if we don't know
          -- how it will be represented at runtime.
          -- See Note [Representation polymorphism invariants] in GHC.Core
-  || notWorthFloating expr abs_vars
-         -- Test notWorhtFloating last;
-         -- See Note [Large free-variable sets]
   = -- Don't float it out
     lvlExpr env ann_expr
 
@@ -676,12 +674,11 @@ lvlMFE env strict_ctxt ann_expr
     is_function  = isFunction ann_expr
     mb_bot_str   = exprBotStrictness_maybe expr
                            -- See Note [Bottoming floats]
-                           -- esp Bottoming floats (2)
+                           -- esp Bottoming floats (BF2)
     expr_ok_for_spec = exprOkForSpeculation expr
     abs_vars = abstractVars dest_lvl env fvs
     dest_lvl = destLevel env fvs fvs_ty is_function is_bot_lam
-               -- NB: is_bot_lam not is_bot; see (3) in
-               --     Note [Bottoming floats]
+               -- NB: is_bot_lam not is_bot; see (BF2) in Note [Bottoming floats]
 
     -- float_is_new_lam: the floated thing will be a new value lambda
     -- replacing, say (g (x+4)) by (lvl x).  No work is saved, nor is
@@ -698,20 +695,32 @@ lvlMFE env strict_ctxt ann_expr
 
         -- A decision to float entails let-binding this thing, and we only do
         -- that if we'll escape a value lambda, or will go to the top level.
+        -- Never float trivial expressions;
+        --   notably, save_work might be true of a lone evaluated variable.
     float_me = saves_work || saves_alloc || is_mk_static
 
     -- See Note [Saving work]
-    saves_work = escapes_value_lam        -- (a)
-                 && not (exprIsHNF expr)  -- (b)
-                 && not float_is_new_lam  -- (c)
+    is_hnf = exprIsHNF expr
+    saves_work = escapes_value_lam        -- (SW-a)
+                 && not is_hnf            -- (SW-b)
+                 && not float_is_new_lam  -- (SW-c)
     escapes_value_lam = dest_lvl `ltMajLvl` (le_ctxt_lvl env)
 
-    -- See Note [Saving allocation] and Note [Floating to the top]
-    saves_alloc =  isTopLvl dest_lvl
-                && floatConsts env
-                && (   not strict_ctxt                     -- (a)
-                    || exprIsHNF expr                      -- (b)
-                    || (is_bot_lam && escapes_value_lam))  -- (c)
+    -- See Note [Floating to the top]
+--    is_con_app = isSaturatedConApp expr  -- True of literal strings too
+    saves_alloc = isTopLvl dest_lvl
+               && (escapes_value_lam || floatConsts env)
+                  -- Always float allocation out of a value lambda
+                  --    if it gets to top level
+               && (not strict_ctxt || is_hnf || is_bot_lam)
+        -- is_con_app: don't float PAPs to the top; they may well end
+        -- up getting eta-expanded and re-inlined
+        -- E.g.   f = \x -> (++) ys
+        -- If we float, then eta-expand we get
+        --      lvl = (++) ys
+        --      f = \x \zs -> lvl zs
+        -- and now we'll inline lvl.  Silly.
+
 
 hasFreeJoin :: LevelEnv -> DVarSet -> Bool
 -- Has a free join point which is not being floated to top level.
@@ -726,22 +735,22 @@ hasFreeJoin env fvs
 The key idea in let-floating is to
   * float a redex out of a (value) lambda
 Doing so can save an unbounded amount of work.
-But see also Note [Saving allocation].
+But see also Note [Floating to the top].
 
 So we definitely float an expression out if
-(a) It will escape a value lambda (escapes_value_lam)
-(b) The expression is not a head-normal form (exprIsHNF); see (SW1, SW2).
-(c) Floating does not require wrapping it in value lambdas (float_is_new_lam).
+(SW-a) It will escape a value lambda (escapes_value_lam)
+(SW-b) The expression is not a head-normal form (exprIsHNF); see (SW1, SW2).
+(SW-c) Floating does not require wrapping it in value lambdas (float_is_new_lam).
     See (SW3) below
 
 Wrinkles:
 
-(SW1) Concerning (b) I experimented with using `exprIsCheap` rather than
+(SW1) Concerning (SW-b) I experimented with using `exprIsCheap` rather than
       `exprIsHNF` but the latter seems better, according to nofib
       (`spectral/mate` got 10% worse with exprIsCheap).  It's really a bit of a
       heuristic.
 
-(SW2) What about omitting (b), and hence floating HNFs as well?  The danger of
+(SW2) What about omitting (SW-b), and hence floating HNFs as well?  The danger of
       doing so is that we end up floating out a HNF from a cold path (where it
       might never get allocated at all) and allocating it all the time
       regardless.  Example
@@ -760,7 +769,7 @@ Wrinkles:
        - Occasionally decreases runtime allocation (T12996 -2.5%)
        - Slightly mixed effect on nofib: (puzzle -10%, mate -5%, cichelli +5%)
          but geometric mean is -0.09%.
-      Overall, a win.
+      Overall, a small win.
 
 (SW3) Concerning (c), if we are wrapping the thing in extra value lambdas (in
       abs_vars), then nothing is saved.  E.g.
@@ -771,10 +780,12 @@ Wrinkles:
       we have saved nothing: one pair will still be allocated for each
       call of `f`.  Hence the (not float_is_new_lam) in saves_work.
 
-Note [Saving allocation]
-~~~~~~~~~~~~~~~~~~~~~~~~
+Note [Floating to the top]
+~~~~~~~~~~~~~~~~~~~~~~~~~~
 Even if `saves_work` is false, we we may want to float even cheap/HNF
-expressions out of value lambdas, for several reasons:
+expressions out of value lambdas. Data suggests, however, that it is better
+/only/ to do so, /if/ they can go to top level. If the expression goes to top
+level we don't pay the cost of allocating cold-path thunks described in (SW2).
 
 * Doing so may save allocation. Consider
         f = \x.  .. (\y.e) ...
@@ -782,6 +793,11 @@ expressions out of value lambdas, for several reasons:
   (assuming e does not mention x). An example where this really makes a
   difference is simplrun009.
 
+* In principle this would be true even if the (\y.e) didn't go to top level; but
+  in practice we only float a HNF if it goes all way to the top.  We don't pay
+  /any/ allocation cost for a top-level floated expression; it just becomes
+  static data.
+
 * It may allow SpecContr to fire on functions. Consider
         f = \x. ....(f (\y.e))....
   After floating we get
@@ -793,21 +809,7 @@ expressions out of value lambdas, for several reasons:
   a big difference for string literals and bottoming expressions: see Note
   [Floating to the top]
 
-Data suggests, however, that it is better /only/ to float HNFS, /if/ they can go
-to top level. See (SW2) of Note [Saving work].  If the expression goes to top
-level we don't pay the cost of allocating cold-path thunks described in (SW2).
-
-Hence `isTopLvl dest_lvl` in `saves_alloc`.
-
-Note [Floating to the top]
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-Even though Note [Saving allocation] suggests that we should not, in
-general, float HNFs, the balance change if it goes to the top:
-
-* We don't pay an allocation cost for the floated expression; it
-  just becomes static data.
-
-* Floating string literal is valuable -- no point in duplicating the
+* Floating string literals is valuable -- no point in duplicating the
   at each call site!
 
 * Floating bottoming expressions is valuable: they are always cold
@@ -815,32 +817,32 @@ general, float HNFs, the balance change if it goes to the top:
   can be quite big, inhibiting inlining. See Note [Bottoming floats]
 
 So we float an expression to the top if:
-  (a) the context is lazy (so we get allocation), or
-  (b) the expression is a HNF (so we get allocation), or
-  (c) the expression is bottoming and floating would escape a
-      value lambda (NB: if the expression itself is a lambda, (b)
-      will apply; so this case only catches bottoming thunks)
+  (FT1) the context is lazy (so we get allocation), or
+  (FT2) the expression is a HNF (so we get allocation), or
+  (FT3) the expression is bottoming and floating would escape a
+         value lambda (NB: if the expression itself is a lambda, (b)
+         will apply; so this case only catches bottoming thunks)
 
 Examples:
 
-* (a) Strict.  Case scrutinee
+* (FT1) Strict.  Case scrutinee
       f = case g True of ....
   Don't float (g True) to top level; then we have the admin of a
   top-level thunk to worry about, with zero gain.
 
-* (a) Strict.  Case alternative
+* (FT1) Strict.  Case alternative
       h = case y of
              True  -> g True
              False -> False
   Don't float (g True) to the top level
 
-* (b) HNF
+* (FT2) HNF
       f = case y of
             True  -> p:q
             False -> blah
   We may as well float the (p:q) so it becomes a static data structure.
 
-* (c) Bottoming expressions; see also Note [Bottoming floats]
+* (FT3) Bottoming expressions; see also Note [Bottoming floats]
       f x = case x of
               0 -> error <big thing>
               _ -> x+1
@@ -853,7 +855,7 @@ Examples:
   'foo' anyway.  So float bottoming things only if they escape
   a lambda.
 
-* Arguments
+* (FT4) Arguments
      t = f (g True)
   Prior to Apr 22 we didn't float (g True) to the top if f was strict.
   But (a) this only affected CAFs, because if it escapes a value lambda
@@ -868,28 +870,6 @@ early loses opportunities for RULES which (needless to say) are
 important in some nofib programs (gcd is an example).  [SPJ note:
 I think this is obsolete; the flag seems always on.]
 
-Note [Large free-variable sets]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-In #24471 we had something like
-     x1 = I# 1
-     ...
-     x1000 = I# 1000
-     foo = f x1 (f x2 (f x3 ....))
-So every sub-expression in `foo` has lots and lots of free variables.  But
-none of these sub-expressions float anywhere; the entire float-out pass is a
-no-op.
-
-In lvlMFE, we want to find out quickly if the MFE is not-floatable; that is
-the common case.  In #24471 it turned out that we were testing `abs_vars` (a
-relatively complicated calculation that takes at least O(n-free-vars) time to
-compute) for every sub-expression.
-
-Better instead to test `float_me` early. That still involves looking at
-dest_lvl, which means looking at every free variable, but the constant factor
-is a lot better.
-
-ToDo: find a way to fix the bad asymptotic complexity.
-
 Note [Floating join point bindings]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Mostly we don't float join points at all -- we want them to /stay/ join points.
@@ -1053,30 +1033,36 @@ we'd like to float the call to error, to get
 
 But, as ever, we need to be careful:
 
-(1) We want to float a bottoming
+(BF1) We want to float a bottoming
     expression even if it has free variables:
         f = \x. g (let v = h x in error ("urk" ++ v))
     Then we'd like to abstract over 'x', and float the whole arg of g:
         lvl = \x. let v = h x in error ("urk" ++ v)
         f = \x. g (lvl x)
-    To achieve this we pass is_bot to destLevel
-
-(2) We do not do this for lambdas that return
-    bottom.  Instead we treat the /body/ of such a function specially,
-    via point (1).  For example:
+    To achieve this we pass `is_bot` to destLevel
+
+(BF2) We do the same for /lambdas/ that return bottom.
+    Suppose the original lambda had /no/ free vars:
+        f = \x. ....(\y z. error (y++z))...
+    then we'd like to float that whole lambda
+        lvl = \y z. error (y++z)
+        f = \x. ....lvl....
+    If we just floated its bottom-valued body, we might abstract the arguments in
+    the "wrong" order and end up with this bad result
+        lvl = \z y. error (y++z)
+        f = \x. ....(\y z. lvl z y)....
+
+    If the lambda does have free vars, this will happen:
         f = \x. ....(\y z. if x then error y else error z)....
-    If we float the whole lambda thus
+    We float the whole lambda thus
         lvl = \x. \y z. if x then error y else error z
         f = \x. ...(lvl x)...
-    we may well end up eta-expanding that PAP to
+    And we may well end up eta-expanding that PAP to
+        lvl = \x. \y z. if b then error y else error z
         f = \x. ...(\y z. lvl x y z)...
+    so we get a (small) closure.  So be it.
 
-    ===>
-        lvl = \x z y. if b then error y else error z
-        f = \x. ...(\y z. lvl x z y)...
-    (There is no guarantee that we'll choose the perfect argument order.)
-
-(3) If we have a /binding/ that returns bottom, we want to float it to top
+(BF3) If we have a /binding/ that returns bottom, we want to float it to top
     level, even if it has free vars (point (1)), and even it has lambdas.
     Example:
        ... let { v = \y. error (show x ++ show y) } in ...
@@ -1092,7 +1078,6 @@ But, as ever, we need to be careful:
     join points (#24768), and floating to the top would abstract over those join
     points, which we should never do.
 
-
 See Maessen's paper 1999 "Bottom extraction: factoring error handling out
 of functional programs" (unpublished I think).
 
@@ -1135,7 +1120,6 @@ float the case (as advocated here) we won't float the (build ...y..)
 either, so fusion will happen.  It can be a big effect, esp in some
 artificial benchmarks (e.g. integer, queens), but there is no perfect
 answer.
-
 -}
 
 annotateBotStr :: Id -> Arity -> Maybe (Arity, DmdSig, CprSig) -> Id
@@ -1152,69 +1136,124 @@ annotateBotStr id n_extra mb_bot_str
   = id
 
 notWorthFloating :: CoreExpr -> [Var] -> Bool
--- Returns True if the expression would be replaced by
--- something bigger than it is now.  For example:
---   abs_vars = tvars only:  return True if e is trivial,
---                           but False for anything bigger
---   abs_vars = [x] (an Id): return True for trivial, or an application (f x)
---                           but False for (f x x)
---
--- One big goal is that floating should be idempotent.  Eg if
--- we replace e with (lvl79 x y) and then run FloatOut again, don't want
--- to replace (lvl79 x y) with (lvl83 x y)!
-
+--  See Note [notWorthFloating]
 notWorthFloating e abs_vars
-  = go e (count isId abs_vars)
+  = go e 0
   where
-    go (Var {}) n               = n >= 0
-    go (Lit lit) n              = assert (n==0) $
-                                  litIsTrivial lit   -- Note [Floating literals]
-    go (Type {}) _              = True
-    go (Coercion {}) _          = True
+    n_abs_vars = count isId abs_vars  -- See (NWF5)
+
+    go :: CoreExpr -> Int -> Bool
+    -- (go e n) return True if (e x1 .. xn) is not worth floating
+    -- where `e` has n trivial value arguments x1..xn
+    -- See (NWF4)
+    go (Lit lit) n         = assert (n==0) $
+                             litIsTrivial lit   -- See (NWF1)
+    go (Type {}) _         = True
+    go (Tick t e) n        = not (tickishIsCode t) && go e n
+    go (Cast e _) n        = n==0 || go e n     -- See (NWF3)
+    go (Coercion {}) _     = True
     go (App e arg) n
-       -- See Note [Floating applications to coercions]
-       | not (isRuntimeArg arg) = go e n
-       | n==0                   = False
-       | exprIsTrivial arg      = go e (n-1) -- NB: exprIsTrivial arg = go arg 0
-       | otherwise              = False
-    go (Tick t e) n             = not (tickishIsCode t) && go e n
-    go (Cast e _)  n            = go e n
-    go (Case e b _ as) n
+       | Type {} <- arg    = go e n    -- Just types, not coercions (NWF2)
+       | exprIsTrivial arg = go e (n+1)
+       | otherwise         = False  -- (f non-triv) is worth floating
+
+    go (Case e b _ as) _
+      -- Do not float the `case` part of trivial cases (NWF3)
+      -- We'll have a look at the RHS when we get there
       | null as
-      = go e n     -- See Note [Empty case is trivial]
-      | Just rhs <- isUnsafeEqualityCase e b as
-      = go rhs n   -- See (U2) of Note [Implementing unsafeCoerce] in base:Unsafe.Coerce
-    go _ _                      = False
+      = True   -- See Note [Empty case is trivial]
+      | Just {} <- isUnsafeEqualityCase e b as
+      = True   -- See (U2) of Note [Implementing unsafeCoerce] in base:Unsafe.Coerce
+      | otherwise
+      = False
 
-{-
-Note [Floating literals]
-~~~~~~~~~~~~~~~~~~~~~~~~
-It's important to float Integer literals, so that they get shared,
-rather than being allocated every time round the loop.
-Hence the litIsTrivial.
+    go (Var _) n
+      | n==0             = True   -- Naked variable
+      | n <= n_abs_vars  = True   -- (f a b c) is not worth floating if
+      | otherwise        = False  -- a,b,c are all abstracted; see (NWF5)
 
-Ditto literal strings (LitString), which we'd like to float to top
-level, which is now possible.
+    go _ _ = False  -- Let etc is worth floating
 
-Note [Floating applications to coercions]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-We don’t float out variables applied only to type arguments, since the
-extra binding would be pointless: type arguments are completely erased.
-But *coercion* arguments aren’t (see Note [Coercion tokens] in
-"GHC.CoreToStg" and Note [inlineBoringOk] in"GHC.Core.Unfold"),
-so we still want to float out variables applied only to
-coercion arguments.
+{- Note [notWorthFloating]
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+`notWorthFloating` returns True if the expression would be replaced by something
+bigger than it is now.  One big goal is that floating should be idempotent.  Eg
+if we replace e with (lvl79 x y) and then run FloatOut again, don't want to
+replace (lvl79 x y) with (lvl83 x y)!
 
+For example:
+  abs_vars = tvars only:  return True if e is trivial,
+                          but False for anything bigger
+  abs_vars = [x] (an Id): return True for trivial, or an application (f x)
+                          but False for (f x x)
+
+(NWF1) It's important to float Integer literals, so that they get shared, rather
+  than being allocated every time round the loop.  Hence the litIsTrivial.
+
+  Ditto literal strings (LitString), which we'd like to float to top
+  level, which is now possible.
+
+(NWF2) We don’t float out variables applied only to type arguments, since the
+  extra binding would be pointless: type arguments are completely erased.
+  But *coercion* arguments aren’t (see Note [Coercion tokens] in
+  "GHC.CoreToStg" and Note [inlineBoringOk] in"GHC.Core.Unfold"),
+  so we still want to float out variables applied only to
+  coercion arguments.
+
+(NWF3) Some expressions have trivial wrappers:
+     - Casts (e |> co)
+     - Unary-class applications:
+          - Dictionary applications (MkC meth)
+          - Class-op applictions    (op dict)
+     - Case of empty alts
+     - Unsafe-equality case
+  In all these cases we say "not worth floating", and we do so /regardless/
+  of the wrapped expression.  The SetLevels stuff may subsequently float the
+  components of the expression.
+
+  Example:  is it worth floating (f x |> co)?  No!  If we did we'd get
+     lvl = f x |> co
+     ...lvl....
+  Then we'd do cast worker/wrapper and end up with.
+     lvl' = f x
+     ...(lvl' |> co)...
+  Silly!  Better not to float it in the first place.  If we say "no" here,
+  we'll subsequently say "yes" for (f x) and get
+     lvl = f x
+     ....(lvl |> co)...
+  which is what we want.  In short: don't float trivial wrappers.
+
+(NWF4) The only non-trivial expression that we say "not worth floating" for
+  is an application
+             f x y z
+  where the number of value arguments is <= the number of abstracted Ids.
+  This is what makes floating idempotent.  Hence counting the number of
+  value arguments in `go`
+
+(NWF5) In #24471 we had something like
+     x1 = I# 1
+     ...
+     x1000 = I# 1000
+     foo = f x1 (f x2 (f x3 ....))
+  So every sub-expression in `foo` has lots and lots of free variables.  But
+  none of these sub-expressions float anywhere; the entire float-out pass is a
+  no-op.
 
-************************************************************************
-*                                                                      *
-\subsection{Bindings}
-*                                                                      *
-************************************************************************
+  So `notWorthFloating` tries to avoid evaluating `n_abs_vars`, in cases where
+  it obviously /is/ worth floating.  (In #24471 it turned out that we were
+  testing `abs_vars` (a relatively complicated calculation that takes at least
+  O(n-free-vars) time to compute) for every sub-expression.)
 
-The binding stuff works for top level too.
+  Hence testing `n_abs_vars only` at the very end.
 -}
 
+{- *********************************************************************
+*                                                                      *
+                       Bindings
+        This binding stuff works for top level too.
+*                                                                      *
+********************************************************************* -}
+
 lvlBind :: LevelEnv
         -> CoreBindWithFVs
         -> LvlM (LevelledBind, LevelEnv)
@@ -1261,7 +1300,7 @@ lvlBind env (AnnNonRec bndr rhs)
         -- is_bot_lam: looks like (\xy. bot), maybe zero lams
         -- NB: not isBottomThunk!
         -- NB: not is_join: don't send bottoming join points to the top.
-        -- See Note [Bottoming floats] point (3)
+        -- See Note [Bottoming floats] (BF3)
 
     is_top_bindable = exprIsTopLevelBindable deann_rhs bndr_ty
     n_extra       = count isId abs_vars
@@ -1552,9 +1591,8 @@ destLevel env fvs fvs_ty is_function is_bot
                               -- See Note [Floating join point bindings]
   = tOP_LEVEL
 
-  | is_bot              -- Send bottoming bindings to the top
-  = as_far_as_poss      -- regardless; see Note [Bottoming floats]
-                        -- Esp Bottoming floats (1) and (3)
+  | is_bot              -- Send bottoming bindings to the top regardless;
+  = as_far_as_poss      -- see (BF1) and (BF2) in Note [Bottoming floats]
 
   | Just n_args <- floatLams env
   , n_args > 0  -- n=0 case handled uniformly by the 'otherwise' case
@@ -1568,8 +1606,13 @@ destLevel env fvs fvs_ty is_function is_bot
     max_fv_id_level = maxFvLevel isId env fvs -- Max over Ids only; the
                                               -- tyvars will be abstracted
 
+    -- as_far_as_poss: destination level depends only on the free Ids (more
+    -- precisely, free CoVars) of the /type/, not the free Ids of the /term/.
+    -- Why worry about the free CoVars?  See Note [Floating and kind casts]
+    --
+    -- There may be free Ids in the term, but then we'll just
+    -- lambda-abstract over them
     as_far_as_poss = maxFvLevel' isId env fvs_ty
-                     -- See Note [Floating and kind casts]
 
 {- Note [Floating and kind casts]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1732,10 +1775,9 @@ maxFvLevel max_me env var_set
     -- It's OK to use a non-deterministic fold here because maxIn commutes.
 
 maxFvLevel' :: (Var -> Bool) -> LevelEnv -> TyCoVarSet -> Level
--- Same but for TyCoVarSet
+-- Precisely the same as `maxFvLevel` but for TyCoVarSet rather than DVarSet
 maxFvLevel' max_me env var_set
   = nonDetStrictFoldUniqSet (maxIn max_me env) tOP_LEVEL var_set
-    -- It's OK to use a non-deterministic fold here because maxIn commutes.
 
 maxIn :: (Var -> Bool) -> LevelEnv -> InVar -> Level -> Level
 maxIn max_me (LE { le_lvl_env = lvl_env, le_env = id_env }) in_var lvl


=====================================
compiler/GHC/Core/Opt/Simplify/Env.hs
=====================================
@@ -11,7 +11,7 @@ module GHC.Core.Opt.Simplify.Env (
         SimplMode(..), updMode, smPlatform,
 
         -- * Environments
-        SimplEnv(..), pprSimplEnv,   -- Temp not abstract
+        SimplEnv(..), StaticEnv, pprSimplEnv,
         seArityOpts, seCaseCase, seCaseFolding, seCaseMerge, seCastSwizzle,
         seDoEtaReduction, seEtaExpand, seFloatEnable, seInline, seNames,
         seOptCoercionOpts, sePhase, sePlatform, sePreInline,
@@ -170,6 +170,8 @@ coercion we don't apply optCoercion to it if seInlineDepth>0.
 Reason: it has already been optimised once, no point in doing so again.
 -}
 
+type StaticEnv = SimplEnv       -- Just the static part is relevant
+
 data SimplEnv
   = SimplEnv {
      ----------- Static part of the environment -----------
@@ -407,7 +409,6 @@ data SimplSR
        -- and  ja = Just a <=> x is a join-point of arity a
        -- See Note [Join arity in SimplIdSubst]
 
-
   | DoneId OutId
        -- If  x :-> DoneId v   is in the SimplIdSubst
        -- then replace occurrences of x by v
@@ -778,7 +779,7 @@ emptyJoinFloats = nilOL
 isEmptyJoinFloats :: JoinFloats -> Bool
 isEmptyJoinFloats = isNilOL
 
-unitLetFloat :: OutBind -> LetFloats
+unitLetFloat :: HasDebugCallStack => OutBind -> LetFloats
 -- This key function constructs a singleton float with the right form
 unitLetFloat bind = assert (all (not . isJoinId) (bindersOf bind)) $
                     LetFloats (unitOL bind) (flag bind)


=====================================
compiler/GHC/Core/Opt/Simplify/Iteration.hs
=====================================
@@ -447,7 +447,7 @@ we want to do something very similar to worker/wrapper:
 
 We call this making a cast worker/wrapper in tryCastWorkerWrapper.
 
-The main motivaiton is that x can be inlined freely.  There's a chance
+The main motivation is that x can be inlined freely.  There's a chance
 that e will be a constructor application or function, or something
 like that, so moving the coercion to the usage site may well cancel
 the coercions and lead to further optimisation.  Example:
@@ -576,11 +576,13 @@ Note [Concrete types] in GHC.Tc.Utils.Concrete.
 -}
 
 tryCastWorkerWrapper :: SimplEnv -> BindContext
-                     -> InId -> OutId -> OutExpr
-                     -> SimplM (SimplFloats, SimplEnv)
+                     -> OutId -> OutExpr
+                     -> SimplM (Maybe (LetFloats, OutId, OutExpr))
 -- See Note [Cast worker/wrapper]
-tryCastWorkerWrapper env bind_cxt old_bndr bndr (Cast rhs co)
-  | BC_Let top_lvl is_rec <- bind_cxt  -- Not join points
+-- Given input x = rhs |> co, the result will be
+--    (x' = rhs, x, x' |> co))
+tryCastWorkerWrapper env bind_cxt bndr (Cast rhs co)
+  | BC_Let top_lvl _ <- bind_cxt  -- Not join points
   , not (isDFunId bndr) -- nor DFuns; cast w/w is no help, and we can't transform
                         --            a DFunUnfolding in mk_worker_unfolding
   , not (exprIsTrivial rhs)        -- Not x = y |> co; Wrinkle 1
@@ -588,38 +590,23 @@ tryCastWorkerWrapper env bind_cxt old_bndr bndr (Cast rhs co)
   , typeHasFixedRuntimeRep work_ty    -- Don't peel off a cast if doing so would
                                       -- lose the underlying runtime representation.
                                       -- See Note [Preserve RuntimeRep info in cast w/w]
-  , not (isOpaquePragma (idInlinePragma old_bndr)) -- Not for OPAQUE bindings
-                                                   -- See Note [OPAQUE pragma]
+  , not (isOpaquePragma (idInlinePragma bndr)) -- Not for OPAQUE bindings
+                                               -- See Note [OPAQUE pragma]
   = do  { uniq <- getUniqueM
         ; let work_name = mkSystemVarName uniq occ_fs
               work_id   = mkLocalIdWithInfo work_name ManyTy work_ty work_info
-              is_strict = isStrictId bndr
 
-        ; (rhs_floats, work_rhs) <- prepareBinding env top_lvl is_rec is_strict
-                                                   work_id (emptyFloats env) rhs
-
-        ; work_unf <- mk_worker_unfolding top_lvl work_id work_rhs
+        ; work_unf <- mk_worker_unfolding top_lvl work_id rhs
         ; let  work_id_w_unf = work_id `setIdUnfolding` work_unf
-               floats   = rhs_floats `addLetFloats`
-                          unitLetFloat (NonRec work_id_w_unf work_rhs)
-
-               triv_rhs = Cast (Var work_id_w_unf) co
-
-        ; if postInlineUnconditionally env bind_cxt old_bndr bndr triv_rhs
-             -- Almost always True, because the RHS is trivial
-             -- In that case we want to eliminate the binding fast
-             -- We conservatively use postInlineUnconditionally so that we
-             -- check all the right things
-          then do { tick (PostInlineUnconditionally bndr)
-                  ; return ( floats
-                           , extendIdSubst (setInScopeFromF env floats) old_bndr $
-                             DoneEx triv_rhs NotJoinPoint ) }
-
-          else do { wrap_unf <- mkLetUnfolding env top_lvl VanillaSrc bndr False triv_rhs
-                  ; let bndr' = bndr `setInlinePragma` mkCastWrapperInlinePrag (idInlinePragma bndr)
-                                `setIdUnfolding`  wrap_unf
-                        floats' = floats `extendFloats` NonRec bndr' triv_rhs
-                  ; return ( floats', setInScopeFromF env floats' ) } }
+               work_bind     = NonRec work_id_w_unf rhs
+               triv_rhs      = Cast (Var work_id_w_unf) co
+
+        ; wrap_unf <- mkLetUnfolding env top_lvl VanillaSrc bndr False triv_rhs
+        ; let wrap_prag = mkCastWrapperInlinePrag (inlinePragInfo info)
+              bndr' = bndr `setInlinePragma` wrap_prag
+                           `setIdUnfolding`  wrap_unf
+
+        ; return (Just (unitLetFloat work_bind, bndr', triv_rhs)) }
   where
     -- Force the occ_fs so that the old Id is not retained in the new Id.
     !occ_fs = getOccFS bndr
@@ -647,10 +634,10 @@ tryCastWorkerWrapper env bind_cxt old_bndr bndr (Cast rhs co)
              | isStableSource src -> return (unf { uf_tmpl = mkCast unf_rhs (mkSymCo co) })
            _ -> mkLetUnfolding env top_lvl VanillaSrc work_id False work_rhs
 
-tryCastWorkerWrapper env _ _ bndr rhs  -- All other bindings
+tryCastWorkerWrapper _ _ bndr rhs  -- All other bindings
   = do { traceSmpl "tcww:no" (vcat [ text "bndr:" <+> ppr bndr
                                    , text "rhs:" <+> ppr rhs ])
-        ; return (mkFloatBind env (NonRec bndr rhs)) }
+       ; return Nothing }
 
 mkCastWrapperInlinePrag :: InlinePragma -> InlinePragma
 -- See Note [Cast worker/wrapper]
@@ -810,39 +797,40 @@ makeTrivial :: HasDebugCallStack
 -- Binds the expression to a variable, if it's not trivial, returning the variable
 -- For the Demand argument, see Note [Keeping demand info in StrictArg Plan A]
 makeTrivial env top_lvl dmd occ_fs expr
-  | exprIsTrivial expr                          -- Already trivial
-  || not (bindingOk top_lvl expr expr_ty)       -- Cannot trivialise
-                                                --   See Note [Cannot trivialise]
+  | exprIsTrivial expr                  -- Already trivial
   = return (emptyLetFloats, expr)
 
-  | Cast expr' co <- expr
-  = do { (floats, triv_expr) <- makeTrivial env top_lvl dmd occ_fs expr'
-       ; return (floats, Cast triv_expr co) }
+  | not (bindingOk top_lvl expr expr_ty)  -- Cannot trivialise
+  = return (emptyLetFloats, expr)         -- See Note [Cannot trivialise]
 
-  | otherwise -- 'expr' is not of form (Cast e co)
+  | otherwise
   = do  { (floats, expr1) <- prepareRhs env top_lvl occ_fs expr
         ; uniq <- getUniqueM
         ; let name = mkSystemVarName uniq occ_fs
-              var  = mkLocalIdWithInfo name ManyTy expr_ty id_info
+              bndr = mkLocalIdWithInfo name ManyTy expr_ty id_info
+              bind_ctxt = BC_Let top_lvl NonRecursive
 
         -- Now something very like completeBind,
         -- but without the postInlineUnconditionally part
-        ; (arity_type, expr2) <- tryEtaExpandRhs env (BC_Let top_lvl NonRecursive) var expr1
+        ; (arity_type, expr2) <- tryEtaExpandRhs env bind_ctxt bndr expr1
           -- Technically we should extend the in-scope set in 'env' with
           -- the 'floats' from prepareRHS; but they are all fresh, so there is
           -- no danger of introducing name shadowing in eta expansion
 
-        ; unf <- mkLetUnfolding env top_lvl VanillaSrc var False expr2
-
-        ; let final_id = addLetBndrInfo var arity_type unf
-              bind     = NonRec final_id expr2
+        ; unf <- mkLetUnfolding env top_lvl VanillaSrc bndr False expr2
+        ; let bndr'    = addLetBndrInfo bndr arity_type unf
+              anf_bind = NonRec bndr' expr2
 
-        ; traceSmpl "makeTrivial" (vcat [text "final_id" <+> ppr final_id, text "rhs" <+> ppr expr2 ])
-        ; return ( floats `addLetFlts` unitLetFloat bind, Var final_id ) }
+        ; mb_cast_ww <- tryCastWorkerWrapper env bind_ctxt bndr' expr2
+        ; case mb_cast_ww of
+            Nothing -> return (floats `addLetFlts` unitLetFloat anf_bind, Var bndr')
+            Just (work_flts, _, triv_rhs)
+                    -> return (floats `addLetFlts` work_flts, triv_rhs) }
   where
     id_info = vanillaIdInfo `setDemandInfo` dmd
     expr_ty = exprType expr
 
+
 bindingOk :: TopLevelFlag -> CoreExpr -> Type -> Bool
 -- True iff we can have a binding of this expression at this level
 -- Precondition: the type is the type of the expression
@@ -936,26 +924,50 @@ completeBind bind_cxt (old_bndr, unf_se) (new_bndr, new_rhs, env)
                             eta_rhs (idType new_bndr) new_arity old_unf
 
       ; let new_bndr_w_info = addLetBndrInfo new_bndr new_arity new_unfolding
-        -- See Note [In-scope set as a substitution]
+               -- See Note [In-scope set as a substitution]
+            occ_anald_rhs = maybeUnfoldingTemplate new_unfolding `orElse` eta_rhs
+               -- occ_anald_rhs: see Note [Use occ-anald RHS in postInlineUnconditionally]
 
+      -- Try postInlineUnconditionally for (x = rhs)
+      -- If that succeeds we don't want to do tryCastWorkerWrapper
       ; if postInlineUnconditionally env bind_cxt old_bndr new_bndr_w_info eta_rhs
-
-        then -- Inline and discard the binding
-             do  { tick (PostInlineUnconditionally old_bndr)
-                 ; let unf_rhs = maybeUnfoldingTemplate new_unfolding `orElse` eta_rhs
-                          -- See Note [Use occ-anald RHS in postInlineUnconditionally]
-                 ; simplTrace "PostInlineUnconditionally" (ppr new_bndr <+> ppr unf_rhs) $
-                   return ( emptyFloats env
-                          , extendIdSubst env old_bndr $
-                            DoneEx unf_rhs (idJoinPointHood new_bndr)) }
+        then post_inline_it (emptyFloats env) occ_anald_rhs
+        else
+
+   do { -- Try cast worker-wrapper
+        mb_cast_ww <- tryCastWorkerWrapper env bind_cxt new_bndr_w_info eta_rhs
+      ; case mb_cast_ww of
+          Nothing -> no_post_inline (emptyFloats env) new_bndr_w_info eta_rhs
+
+          Just (cast_let_flts, new_bndr, new_rhs)
+            -- Try postInlineUnconditionally for (new_bndr = new_rhs)
+            -- Almost always fires, because `new_rhs` is small, but we conservatively
+            -- use `postInlineUnconditionally` so that we check all the right things
+            | postInlineUnconditionally env bind_cxt old_bndr new_bndr new_rhs
+            -> post_inline_it cast_floats new_rhs
+                   -- new_rhs is (x |> co) so no need to occ-anal
+            | otherwise
+            -> no_post_inline cast_floats new_bndr new_rhs
+            where
+              cast_floats = emptyFloats env `addLetFloats` cast_let_flts
+    } }
+  where
+    no_post_inline floats new_bndr new_rhs
+      = do { let the_bind = NonRec new_bndr new_rhs
+                 floats'  = floats `extendFloats` the_bind
+                 env'     = env `setInScopeFromF` floats'
+           ; return (floats', env') }
+
+    post_inline_it floats rhs
+      = do  { simplTrace "PostInlineUnconditionally" (ppr old_bndr <+> ppr rhs) $
+              tick (PostInlineUnconditionally old_bndr)
+            ; let env' = env `setInScopeFromF` floats
+            ; return ( floats
+                     , extendIdSubst env' old_bndr $
+                       DoneEx rhs (idJoinPointHood old_bndr)) }
                 -- Use the substitution to make quite, quite sure that the
                 -- substitution will happen, since we are going to discard the binding
 
-        else -- Keep the binding; do cast worker/wrapper
---             simplTrace "completeBind" (vcat [ text "bndrs" <+> ppr old_bndr <+> ppr new_bndr
---                                             , text "eta_rhs" <+> ppr eta_rhs ]) $
-             tryCastWorkerWrapper env bind_cxt old_bndr new_bndr_w_info eta_rhs }
-
 addLetBndrInfo :: OutId -> ArityType -> Unfolding -> OutId
 addLetBndrInfo new_bndr new_arity_type new_unf
   = new_bndr `setIdInfo` info5
@@ -3955,7 +3967,17 @@ mkDupableContWithDmds env dmds
         ; (floats1, cont') <- mkDupableContWithDmds env cont_dmds cont
         ; let env' = env `setInScopeFromF` floats1
         ; (_, se', arg') <- simplLazyArg env' dup hole_ty Nothing se arg
-        ; (let_floats2, arg'') <- makeTrivial env NotTopLevel dmd (fsLit "karg") arg'
+
+        -- Make the argument duplicable. Danger: if arg is small and we let-bind
+        -- it, then postInlineUnconditionally will just inline it again, perhaps
+        -- taking an extra Simplifier iteration (e.g. in test T21839c). So make
+        -- a `let` only if `couldBeSmallEnoughToInline` says that it is big enough
+        ; let uf_opts = seUnfoldingOpts env
+        ; (let_floats2, arg'')
+              <- if couldBeSmallEnoughToInline uf_opts (unfoldingUseThreshold uf_opts) arg'
+                 then return (emptyLetFloats, arg')
+                 else makeTrivial env NotTopLevel dmd (fsLit "karg") arg'
+
         ; let all_floats = floats1 `addLetFloats` let_floats2
         ; return ( all_floats
                  , ApplyToVal { sc_arg = arg''
@@ -4592,7 +4614,8 @@ mkLetUnfolding :: SimplEnv -> TopLevelFlag -> UnfoldingSource
                -> InId -> Bool    -- True <=> this is a join point
                -> OutExpr -> SimplM Unfolding
 mkLetUnfolding env top_lvl src id is_join new_rhs
-  = return (mkUnfolding uf_opts src is_top_lvl is_bottoming is_join new_rhs Nothing)
+  = -- Monadic to force those where-bindings
+    return (mkUnfolding uf_opts src is_top_lvl is_bottoming is_join new_rhs Nothing)
             -- We make an  unfolding *even for loop-breakers*.
             -- Reason: (a) It might be useful to know that they are WHNF
             --         (b) In GHC.Iface.Tidy we currently assume that, if we want to


=====================================
compiler/GHC/Core/Opt/Simplify/Utils.hs
=====================================
@@ -216,8 +216,6 @@ data SimplCont
         CoreTickish     -- Tick tickish <hole>
         SimplCont
 
-type StaticEnv = SimplEnv       -- Just the static part is relevant
-
 data FromWhat = FromLet | FromBeta Levity
 
 -- See Note [DupFlag invariants]
@@ -723,7 +721,6 @@ which it is on the LHS of a rule (see updModeForRules), then don't
 make use of the strictness info for the function.
 -}
 
-
 {-
 ************************************************************************
 *                                                                      *
@@ -1423,8 +1420,12 @@ preInlineUnconditionally for
 Note [Top-level bottoming Ids]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Don't inline top-level Ids that are bottoming, even if they are used just
-once, because FloatOut has gone to some trouble to extract them out.
-Inlining them won't make the program run faster!
+once, because FloatOut has gone to some trouble to extract them out. e.g.
+    report x y = error (..lots of stuff...)
+    f x y z = if z then report x y else ...blah...
+Here `f` might be small enough to inline; but if we put all the `report`
+stuff inside it, it'll look to big.  In general we don't want to duplicate
+all the error-reporting goop.
 
 Note [Do not inline CoVars unconditionally]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1460,51 +1461,25 @@ preInlineUnconditionally env top_lvl bndr rhs rhs_env
     extend_subst_with inl_rhs = extendIdSubst env bndr $! (mkContEx rhs_env inl_rhs)
 
     one_occ IAmDead = True -- Happens in ((\x.1) v)
-    one_occ OneOcc{ occ_n_br   = 1
-                  , occ_in_lam = NotInsideLam }   = isNotTopLevel top_lvl || early_phase
+    one_occ OneOcc{ occ_n_br    = 1
+                  , occ_in_lam  = NotInsideLam
+                  , occ_int_cxt = int_cxt }
+      =  isNotTopLevel top_lvl      -- Get rid of allocation
+      || (int_cxt==IsInteresting)   -- Function is applied
+   --   || (early_phase && not (isConLikeUnfolding unf))  -- See early_phase
     one_occ OneOcc{ occ_n_br   = 1
                   , occ_in_lam = IsInsideLam
-                  , occ_int_cxt = IsInteresting } = canInlineInLam rhs
-    one_occ _                                     = False
+                  , occ_int_cxt = IsInteresting }
+      = canInlineInLam rhs
+    one_occ _
+      = False
 
     pre_inline_unconditionally = sePreInline env
     active = isActive (sePhase env) (inlinePragmaActivation inline_prag)
              -- See Note [pre/postInlineUnconditionally in gentle mode]
     inline_prag = idInlinePragma bndr
 
--- Be very careful before inlining inside a lambda, because (a) we must not
--- invalidate occurrence information, and (b) we want to avoid pushing a
--- single allocation (here) into multiple allocations (inside lambda).
--- Inlining a *function* with a single *saturated* call would be ok, mind you.
---      || (if is_cheap && not (canInlineInLam rhs) then pprTrace "preinline" (ppr bndr <+> ppr rhs) ok else ok)
---      where
---              is_cheap = exprIsCheap rhs
---              ok = is_cheap && int_cxt
-
-        --      int_cxt         The context isn't totally boring
-        -- E.g. let f = \ab.BIG in \y. map f xs
-        --      Don't want to substitute for f, because then we allocate
-        --      its closure every time the \y is called
-        -- But: let f = \ab.BIG in \y. map (f y) xs
-        --      Now we do want to substitute for f, even though it's not
-        --      saturated, because we're going to allocate a closure for
-        --      (f y) every time round the loop anyhow.
-
-        -- canInlineInLam => free vars of rhs are (Once in_lam) or Many,
-        -- so substituting rhs inside a lambda doesn't change the occ info.
-        -- Sadly, not quite the same as exprIsHNF.
-    canInlineInLam (Lit _)    = True
-    canInlineInLam (Lam b e)  = isRuntimeVar b || canInlineInLam e
-    canInlineInLam (Tick t e) = not (tickishIsCode t) && canInlineInLam e
-    canInlineInLam (Var v)    = case idOccInfo v of
-                                  OneOcc { occ_in_lam = IsInsideLam } -> True
-                                  ManyOccs {}                         -> True
-                                  _                                   -> False
-    canInlineInLam _          = False
-      -- not ticks.  Counting ticks cannot be duplicated, and non-counting
-      -- ticks around a Lam will disappear anyway.
-
-    early_phase = sePhase env /= FinalPhase
+--    early_phase = sePhase env /= FinalPhase
     -- If we don't have this early_phase test, consider
     --      x = length [1,2,3]
     -- The full laziness pass carefully floats all the cons cells to
@@ -1532,6 +1507,52 @@ preInlineUnconditionally env top_lvl bndr rhs rhs_env
     -- (Nor can we check for `exprIsExpandable rhs`, because that needs to look
     -- at the non-existent unfolding for the `I# 2#` which is also floated out.)
 
+
+-- Be very careful before inlining inside a lambda, because (a) we must not
+-- invalidate occurrence information, and (b) we want to avoid pushing a
+-- single allocation (here) into multiple allocations (inside lambda).
+-- Inlining a *function* with a single *saturated* call would be ok, mind you.
+--      || (if is_cheap && not (canInlineInLam rhs) then pprTrace "preinline" (ppr bndr <+> ppr rhs) ok else ok)
+--      where
+--              is_cheap = exprIsCheap rhs
+--              ok = is_cheap && int_cxt
+
+        --      int_cxt         The context isn't totally boring
+        -- E.g. let f = \ab.BIG in \y. map f xs
+        --      Don't want to substitute for f, because then we allocate
+        --      its closure every time the \y is called
+        -- But: let f = \ab.BIG in \y. map (f y) xs
+        --      Now we do want to substitute for f, even though it's not
+        --      saturated, because we're going to allocate a closure for
+        --      (f y) every time round the loop anyhow.
+
+        -- canInlineInLam => free vars of rhs are (Once in_lam) or Many,
+        -- so substituting rhs inside a lambda doesn't change the occ info.
+        -- Sadly, not quite the same as exprIsHNF.
+canInlineInLam ::CoreExpr -> Bool
+canInlineInLam e
+  = go e
+  where
+    go (Lit _)    = True
+    go (Lam b e)  = isRuntimeVar b || go e
+    go (Cast e _) = go e
+    go (Tick t e) = not (tickishIsCode t) && go e
+    -- This matters only for:
+    --     x = y            -- or y|>co
+    --     f = \p. ..x..    -- One occurrence of x
+    --     ..y..            -- Multiple other occurrences of y
+    -- Then it is safe to inline x unconditionally
+    -- For postInlineUncondionally we have already tested exprIsTrivial
+    -- so this Var case never arises
+    go (Var v)    = case idOccInfo v of
+                       OneOcc { occ_in_lam = IsInsideLam } -> True
+                       ManyOccs {}                         -> True
+                       _                                   -> False
+    go _          = False
+      -- not ticks.  Counting ticks cannot be duplicated, and non-counting
+      -- ticks around a Lam will disappear anyway.
+
+
 {-
 ************************************************************************
 *                                                                      *
@@ -1582,71 +1603,77 @@ postInlineUnconditionally
 -- Reason: we don't want to inline single uses, or discard dead bindings,
 --         for unlifted, side-effect-ful bindings
 postInlineUnconditionally env bind_cxt old_bndr bndr rhs
-  | not active                  = False
-  | isWeakLoopBreaker occ_info  = False -- If it's a loop-breaker of any kind, don't inline
-                                        -- because it might be referred to "earlier"
-  | isStableUnfolding unfolding = False -- Note [Stable unfoldings and postInlineUnconditionally]
-  | isTopLevel (bindContextLevel bind_cxt)
-                                = False -- Note [Top level and postInlineUnconditionally]
-  | exprIsTrivial rhs           = True
-  | BC_Join {} <- bind_cxt      = False -- See point (1) of Note [Duplicating join points]
+  | not active                   = False
+  | isWeakLoopBreaker occ_info   = False -- If it's a loop-breaker of any kind, don't inline
+                                         -- because it might be referred to "earlier"
+  | isStableUnfolding unfolding  = False -- Note [Stable unfoldings and postInlineUnconditionally]
+  | BC_Join {} <- bind_cxt       = exprIsTrivial rhs
+                                        -- See point (DJ1) of Note [Duplicating join points]
                                         --     in GHC.Core.Opt.Simplify.Iteration
+  | is_top_lvl, isDeadEndId bndr = False -- Note [Top-level bottoming Ids]
   | otherwise
   = case occ_info of
-      OneOcc { occ_in_lam = in_lam, occ_int_cxt = int_cxt, occ_n_br = n_br }
-        -- See Note [Inline small things to avoid creating a thunk]
+      IAmALoopBreaker {} -> False
+      ManyOccs {} | is_top_lvl -> False  -- Note [Top level and postInlineUnconditionally]
+                  | otherwise  -> exprIsTrivial rhs
 
-        | n_br >= 100 -> False  -- See #23627
+      OneOcc { occ_in_lam = in_lam, occ_int_cxt = int_cxt, occ_n_br = n_br }
+              | exprIsTrivial rhs -> True
+              | otherwise         -> check_one_occ in_lam int_cxt n_br
 
-        | n_br == 1, NotInsideLam <- in_lam  -- One syntactic occurrence
-        -> True                              -- See Note [Post-inline for single-use things]
+      IAmDead -> True   -- This happens; for example, the case_bndr during case of
+                        -- known constructor:  case (a,b) of x { (p,q) -> ... }
+                        -- Here x isn't mentioned in the RHS, so we don't want to
+                        -- create the (dead) let-binding  let x = (a,b) in ...
+  where
+    is_top_lvl  = isTopLevel (bindContextLevel bind_cxt)
+    is_demanded = isStrUsedDmd (idDemandInfo bndr)
+    occ_info    = idOccInfo old_bndr
+    unfolding   = idUnfolding bndr
+    arity       = idArity bndr
+--    is_cheap    = isCheapUnfolding unfolding
+    uf_opts     = seUnfoldingOpts env
+    phase       = sePhase env
+    active      = isActive phase (idInlineActivation bndr)
+        -- See Note [pre/postInlineUnconditionally in gentle mode]
 
+    -- Check for code-size blow-up from inlining in multiple places
+    code_dup_ok n_br
+      | n_br == 1   = True  -- No duplication
+      | n_br >= 100 = False -- See #23627
+      | is_demanded = False -- Demanded => no allocation (it'll be a case expression
+                            -- in the end) so inlining duplicates code but nothing more
+      | otherwise   = smallEnoughToInline uf_opts unfolding
+
+    -- See Note [Post-inline for single-use things]
+    check_one_occ NotInsideLam NotInteresting n_br = not is_top_lvl && code_dup_ok n_br
+    check_one_occ NotInsideLam IsInteresting  n_br = code_dup_ok n_br
+    check_one_occ IsInsideLam  NotInteresting _    = False
+    check_one_occ IsInsideLam  IsInteresting  n_br = arity > 0 && code_dup_ok n_br
+      -- IsInteresting: inlining inside a lambda only with good reason
+      --    See the notes on int_cxt in preInlineUnconditionally
+      -- arity>0: do not inline data strutures under lambdas, only functions
+
+---------------
+-- A wrong bit of code, left here in case you are tempted to do this
 --        | is_unlifted                        -- Unlifted binding, hence ok-for-spec
 --        -> True                              -- hence cheap to inline probably just a primop
---                                             -- Not a big deal either way
 -- No, this is wrong.  {v = p +# q; x = K v}.
 -- Don't inline v; it'll just get floated out again. Stupid.
+---------------
 
-        | is_demanded
-        -> False                            -- No allocation (it'll be a case expression in the end)
-                                            -- so inlining duplicates code but nothing more
 
-        | otherwise
-        -> work_ok in_lam int_cxt && smallEnoughToInline uf_opts unfolding
-              -- Multiple syntactic occurences; but lazy, and small enough to dup
-              -- ToDo: consider discount on smallEnoughToInline if int_cxt is true
-
-      IAmDead -> True   -- This happens; for example, the case_bndr during case of
-                        -- known constructor:  case (a,b) of x { (p,q) -> ... }
-                        -- Here x isn't mentioned in the RHS, so we don't want to
-                        -- create the (dead) let-binding  let x = (a,b) in ...
-
-      _ -> False
-
-  where
-    work_ok NotInsideLam _              = True
-    work_ok IsInsideLam  IsInteresting  = isCheapUnfolding unfolding
-    work_ok IsInsideLam  NotInteresting = False
-      -- NotInsideLam: outside a lambda, we want to be reasonably aggressive
-      -- about inlining into multiple branches of case
+      -- NotInsideLam: outside a lambda, when not at top-level we want to be
+      -- reasonably aggressive about inlining into multiple branches of case
       -- e.g. let x = <non-value>
       --      in case y of { C1 -> ..x..; C2 -> ..x..; C3 -> ... }
       -- Inlining can be a big win if C3 is the hot-spot, even if
       -- the uses in C1, C2 are not 'interesting'
       -- An example that gets worse if you add int_cxt here is 'clausify'
 
-      -- InsideLam: check for acceptable work duplication, using isCheapUnfoldign
-      -- int_cxt to prevent us inlining inside a lambda without some
-      -- good reason.  See the notes on int_cxt in preInlineUnconditionally
+      -- InsideLam:
 
 --    is_unlifted = isUnliftedType (idType bndr)
-    is_demanded = isStrUsedDmd (idDemandInfo bndr)
-    occ_info    = idOccInfo old_bndr
-    unfolding   = idUnfolding bndr
-    uf_opts     = seUnfoldingOpts env
-    phase       = sePhase env
-    active      = isActive phase (idInlineActivation bndr)
-        -- See Note [pre/postInlineUnconditionally in gentle mode]
 
 {- Note [Inline small things to avoid creating a thunk]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1669,23 +1696,24 @@ where GHC.Driver.CmdLine.$wprocessArgs allocated hugely more.
 Note [Post-inline for single-use things]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 If we have
-
    let x = rhs in ...x...
-
 and `x` is used exactly once, and not inside a lambda, then we will usually
 preInlineUnconditinally. But we can still get this situation in
 postInlineUnconditionally:
-
   case K rhs of K x -> ...x....
-
 Here we'll use `simplAuxBind` to bind `x` to (the already-simplified) `rhs`;
 and `x` is used exactly once.  It's beneficial to inline right away; otherwise
 we risk creating
-
    let x = rhs in ...x...
+which will take another iteration of the Simplifier to eliminate.
 
-which will take another iteration of the Simplifier to eliminate.  We do this in
-two places
+A similar, but less frequent, case is
+    let f = \x.blah in ...(\y. ...(f e)...) ...
+Again `preInlineUnconditionally will usually inline `f`, but it can arise
+via `simplAuxBind` if we have something like
+    (\f \y. ...(f e)..) (\x.blah)
+
+We do unconditional post-inlining in two places:
 
 1. In the full `postInlineUnconditionally` look for the special case
    of "one occurrence, not under a lambda", and inline unconditionally then.
@@ -1714,24 +1742,20 @@ Alas!
 
 Note [Top level and postInlineUnconditionally]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-We don't do postInlineUnconditionally for top-level things (even for
-ones that are trivial):
+We must take care when considering postInlineUnconditionally for top-level things
 
-  * Doing so will inline top-level error expressions that have been
-    carefully floated out by FloatOut.  More generally, it might
-    replace static allocation with dynamic.
+  * Don't inline top-level error expressions that have been carefully floated
+    out by FloatOut.  See Note [Top-level bottoming Ids].
 
-  * Even for trivial expressions there's a problem.  Consider
+  * Even for trivial expressions we need to take care: we must not
+    postInlineUnconditionally a top-level ManyOccs binder, even if its
+    RHS is trivial. Consider
       {-# RULE "foo" forall (xs::[T]). reverse xs = ruggle xs #-}
       blah xs = reverse xs
       ruggle = sort
-    In one simplifier pass we might fire the rule, getting
+    We must not postInlineUnconditionally `ruggle`, because in the same
+    simplifier pass we might fire the rule, getting
       blah xs = ruggle xs
-    but in *that* simplifier pass we must not do postInlineUnconditionally
-    on 'ruggle' because then we'll have an unbound occurrence of 'ruggle'
-
-    If the rhs is trivial it'll be inlined by callSiteInline, and then
-    the binding will be dead and discarded by the next use of OccurAnal
 
   * There is less point, because the main goal is to get rid of local
     bindings used in multiple case branches.


=====================================
compiler/GHC/Core/Utils.hs
=====================================
@@ -397,10 +397,12 @@ mkTicks :: [CoreTickish] -> CoreExpr -> CoreExpr
 mkTicks ticks expr = foldr mkTick expr ticks
 
 isSaturatedConApp :: CoreExpr -> Bool
+-- Also includes literals
 isSaturatedConApp e = go e []
   where go (App f a) as = go f (a:as)
         go (Var fun) args
            = isConLikeId fun && idArity fun == valArgCount args
+        go (Lit {})   _  = True
         go (Cast f _) as = go f as
         go _ _ = False
 


=====================================
compiler/GHC/Parser/Errors/Types.hs
=====================================
@@ -611,6 +611,7 @@ data LexErr
    | LexUnterminatedComment -- ^ Unterminated `{-'
    | LexUnterminatedOptions -- ^ Unterminated OPTIONS pragma
    | LexUnterminatedQQ      -- ^ Unterminated quasiquotation
+   deriving (Show,Eq,Ord)
 
 -- | Errors from the Cmm parser
 data CmmParserError


=====================================
compiler/GHC/Parser/String.hs
=====================================
@@ -36,6 +36,7 @@ import GHC.Utils.Panic (panic)
 
 type BufPos = Int
 data StringLexError = StringLexError LexErr BufPos
+  deriving (Show, Eq)
 
 lexString :: Int -> StringBuffer -> Either StringLexError String
 lexString = lexStringWith processChars processChars
@@ -122,20 +123,49 @@ bufferLocatedChars initialBuf len = go initialBuf
 -- -----------------------------------------------------------------------------
 -- Lexing phases
 
+-- | Collapse all string gaps in the given input.
+--
+-- Iterates through the input in `go` until we encounter a backslash. The
+-- @stringchar Alex regex only allows backslashes in two places: escape codes
+-- and string gaps.
+--
+--   * If the next character is a space, it has to be the start of a string gap
+--     AND it must end, since the @gap Alex regex will only match if it ends.
+--     Collapse the gap and continue the main iteration loop.
+--
+--   * Otherwise, this is an escape code. If it's an escape code, there are
+--     ONLY three possibilities (see the @escape Alex regex):
+--       1. The escape code is "\\"
+--       2. The escape code is "\^\"
+--       3. The escape code does not have a backslash, other than the initial
+--          backslash
+--
+--     In the first two possibilities, just skip them and continue the main
+--     iteration loop ("skip" as in "keep in the list as-is"). In the last one,
+--     we can just skip the backslash, then continue the main iteration loop.
+--     the rest of the escape code will be skipped as normal characters in the
+--     string; no need to fully parse a proper escape code.
 collapseGaps :: HasChar c => [c] -> [c]
 collapseGaps = go
   where
     go = \case
-      c1@(Char '\\') : c2@(Char c) : cs
-        -- #25784: string gaps are semantically equivalent to "\&"
+      -- Match the start of a string gap + drop gap
+      -- #25784: string gaps are semantically equivalent to "\&"
+      c1@(Char '\\') : Char c : cs
         | is_space c -> c1 : setChar '&' c1 : go (dropGap cs)
-        | otherwise  -> c1 : c2 : go cs
+      -- Match all possible escape characters that include a backslash
+      c1@(Char '\\') : c2@(Char '\\') : cs
+        -> c1 : c2 : go cs
+      c1@(Char '\\') : c2@(Char '^') : c3@(Char '\\') : cs
+        -> c1 : c2 : c3 : go cs
+      -- Otherwise, just keep looping
       c : cs -> c : go cs
       [] -> []
 
     dropGap = \case
       Char '\\' : cs -> cs
       _ : cs -> dropGap cs
+      -- Unreachable since gaps must end; see docstring
       [] -> panic "gap unexpectedly ended"
 
 resolveEscapes :: HasChar c => [c] -> Either (c, LexErr) [c]


=====================================
compiler/GHC/Unit/Types.hs
=====================================
@@ -1,5 +1,3 @@
-{-# OPTIONS_GHC -Wno-orphans #-} -- instance Data ModuleName
-
 {-# LANGUAGE FlexibleInstances #-}
 {-# LANGUAGE DeriveDataTypeable #-}
 {-# LANGUAGE DeriveTraversable #-}
@@ -117,13 +115,6 @@ data GenModule unit = Module
    }
    deriving (Eq,Ord,Data,Functor)
 
--- TODO: should be moved back into Language.Haskell.Syntax.Module.Name
-instance Data ModuleName where
-  -- don't traverse?
-  toConstr _   = abstractConstr "ModuleName"
-  gunfold _ _  = error "gunfold"
-  dataTypeOf _ = mkNoRepType "ModuleName"
-
 -- | A Module is a pair of a 'Unit' and a 'ModuleName'.
 type Module = GenModule Unit
 


=====================================
compiler/Language/Haskell/Syntax/Module/Name.hs
=====================================
@@ -3,6 +3,7 @@ module Language.Haskell.Syntax.Module.Name where
 import Prelude
 
 import Data.Char (isAlphaNum)
+import Data.Data
 import Control.DeepSeq
 import qualified Text.ParserCombinators.ReadP as Parse
 import System.FilePath
@@ -12,6 +13,14 @@ import GHC.Data.FastString
 -- | A ModuleName is essentially a simple string, e.g. @Data.List@.
 newtype ModuleName = ModuleName FastString deriving (Show, Eq)
 
+instance Data ModuleName where
+  -- don't traverse?
+  toConstr x   = constr
+    where
+      constr = mkConstr (dataTypeOf x) "{abstract:ModuleName}" [] Prefix
+  gunfold _ _  = error "gunfold"
+  dataTypeOf _ = mkNoRepType "ModuleName"
+
 instance Ord ModuleName where
   nm1 `compare` nm2 = stableModuleNameCmp nm1 nm2
 


=====================================
hadrian/README.md
=====================================
@@ -226,8 +226,8 @@ tested in CI. If you use an untested flavour such as "Quick" then you run the
 risk that not all tests will pass. In particular you can rely on the `validate`
 and `perf` flavours being tested but no others.
 
-`build selftest` runs tests of the build system. The current test coverage
-is close to zero (see [#197][test-issue]).
+`build selftest` (no longer enabled by default) runs tests of the build system.
+The current test coverage is close to zero (see [#197][test-issue]).
 
 #### Running linters
 


=====================================
hadrian/hadrian.cabal
=====================================
@@ -27,7 +27,7 @@ flag threaded
 -- See also #21913
 flag selftest
      manual:             True
-     default:            True
+     default:            False
      description:        Enables the hadrian selftest rules which require
                          QuickCheck. Disabling it thus saves on a few dependencies
                          which can be problematic when bootstrapping.


=====================================
libraries/ghc-internal/src/GHC/Internal/Base.hs
=====================================
@@ -1809,7 +1809,12 @@ build g = g (:) []
 
 augment :: forall a. (forall b. (a->b->b) -> b -> b) -> [a] -> [a]
 {-# INLINE [1] augment #-}
-augment g xs = g (:) xs
+-- Give it one argument so that it inlines with one arg
+-- But (crucially) the body is a lambda so that `g` is visibly applied
+-- to two args, and hence we know that in a call
+--      augment (\c n. blah)
+-- both c and n are OneShot
+augment g = \xs -> g (:) xs
 
 {-# RULES
 "fold/build"    forall k z (g::forall b. (a->b->b) -> b -> b) .
@@ -1975,7 +1980,7 @@ The rules for map work like this.
 "++/literal_utf8" forall x. (++) (unpackCStringUtf8# x) = unpackAppendCStringUtf8# x #-}
 
 {-# RULES
-"++"    [~1] forall xs ys. xs ++ ys = augment (\c n -> foldr c n xs) ys
+"++"    [~1] forall xs. (++) xs = augment (\c n -> foldr c n xs)
   #-}
 
 


=====================================
libraries/ghc-internal/src/GHC/Internal/Float.hs
=====================================
@@ -5,6 +5,7 @@
            , MagicHash
            , UnboxedTuples
            , UnliftedFFITypes
+           , TypeApplications
   #-}
 {-# LANGUAGE CApiFFI #-}
 -- We believe we could deorphan this module, by moving lots of things
@@ -1696,6 +1697,16 @@ showSignedFloat showPos p x
        = showParen (p > 6) (showChar '-' . showPos (-x))
    | otherwise = showPos x
 
+
+-- Speicialise showSignedFloat for (a) the type and (b) the argument function
+-- The particularly targets are the calls in `instance Show Float` and
+--     `instance Show Double`
+-- Specialising for both (a) and (b) is obviously more efficient; and if you
+-- don't you find that the `x` argument is strict, but boxed, and that can cause
+-- functions calling showSignedFloat to have box their argument.
+{-# SPECIALISE showSignedFloat @Float  showFloat #-}
+{-# SPECIALISE showSignedFloat @Double showFloat #-}
+
 {-
 We need to prevent over/underflow of the exponent in encodeFloat when
 called from scaleFloat, hence we clamp the scaling parameter.


=====================================
m4/fp_setup_windows_toolchain.m4
=====================================
@@ -17,12 +17,13 @@ AC_DEFUN([FP_INSTALL_WINDOWS_TOOLCHAIN],[
         else
             action="download"
         fi
-        $PYTHON mk/get-win32-tarballs.py $action $mingw_arch > missing-win32-tarballs
+        $PYTHON mk/get-win32-tarballs.py $action $mingw_arch
         case $? in
             0)
             rm missing-win32-tarballs
             ;;
             2)
+            $PYTHON mk/get-win32-tarballs.py list $mingw_arch > missing-win32-tarballs
             echo
             echo "Error:"
             echo "Needed msys2 tarballs are missing. You have a few options to get them,"


=====================================
testsuite/tests/arityanal/should_compile/T18793.stderr
=====================================
@@ -1,54 +1,44 @@
 
 ==================== Tidy Core ====================
-Result size of Tidy Core = {terms: 64, types: 40, coercions: 0, joins: 0/0}
+Result size of Tidy Core = {terms: 60, types: 34, coercions: 0, joins: 0/0}
 
 -- RHS size: {terms: 17, types: 7, coercions: 0, joins: 0/0}
 stuff [InlPrag=NOINLINE] :: Int -> [Int]
 [GblId, Arity=1, Str=<1L>, Unf=OtherCon []]
-stuff = \ (i :: Int) -> case i of i1 { GHC.Types.I# ipv -> GHC.Types.: @Int i1 (GHC.Types.: @Int (GHC.Types.I# (GHC.Prim.+# ipv 1#)) (GHC.Types.: @Int (GHC.Types.I# (GHC.Prim.+# ipv 2#)) (GHC.Types.[] @Int))) }
+stuff = \ (i :: Int) -> case i of i1 { GHC.Internal.Types.I# ipv -> GHC.Internal.Types.: @Int i1 (GHC.Internal.Types.: @Int (GHC.Internal.Types.I# (GHC.Internal.Prim.+# ipv 1#)) (GHC.Internal.Types.: @Int (GHC.Internal.Types.I# (GHC.Internal.Prim.+# ipv 2#)) (GHC.Internal.Types.[] @Int))) }
 
 Rec {
 -- RHS size: {terms: 23, types: 11, coercions: 0, joins: 0/0}
-T18793.$wgo1 [InlPrag=[2], Occ=LoopBreaker] :: [Int] -> GHC.Prim.Int# -> GHC.Prim.Int#
+T18793.$wgo1 [InlPrag=[2], Occ=LoopBreaker] :: [Int] -> GHC.Internal.Prim.Int# -> GHC.Internal.Prim.Int#
 [GblId[StrictWorker([!])], Arity=2, Str=<1L><L>, Unf=OtherCon []]
 T18793.$wgo1
-  = \ (ds :: [Int]) (ww :: GHC.Prim.Int#) ->
+  = \ (ds :: [Int]) (ww :: GHC.Internal.Prim.Int#) ->
       case ds of {
         [] -> ww;
         : y ys ->
-          case y of { GHC.Types.I# x ->
-          case GHC.Prim.># x 42# of {
+          case y of { GHC.Internal.Types.I# x ->
+          case GHC.Internal.Prim.># x 42# of {
             __DEFAULT -> T18793.$wgo1 ys ww;
-            1# -> T18793.$wgo1 ys (GHC.Prim.negateInt# ww)
+            1# -> T18793.$wgo1 ys (GHC.Internal.Prim.negateInt# ww)
           }
           }
       }
 end Rec }
 
--- RHS size: {terms: 12, types: 6, coercions: 0, joins: 0/0}
-T18793.f_go1 [InlPrag=[2]] :: [Int] -> Int -> Int
-[GblId,
- Arity=2,
- Str=<1L><1!P(L)>,
- Cpr=1,
- Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=False)
-         Tmpl= \ (ds [Occ=Once1] :: [Int]) (eta [Occ=Once1!, OS=OneShot] :: Int) -> case eta of { GHC.Types.I# ww [Occ=Once1] -> case T18793.$wgo1 ds ww of ww1 [Occ=Once1] { __DEFAULT -> GHC.Types.I# ww1 } }}]
-T18793.f_go1 = \ (ds :: [Int]) (eta [OS=OneShot] :: Int) -> case eta of { GHC.Types.I# ww -> case T18793.$wgo1 ds ww of ww1 { __DEFAULT -> GHC.Types.I# ww1 } }
-
 -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
 T18793.f2 :: Int
 [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
-T18793.f2 = GHC.Types.I# 1#
+T18793.f2 = GHC.Internal.Types.I# 1#
 
 -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
 T18793.f1 :: [Int]
 [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False, WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 20 0}]
 T18793.f1 = stuff T18793.f2
 
--- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
+-- RHS size: {terms: 11, types: 4, coercions: 0, joins: 0/0}
 f :: Int -> Int
-[GblId, Arity=1, Str=<1!P(L)>, Cpr=1, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 60}]
-f = T18793.f_go1 T18793.f1
+[GblId, Arity=1, Str=<1!P(L)>, Cpr=1, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [20] 60 10}]
+f = \ (eta [OS=OneShot] :: Int) -> case eta of { GHC.Internal.Types.I# ww -> case T18793.$wgo1 T18793.f1 ww of ww1 { __DEFAULT -> GHC.Internal.Types.I# ww1 } }
 
 
 


=====================================
testsuite/tests/driver/inline-check.stderr
=====================================
@@ -17,7 +17,6 @@ Inactive unfolding: foo1
 Inactive unfolding: foo1
 Inactive unfolding: foo1
 Inactive unfolding: foo1
-Inactive unfolding: foo1
 Considering inlining: foo
   arg infos []
   interesting continuation RhsCtxt(NonRecursive)


=====================================
testsuite/tests/ghci.debugger/scripts/break011.stdout
=====================================
@@ -29,9 +29,9 @@ HasCallStack backtrace:
   error, called at Test7.hs:2:18 in main:Main
 
 Stopped in <exception thrown>, <unknown>
-_exception :: e = SomeException (ErrorCall "foo")
+_exception :: e = _
 Stopped in <exception thrown>, <unknown>
-_exception :: e = SomeException (ErrorCall "foo")
+_exception :: e = _
 *** Exception: foo
 
 HasCallStack backtrace:


=====================================
testsuite/tests/ghci.debugger/scripts/break024.stdout
=====================================
@@ -17,9 +17,7 @@ _exception = SomeException
                   Nothing GHC.Internal.IO.Exception.UserError [] "error" Nothing
                   Nothing)
 Stopped in <exception thrown>, <unknown>
-_exception :: e = SomeException
-                    (GHC.Internal.IO.Exception.IOError
-                       Nothing GHC.Internal.IO.Exception.UserError ....)
+_exception :: e = _
 Stopped in <exception thrown>, <unknown>
 _exception :: e = _
 _exception = SomeException


=====================================
testsuite/tests/parser/should_run/T25937.hs
=====================================
@@ -0,0 +1,2 @@
+main :: IO ()
+main = print "\^\ "


=====================================
testsuite/tests/parser/should_run/T25937.stdout
=====================================
@@ -0,0 +1 @@
+"\FS "


=====================================
testsuite/tests/parser/should_run/all.T
=====================================
@@ -1,3 +1,8 @@
+test('parser_unit_tests',
+     normal,
+     compile_and_run,
+     ['-package ghc'])
+
 test('readRun001', normal, compile_and_run, [''])
 test('readRun002', normal, compile_and_run, [''])
 test('readRun003', normal, compile_and_run, [''])
@@ -21,6 +26,7 @@ test('RecordDotSyntax3', [extra_files(['RecordDotSyntaxA.hs'])], multimod_compil
 test('RecordDotSyntax4', [extra_files(['RecordDotSyntaxA.hs'])], multimod_compile_and_run, ['RecordDotSyntax4', ''])
 test('RecordDotSyntax5', normal, compile_and_run, [''])
 test('ListTuplePunsConstraints', extra_files(['ListTuplePunsConstraints.hs']), ghci_script, ['ListTuplePunsConstraints.script'])
+test('T25937', normal, compile_and_run, [''])
 
 # Multiline strings
 test('MultilineStrings', normal, compile_and_run, [''])


=====================================
testsuite/tests/parser/should_run/parser_unit_tests.hs
=====================================
@@ -0,0 +1,82 @@
+import GHC.Data.StringBuffer (stringToStringBuffer)
+import qualified GHC.Data.StringBuffer as StringBuffer (StringBuffer (..))
+import GHC.Parser.String (lexString, lexMultilineString)
+
+import qualified Control.Exception as E
+import Control.Monad (forM_, unless)
+
+main :: IO ()
+main = do
+  forM_ tests $ \(testName, test) -> do
+    result <- E.try test
+    case result of
+      Right () -> pure ()
+      Left (e :: E.SomeException)
+        | Just e' <- E.asyncExceptionFromException e -> do
+            E.throwIO (e' :: E.AsyncException)
+        | otherwise -> do
+            putStrLn $ ">>> FAIL: " ++ testName
+            putStrLn $ E.displayException e
+
+{----- Test infrastructure -----}
+
+data TestFailure = TestFailure String
+  deriving (Show)
+
+instance E.Exception TestFailure where
+  displayException (TestFailure msg) = "Test failure:\n" ++ msg
+
+testFailure :: String -> IO a
+testFailure = E.throwIO . TestFailure
+
+shouldBe :: (Eq a, Show a) => a -> a -> IO ()
+shouldBe actual expected =
+  unless (actual == expected) $
+    testFailure $
+      "Got:      " ++ show actual ++ "\n" ++
+      "Expected: " ++ show expected
+
+type TestCase = (String, IO ())
+
+testCase :: String -> IO () -> TestCase
+testCase = (,)
+
+{----- Tests -----}
+
+tests :: [TestCase]
+tests = concat
+  [ stringTests
+  ]
+
+-- | Unit tests for GHC.Parser.String
+stringTests :: [TestCase]
+stringTests = concat
+  [ escapedBackslashTests
+  ]
+  where
+    withBuf f s = let buf = stringToStringBuffer s in f (StringBuffer.len buf) buf
+
+    -- Test all situations where backslashes can appear in escape characters (#25937)
+    escapedBackslashTests =
+      [ testCase label $ do
+          withBuf lexStr input `shouldBe` Right output
+      | (lexLabel, lexStr) <- [("strings", lexString), ("multiline strings", lexMultilineString)]
+      , (label, input, output) <-
+          [ ( "escaped backslashes in " ++ lexLabel ++ " not mistaken for string gaps"
+            , [' ', '\\', '\\', ' ', '\\', '\\', ' ']
+            , " \\ \\ "
+            )
+          , ( "escaped \\FS in " ++ lexLabel ++ " not mistaken for beginning of string gap"
+            , ['\\', '^', '\\']
+            , "\FS"
+            )
+          , ( "escaped \\FS in " ++ lexLabel ++ " not mistaken for unterminated string gap"
+            , ['\\', '^', '\\', ' ']
+            , "\FS "
+            )
+          , ( "escaped \\FS in " ++ lexLabel ++ " does not collapse mistaken string gap"
+            , ['\\', '^', '\\', ' ', '\\', 'n']
+            , "\FS \n"
+            )
+          ]
+      ]


=====================================
testsuite/tests/simplCore/should_run/simplrun009.hs
=====================================
@@ -6,7 +6,10 @@
 -- It produces a nested unfold that should look something
 -- like the code below.  Note the 'lvl1_shW'.  It is BAD
 -- if this is a lambda instead; you get a lot more allocation
--- See Note [Saving allocation] in GHC.Core.Opt.SetLevels
+--
+-- LATER (2025): But in the end it seems better NOT to float lambdas,
+-- unless they go to top level.
+-- See (SW2) in Note [Saving work] in GHC.Core.Opt.SetLevels
 
 
 {-



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/e35271b8eba4e5f056014f2cebc2b7e...

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/e35271b8eba4e5f056014f2cebc2b7e...
You're receiving this email because of your account on gitlab.haskell.org.

    

[Git][ghc/ghc][wip/T23109a] 10 commits: Fix lexing "\^\" (#25937)

Simon Peyton Jones (＠simonpj)