[Git][ghc/ghc][wip/T26903] Fix subtle bug in cast worker/wrapper
Simon Peyton Jones pushed to branch wip/T26903 at Glasgow Haskell Compiler / GHC
Commits:
99d8c146 by Simon Peyton Jones at 2026-02-12T17:36:59+00:00
Fix subtle bug in cast worker/wrapper
See (CWw4) in Note [Cast worker/wrapper].
The true payload is in the change to the definition of
GHC.Types.Id.Info.hasInlineUnfolding
Everthing else is just documentation.
There is a 2% compile time decrease for T13056;
I'll take the win!
Metric Decrease:
T13056
- - - - -
8 changed files:
- compiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Opt/WorkWrap.hs
- compiler/GHC/Driver/Config/Core/Lint.hs
- compiler/GHC/Types/Id/Info.hs
- + testsuite/tests/simplCore/should_compile/T26903.hs
- + testsuite/tests/simplCore/should_compile/T26903.stderr
- testsuite/tests/simplCore/should_compile/T8331.stderr
- testsuite/tests/simplCore/should_compile/all.T
Changes:
=====================================
compiler/GHC/Core/Opt/Simplify/Iteration.hs
=====================================
@@ -474,14 +474,14 @@ leaving a simpler job for demand-analysis worker/wrapper. See #19874.
Wrinkles
-1. We must /not/ do cast w/w on
+(CWW1) We must /not/ do cast w/w on
f = g |> co
otherwise it'll just keep repeating forever! You might think this
is avoided because the call to tryCastWorkerWrapper is guarded by
- preInlineUnconditinally, but I'm worried that a loop-breaker or an
- exported Id might say False to preInlineUnonditionally.
+ preInlineUnconditionally, but I'm worried that a loop-breaker or an
+ exported Id might say False to preInlineUnconditionally.
-2. We need to be careful with inline/noinline pragmas:
+(CWW2) We need to be careful with inline/noinline pragmas:
rec { {-# NOINLINE f #-}
f = (...g...) |> co
; g = ...f... }
@@ -496,15 +496,15 @@ Wrinkles
f = $wf |> co
; g = ...f... }
and that is bad: the whole point is that we want to inline that
- cast! We want to transfer the pagma to $wf:
+ cast! We want to transfer the pragma to $wf:
rec { {-# NOINLINE $wf #-}
$wf = ...g...
; f = $wf |> co
; g = ...f... }
c.f. Note [Worker/wrapper for NOINLINE functions] in GHC.Core.Opt.WorkWrap.
-3. We should still do cast w/w even if `f` is INLINEABLE. E.g.
- {- f: Stable unfolding = <stable-big> -}
+(CWW3) We should still do cast w/w even if `f` is INLINEABLE. E.g.
+ {- f: Stable unfolding (arity 2) = <stable-big> -}
f = (\xy. <big-body>) |> co
Then we want to w/w to
{- $wf: Stable unfolding = <stable-big> |> sym co -}
@@ -513,15 +513,43 @@ Wrinkles
Notice that the stable unfolding moves to the worker! Now demand analysis
will work fine on $wf, whereas it has trouble with the original f.
c.f. Note [Worker/wrapper for INLINABLE functions] in GHC.Core.Opt.WorkWrap.
- This point also applies to strong loopbreakers with INLINE pragmas, see
- wrinkle (4).
-4. We should /not/ do cast w/w for non-loop-breaker INLINE functions (hence
- hasInlineUnfolding in tryCastWorkerWrapper, which responds False to
- loop-breakers) because they'll definitely be inlined anyway, cast and
- all. And if we do cast w/w for an INLINE function with arity zero, we get
+(CWW4) We should /not/ do cast w/w for INLINE functions (hence `hasInlineUnfolding`
+ in `tryCastWorkerWrapper`) because they'll definitely be inlined anyway, cast
+ and all.
+
+ Moreover, if we do cast w/w for an INLINE function with arity zero, we get
something really silly: we inline that "worker" right back into the wrapper!
- Worse than a no-op, because we have then lost the stable unfolding.
+ In fact it is Much Worse than a no-op, because we have then lost the stable
+ unfolding --- aargh (see #26903). E.g. similar example to (CWW3)
+ {- g: Stable unfolding (arity 0) = <stable-big> -} NB arity 0!
+ g = (\xy. <big-body>) |> co
+ If we w/w to this:
+ {- $wg: Stable unfolding (arity 0) = <stable-big> |> sym co -}
+ $wg = \xy. <big-body>
+ g = $wg |> co
+ then we'll inline $wg at the call site in `g` giving
+ {- $wg: Stable unfolding (arity 0) = <stable-big> |> sym co -}
+ $wg = \xy. <big-body>
+ g = (<stable-big> |> sym co) |> co
+ and now we'll drop `$wg` as dead and we have lost the unfolding on `g`.
+ (We could /also/ give the binding `g = $wf |> co` a stable unfolding. Then
+ things would work right; but there is also no point in doing the cast
+ worker/wrapper in the first place.)
+
+ NB: you might wonder about a loop-breaker with an INLINE pragma; after all, a
+ loop breaker won't "definitely be inlined anyway", so arguably we should not
+ disable cast w/w/ for it. But a Rec group can /look/ recursive at an early
+ stage, and subsequently /become/ non-recursive after some simplification.
+ (This is common in instance decls; see Note [Checking for INLINE loop breakers]
+ in GHC.Core.Lint.) So the danger is that we'll permanently lose that stable
+ unfolding that we specifically wanted (#26903). Simple solution: disable cast
+ w/w for /any/ INLINE function. See the defn
+ of `GHC.Types.Id.Info.hasInlineUnfolding`.
+
+ The danger is that an INLINE pragma on a genuninely-recursive function
+ will kill worker-wrapper. Well, so be it. They are pretty suspicious anyway;
+ see Note [Checking for INLINE loop breakers].
All these wrinkles are exactly like worker/wrapper for strictness analysis:
f is the wrapper and must inline like crazy
@@ -586,11 +614,11 @@ tryCastWorkerWrapper env bind_cxt old_bndr bndr (Cast rhs co)
| BC_Let top_lvl is_rec <- bind_cxt -- Not join points
, not (isDFunId bndr) -- nor DFuns; cast w/w is no help, and we can't transform
-- a DFunUnfolding in mk_worker_unfolding
- , not (exprIsTrivial rhs) -- Not x = y |> co; Wrinkle 1
- , not (hasInlineUnfolding info) -- Not INLINE things: Wrinkle 4
- , typeHasFixedRuntimeRep work_ty -- Don't peel off a cast if doing so would
- -- lose the underlying runtime representation.
- -- See Note [Preserve RuntimeRep info in cast w/w]
+ , not (exprIsTrivial rhs) -- Not x = y |> co; see (CWW1)
+ , not (hasInlineUnfolding info) -- Not INLINE things: see (CWW4)
+ , typeHasFixedRuntimeRep work_ty -- Don't peel off a cast if doing so would
+ -- lose the underlying runtime representation.
+ -- See Note [Preserve RuntimeRep info in cast w/w]
, not (isOpaquePragma (idInlinePragma old_bndr)) -- Not for OPAQUE bindings
-- See Note [OPAQUE pragma]
= do { uniq <- getUniqueM
@@ -637,13 +665,13 @@ tryCastWorkerWrapper env bind_cxt old_bndr bndr (Cast rhs co)
`setArityInfo` work_arity
-- We do /not/ want to transfer OccInfo, Rules
-- Note [Preserve strictness in cast w/w]
- -- and Wrinkle 2 of Note [Cast worker/wrapper]
+ -- and (CWW2) of Note [Cast worker/wrapper]
----------- Worker unfolding -----------
-- Stable case: if there is a stable unfolding we have to compose with (Sym co);
-- the next round of simplification will do the job
-- Non-stable case: use work_rhs
- -- Wrinkle 3 of Note [Cast worker/wrapper]
+ -- See (CWW4) of Note [Cast worker/wrapper]
mk_worker_unfolding top_lvl work_id work_rhs
= case realUnfoldingInfo info of -- NB: the real one, even for loop-breakers
unf@(CoreUnfolding { uf_tmpl = unf_rhs, uf_src = src })
=====================================
compiler/GHC/Core/Opt/WorkWrap.hs
=====================================
@@ -179,8 +179,9 @@ several liked-named Ids bouncing around at the same time---absolute
mischief.)
Notice that we refrain from w/w'ing an INLINE function even if it is
-in a recursive group. It might not be the loop breaker. (We could
-test for loop-breaker-hood, but I'm not sure that ever matters.)
+in a recursive group. It might not be the loop breaker. (We used to
+test for loop-breaker-hood, but see (CWW4) in Note [Cast worker/wrapper]
+in GHC.Core.Opt.Simplify.Iteration.)
Note [Worker/wrapper for INLINABLE functions]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
=====================================
compiler/GHC/Driver/Config/Core/Lint.hs
=====================================
@@ -132,8 +132,6 @@ perPassFlags dflags pass
-- there may be some INLINE knots still tied, which is tiresomely noisy
CoreDoSimplify cfg
| SimplPhase InitialPhase <- sm_phase (so_mode cfg)
- -> True
- | otherwise
-> False
_ -> True
=====================================
compiler/GHC/Types/Id/Info.hs
=====================================
@@ -610,7 +610,12 @@ hasInlineUnfolding :: IdInfo -> Bool
-- ^ True of a /non-loop-breaker/ Id that has a /stable/ unfolding that is
-- (a) always inlined; that is, with an `UnfWhen` guidance, or
-- (b) a DFunUnfolding which never needs to be inlined
-hasInlineUnfolding info = isInlineUnfolding (unfoldingInfo info)
+--
+-- Very important that this work with `realUnfoldingInfo` and so returns
+-- True even for a loop-breaker that has an INLINE pragma.
+-- See (CWW4) in Note [Cast worker/wrapper] in GHC.Core.Opt.Simplify.Iteration
+-- for discussion, and #26903 for the dire consequences of getting this wrong.
+hasInlineUnfolding info = isInlineUnfolding (realUnfoldingInfo info)
setArityInfo :: IdInfo -> ArityInfo -> IdInfo
setArityInfo info ar =
=====================================
testsuite/tests/simplCore/should_compile/T26903.hs
=====================================
@@ -0,0 +1,23 @@
+{-# LANGUAGE DefaultSignatures #-}
+module T26903 where
+
+newtype T a = MkT [a]
+
+class C a where
+ op :: [a] -> [a] -> T a
+
+ -- This default method
+ -- * Has an INLINE pragma
+ -- * Is too big to inline without a pragma
+ -- * Has arity zero
+ {-# INLINE[1] op #-}
+ default op :: Ord a => [a] -> [a] -> T a
+ op = \xs ys -> MkT $ if xs>ys then reverse (reverse (reverse (reverse xs)))
+ else reverse (reverse (reverse (reverse (xs ++ ys))))
+
+instance C Int where {}
+
+test :: [Int] -> T Int
+test xs = op [] xs
+ -- We expect to see `op` inlined into the RHS of `test`
+
=====================================
testsuite/tests/simplCore/should_compile/T26903.stderr
=====================================
@@ -0,0 +1,52 @@
+
+==================== Tidy Core ====================
+Result size of Tidy Core
+ = {terms: 127, types: 130, coercions: 48, joins: 0/0}
+
+$dmop
+ = (\ @a _ $dOrd xs ys ->
+ case $fOrdList_$ccompare $dOrd xs ys of {
+ __DEFAULT ->
+ reverse1 (reverse1 (reverse1 (reverse1 (++ xs ys) []) []) []) [];
+ GT -> reverse1 (reverse1 (reverse1 (reverse xs) []) []) []
+ })
+ `cast` Co:20 :: ...
+
+$fCInt_$cop
+ = (\ xs ys ->
+ case $fOrdList_$s$ccompare xs ys of {
+ __DEFAULT ->
+ reverse1 (reverse1 (reverse1 (reverse1 (++ xs ys) []) []) []) [];
+ GT -> reverse1 (reverse1 (reverse1 (reverse xs) []) []) []
+ })
+ `cast` Co:11 :: ...
+
+$fCInt1
+ = \ xs ys ->
+ case $fOrdList_$s$ccompare xs ys of {
+ __DEFAULT ->
+ reverse1 (reverse1 (reverse1 (reverse1 (++ xs ys) []) []) []) [];
+ GT -> reverse1 (reverse1 (reverse1 (reverse xs) []) []) []
+ }
+
+$fCInt = C:C ($fCInt1 `cast` Co:11 :: ...)
+
+test4 = reverse1 [] []
+
+test3 = reverse1 test4 []
+
+test2 = reverse1 test3 []
+
+test1 = reverse1 test2 []
+
+test
+ = \ xs ->
+ case $fOrdList_$s$ccompare [] xs of {
+ __DEFAULT ->
+ (reverse1 (reverse1 (reverse1 (reverse1 (++ [] xs) []) []) []) [])
+ `cast` Co:3 :: ...;
+ GT -> test1 `cast` Co:3 :: ...
+ }
+
+
+
=====================================
testsuite/tests/simplCore/should_compile/T8331.stderr
=====================================
@@ -56,11 +56,11 @@
ReaderT r (ST s) (a -> b) -> ReaderT r (ST s) a -> r -> ST s b))
"SPEC $c>> @(ST s) @_"
forall (@s) (@r) ($dMonad :: Monad (ST s)).
- $fMonadReaderT1 @(ST s) @r $dMonad
+ $fMonadReaderT_$c>> @(ST s) @r $dMonad
= $fMonadAbstractIOSTReaderT_$s$c>> @s @r
"SPEC $c>>= @(ST s) @_"
forall (@s) (@r) ($dMonad :: Monad (ST s)).
- $fMonadReaderT2 @(ST s) @r $dMonad
+ $fMonadReaderT1 @(ST s) @r $dMonad
= ($fMonadAbstractIOSTReaderT2 @s @r)
`cast` (forall (a ::~ <*>_N) (b ::~ <*>_N).
participants (1)
-
Simon Peyton Jones (@simonpj)