[Git][ghc/ghc][wip/marge_bot_batch_merge_job] 6 commits: Fix typo in docs/users_guide/exts/type_families.rst
Marge Bot pushed to branch wip/marge_bot_batch_merge_job at Glasgow Haskell Compiler / GHC
Commits:
fc555f08 by Simon Hengel at 2025-11-27T07:42:43-05:00
Fix typo in docs/users_guide/exts/type_families.rst
- - - - -
8d213a1c by Simon Hengel at 2025-11-27T07:42:44-05:00
Fix broken RankNTypes example in user's guide
- - - - -
2bae26ae by Simon Peyton Jones at 2025-11-27T07:42:44-05:00
Switch off specialisation in ExactPrint
In !15057 (where we re-introduced -fpolymoprhic-specialisation) we found
that ExactPrint's compile time blew up by a factor of 5. It turned out
to be caused by bazillions of specialisations of `markAnnotated`.
Since ExactPrint isn't perf-critical, it does not seem worth taking
the performance hit, so this patch switches off specialisation in
this one module.
- - - - -
eddea2d2 by Simon Peyton Jones at 2025-11-27T07:42:44-05:00
Switch -fpolymorphic-specialisation on by default
This patch addresses #23559.
Now that !10479 has landed and #26329 is fixed, we can switch on
polymorphic specialisation by default, addressing a bunch of other
tickets listed in #23559.
Metric changes:
* CoOpt_Singleton: +4% compiler allocations: we just get more
specialisations
* info_table_map_perf: -20% decrease in compiler allocations.
This is caused by using -fno-specialise in ExactPrint.hs
Without that change we get a 4x blow-up in compile time;
see !15058 for details
Metric Decrease:
info_table_map_perf
Metric Increase:
CoOpt_Singletons
- - - - -
f274cc23 by Matthew Pickering at 2025-11-27T07:42:45-05:00
rts: Fix a deadlock with eventlog flush interval and RTS shutdown
The ghc_ticker thread attempts to flush at the eventlog tick interval, this requires
waiting to take all capabilities.
At the same time, the main thread is shutting down, the schedule is
stopped and then we wait for the ticker thread to finish.
Therefore we are deadlocked.
The solution is to use `newBoundTask/exitMyTask`, so that flushing can
cooperate with the scheduler shutdown.
Fixes #26573
- - - - -
8b566aa6 by sheaf at 2025-11-27T07:42:58-05:00
SimpleOpt: don't subst in pushCoercionIntoLambda
It was noticed in #26589 that the change in 15b311be was incorrect:
the simple optimiser carries two different substitution-like pieces of
information: 'soe_subst' (from InVar to OutExpr) and 'soe_inl'
(from InId to InExpr). It is thus incorrect to have 'pushCoercionIntoLambda'
apply the substitution from 'soe_subst' while discarding 'soe_inl'
entirely, which is what was done in 15b311be.
Instead, we change back pushCoercionIntoLambda to take an InScopeSet,
and optimise the lambda before calling 'pushCoercionIntoLambda' to avoid
mixing InExpr with OutExpr, or mixing two InExpr with different
environments. We can then call 'soeZapSubst' without problems.
Fixes #26588 #26589
- - - - -
15 changed files:
- compiler/GHC/Core/Opt/Arity.hs
- compiler/GHC/Core/SimpleOpt.hs
- compiler/GHC/Driver/DynFlags.hs
- compiler/GHC/Driver/Flags.hs
- compiler/GHC/Tc/Gen/App.hs
- docs/users_guide/exts/rank_polymorphism.rst
- docs/users_guide/exts/type_families.rst
- docs/users_guide/using-optimisation.rst
- rts/eventlog/EventLog.c
- testsuite/tests/rts/all.T
- + testsuite/tests/simplCore/should_compile/T26588.hs
- + testsuite/tests/simplCore/should_compile/T26589.hs
- testsuite/tests/simplCore/should_compile/T8331.stderr
- testsuite/tests/simplCore/should_compile/all.T
- utils/check-exact/ExactPrint.hs
Changes:
=====================================
compiler/GHC/Core/Opt/Arity.hs
=====================================
@@ -2993,12 +2993,12 @@ pushCoValArg co
Pair tyL tyR = coercionKind co
pushCoercionIntoLambda
- :: HasDebugCallStack => Subst -> InVar -> InExpr -> OutCoercionR -> Maybe (OutVar, OutExpr)
+ :: HasDebugCallStack => InScopeSet -> Var -> CoreExpr -> CoercionR -> Maybe (Var, CoreExpr)
-- This implements the Push rule from the paper on coercions
-- (\x. e) |> co
-- ===>
-- (\x'. e |> co')
-pushCoercionIntoLambda subst x e co
+pushCoercionIntoLambda in_scope x e co
| assert (not (isTyVar x) && not (isCoVar x)) True
, Pair s1s2 t1t2 <- coercionKind co
, Just {} <- splitFunTy_maybe s1s2
@@ -3011,9 +3011,9 @@ pushCoercionIntoLambda subst x e co
-- Should we optimize the coercions here?
-- Otherwise they might not match too well
x' = x `setIdType` t1 `setIdMult` w1
- in_scope' = substInScopeSet subst `extendInScopeSet` x'
+ in_scope' = in_scope `extendInScopeSet` x'
subst' =
- extendIdSubst (setInScope subst in_scope')
+ extendIdSubst (setInScope emptySubst in_scope')
x
(mkCast (Var x') (mkSymCo co1))
-- We substitute x' for x, except we need to preserve types.
=====================================
compiler/GHC/Core/SimpleOpt.hs
=====================================
@@ -393,12 +393,19 @@ simple_app env e0@(Lam {}) as0@(_:_)
= wrapLet mb_pr $ do_beta env'' body as
where (env', b') = subst_opt_bndr env b
- do_beta env e@(Lam b body) as@(CastIt co:rest)
- -- See Note [Desugaring unlifted newtypes]
+ -- See Note [Eliminate casts in function position]
+ do_beta env e@(Lam b _) as@(CastIt out_co:rest)
| isNonCoVarId b
- , Just (b', body') <- pushCoercionIntoLambda (soe_subst env) b body co
+ -- Optimise the inner lambda to make it an 'OutExpr', which makes it
+ -- possible to call 'pushCoercionIntoLambda' with the 'OutCoercion' 'co'.
+ -- This is kind of horrible, as for nested casted lambdas with a big body,
+ -- we will repeatedly optimise the body (once for each binder). However,
+ -- we need to do this to avoid mixing 'InExpr' and 'OutExpr', or two
+ -- 'InExpr' with different environments (getting this wrong caused #26588 & #26589.)
+ , Lam out_b out_body <- simple_app env e []
+ , Just (b', body') <- pushCoercionIntoLambda (soeInScope env) out_b out_body out_co
= do_beta (soeZapSubst env) (Lam b' body') rest
- -- soeZapSubst: pushCoercionIntoLambda applies the substitution
+ -- soeZapSubst: we've already optimised everything (the lambda and 'rest') by now.
| otherwise
= rebuild_app env (simple_opt_expr env e) as
@@ -511,7 +518,31 @@ TL;DR: To avoid the rest of the compiler pipeline seeing these bad lambas, we
rely on the simple optimiser to both inline the newtype unfolding and
subsequently deal with the resulting lambdas (either beta-reducing them
altogether or pushing coercions into them so that they satisfy the
-representation-polymorphism invariants).
+representation-polymorphism invariants). See Note [Eliminate casts in function position].
+
+[Alternative approach] (GHC ticket #26608)
+
+ We could instead, in the typechecker, emit a special form (a new constructor
+ of XXExprGhcTc) for instantiations of representation-polymorphic unlifted
+ newtypes (whether applied to a value argument or not):
+
+ UnliftedNT :: DataCon -> [Type] -> Coercion -> XXExprGhcTc
+
+ where "UnliftedNT nt_con [ty1, ...] co" represents the expression:
+
+ ( nt_con @ty1 ... ) |> co
+
+ The desugarer would then turn these AST nodes into appropriate Core, doing
+ what the simple optimiser does today:
+ - inline the compulsory unfolding of the newtype constructor
+ - apply it to its type arguments and beta reduce
+ - push the coercion into the resulting lambda
+
+ This would have several advantages:
+ - the desugarer would never produce "invalid" Core that needs to be
+ tidied up by the simple optimiser,
+ - the ugly and inefficient implementation described in
+ Note [Eliminate casts in function position] could be removed.
Wrinkle [Unlifted newtypes with wrappers]
@@ -717,50 +748,49 @@ rhss here.
Note [Eliminate casts in function position]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Consider the following program:
+Due to the current implementation strategy for representation-polymorphic
+unlifted newtypes, as described in Note [Desugaring unlifted newtypes], we rely
+on the simple optimiser to push coercions into lambdas, such as in the following
+example:
type R :: Type -> RuntimeRep
- type family R a where { R Float = FloatRep; R Double = DoubleRep }
- type F :: forall (a :: Type) -> TYPE (R a)
- type family F a where { F Float = Float# ; F Double = Double# }
+ type family R a where { R Int = IntRep }
+ type F :: forall a -> TYPE (R a)
+ type family F a where { F Int = Int# }
- type N :: forall (a :: Type) -> TYPE (R a)
newtype N a = MkN (F a)
-As MkN is a newtype, its unfolding is a lambda which wraps its argument
-in a cast:
-
- MkN :: forall (a :: Type). F a -> N a
- MkN = /\a \(x::F a). x |> co_ax
- -- recall that F a :: TYPE (R a)
-
-This is a representation-polymorphic lambda, in which the binder has an unknown
-representation (R a). We can't compile such a lambda on its own, but we can
-compile instantiations, such as `MkN @Float` or `MkN @Double`.
+Now, an instantiated occurrence of 'MkN', such as 'MkN @Int' (whether applied
+to a value argument or not) will lead, after inlining the compulsory unfolding
+of 'MkN', to a lambda fo the form:
-Our strategy to avoid running afoul of the representation-polymorphism
-invariants of Note [Representation polymorphism invariants] in GHC.Core is thus:
+ ( \ ( x :: F Int ) -> body ) |> co
- 1. Give the newtype a compulsory unfolding (it has no binding, as we can't
- define lambdas with representation-polymorphic value binders in source Haskell).
- 2. Rely on the optimiser to beta-reduce away any representation-polymorphic
- value binders.
+ where
+ co :: ( F Int -> res ) ~# ( Int# -> res )
-For example, consider the application
+The problem is that we now have a lambda abstraction whose binder does not have a
+fixed RuntimeRep in the sense of Note [Fixed RuntimeRep] in GHC.Tc.Utils.Concrete.
- MkN @Float 34.0#
+However, if we use 'pushCoercionIntoLambda', we end up with:
-After inlining MkN we'll get
+ ( \ ( x' :: Int# ) -> body' )
- ((/\a \(x:F a). x |> co_ax) @Float) |> co 34#
+which satisfies the representation-polymorphism invariants of
+Note [Representation polymorphism invariants] in GHC.Core.
-where co :: (F Float -> N Float) ~ (Float# ~ N Float)
+In conclusion:
-But to actually beta-reduce that lambda, we need to push the 'co'
-inside the `\x` with pushCoercionIntoLambda. Hence the extra
-equation for Cast-of-Lam in simple_app.
+ 1. The simple optimiser must push casts into lambdas.
+ 2. It must also deal with a situation such as (MkN @Int) |> co, where we first
+ inline the compulsory unfolding of N. This means the simple optimiser must
+ "peel off" the casts and optimise the inner expression first, to determine
+ whether it is a lambda abstraction or not.
-This is regrettably delicate.
+This is regrettably delicate. If we could make sure the typechecker/desugarer
+did not produce these bad lambdas in the first place (as described in
+[Alternative approach] in Note [Desugaring unlifted newtypes]), we could
+get rid of this ugly logic.
Note [Preserve join-binding arity]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1673,7 +1703,7 @@ exprIsLambda_maybe ise@(ISE in_scope_set _) (Cast casted_e co)
-- this implies that x is not in scope in gamma (makes this code simpler)
, not (isTyVar x) && not (isCoVar x)
, assert (not $ x `elemVarSet` tyCoVarsOfCo co) True
- , Just (x',e') <- pushCoercionIntoLambda (mkEmptySubst in_scope_set) x e co
+ , Just (x',e') <- pushCoercionIntoLambda in_scope_set x e co
, let res = Just (x',e',ts)
= --pprTrace "exprIsLambda_maybe:Cast" (vcat [ppr casted_e,ppr co,ppr res)])
res
=====================================
compiler/GHC/Driver/DynFlags.hs
=====================================
@@ -1268,6 +1268,7 @@ optLevelFlags -- see Note [Documenting optimisation flags]
, ([1,2], Opt_CfgBlocklayout) -- Experimental
, ([1,2], Opt_Specialise)
+ , ([1,2], Opt_PolymorphicSpecialisation) -- Now on by default (#23559)
, ([1,2], Opt_CrossModuleSpecialise)
, ([1,2], Opt_InlineGenerics)
, ([1,2], Opt_Strictness)
=====================================
compiler/GHC/Driver/Flags.hs
=====================================
@@ -909,6 +909,7 @@ optimisationFlags = EnumSet.fromList
, Opt_SpecialiseAggressively
, Opt_CrossModuleSpecialise
, Opt_StaticArgumentTransformation
+ , Opt_PolymorphicSpecialisation
, Opt_CSE
, Opt_StgCSE
, Opt_StgLiftLams
=====================================
compiler/GHC/Tc/Gen/App.hs
=====================================
@@ -749,13 +749,13 @@ tcInstFun do_ql inst_final (tc_fun, fun_ctxt) fun_sigma rn_args
go1 _pos acc fun_ty []
| XExpr (ConLikeTc (RealDataCon dc)) <- tc_fun
, isNewDataCon dc
- , [Scaled _ arg_ty] <- dataConOrigArgTys dc
+ , [Scaled _ orig_arg_ty] <- dataConOrigArgTys dc
, n_val_args == 0
-- If we're dealing with an unsaturated representation-polymorphic
-- UnliftedNewype, then perform a representation-polymorphism check.
-- See Note [Representation-polymorphism checks for unsaturated unlifted newtypes]
-- in GHC.Tc.Utils.Concrete.
- , not $ typeHasFixedRuntimeRep arg_ty
+ , not $ typeHasFixedRuntimeRep orig_arg_ty
= do { (wrap_co, arg_ty, res_ty) <-
matchActualFunTy (FRRRepPolyUnliftedNewtype dc)
(Just $ HsExprTcThing tc_fun)
=====================================
docs/users_guide/exts/rank_polymorphism.rst
=====================================
@@ -195,7 +195,7 @@ For example: ::
g3c :: Int -> forall x y. y -> x -> x
f4 :: (Int -> forall a. (Eq a, Show a) => a -> a) -> Bool
- g4 :: Int -> forall x. (Show x, Eq x) => x -> x) -> Bool
+ g4 :: Int -> forall x. (Show x, Eq x) => x -> x
Then the application ``f3 g3a`` is well-typed, because ``g3a`` has a type that matches the type
expected by ``f3``. But ``f3 g3b`` is not well typed, because the foralls are in different places.
=====================================
docs/users_guide/exts/type_families.rst
=====================================
@@ -680,7 +680,7 @@ thus: ::
When doing so, we (optionally) may drop the "``family``" keyword.
The type parameters must all be type variables, of course, and some (but
-not necessarily all) of then can be the class parameters. Each class
+not necessarily all) of them can be the class parameters. Each class
parameter may only be used at most once per associated type, but some
may be omitted and they may be in an order other than in the class head.
Hence, the following contrived example is admissible: ::
=====================================
docs/users_guide/using-optimisation.rst
=====================================
@@ -1325,10 +1325,7 @@ as such you shouldn't need to set any of them explicitly. A flag
:reverse: -fno-polymorphic-specialisation
:category:
- :default: off
-
- Warning, this feature is highly experimental and may lead to incorrect runtime
- results. Use at your own risk (:ghc-ticket:`23469`, :ghc-ticket:`23109`, :ghc-ticket:`21229`, :ghc-ticket:`23445`).
+ :default: on
Enable specialisation of function calls to known dictionaries with free type variables.
The created specialisation will abstract over the type variables free in the dictionary.
=====================================
rts/eventlog/EventLog.c
=====================================
@@ -491,13 +491,7 @@ endEventLogging(void)
eventlog_enabled = false;
- // Flush all events remaining in the buffers.
- //
- // N.B. Don't flush if shutting down: this was done in
- // finishCapEventLogging and the capabilities have already been freed.
- if (getSchedState() != SCHED_SHUTTING_DOWN) {
- flushEventLog(NULL);
- }
+ flushEventLog(NULL);
ACQUIRE_LOCK(&eventBufMutex);
@@ -1626,15 +1620,24 @@ void flushEventLog(Capability **cap USED_IF_THREADS)
return;
}
+ // N.B. Don't flush if shutting down: this was done in
+ // finishCapEventLogging and the capabilities have already been freed.
+ // This can also race against the shutdown if the flush is triggered by the
+ // ticker thread. (#26573)
+ if (getSchedState() == SCHED_SHUTTING_DOWN) {
+ return;
+ }
+
ACQUIRE_LOCK(&eventBufMutex);
printAndClearEventBuf(&eventBuf);
RELEASE_LOCK(&eventBufMutex);
#if defined(THREADED_RTS)
- Task *task = getMyTask();
+ Task *task = newBoundTask();
stopAllCapabilitiesWith(cap, task, SYNC_FLUSH_EVENT_LOG);
flushAllCapsEventsBufs();
releaseAllCapabilities(getNumCapabilities(), cap ? *cap : NULL, task);
+ exitMyTask();
#else
flushLocalEventsBuf(getCapability(0));
#endif
=====================================
testsuite/tests/rts/all.T
=====================================
@@ -2,6 +2,11 @@ test('testblockalloc',
[c_src, only_ways(['normal','threaded1']), extra_run_opts('+RTS -I0')],
compile_and_run, [''])
+test('numeric_version_eventlog_flush',
+ [ignore_stdout, req_ghc_with_threaded_rts],
+ run_command,
+ ['{compiler} --numeric-version +RTS -l --eventlog-flush-interval=1 -RTS'])
+
test('testmblockalloc',
[c_src, only_ways(['normal','threaded1']), extra_run_opts('+RTS -I0 -xr0.125T'),
when(arch('wasm32'), skip)], # MBlocks can't be freed on wasm32, see Note [Megablock allocator on wasm] in rts
=====================================
testsuite/tests/simplCore/should_compile/T26588.hs
=====================================
@@ -0,0 +1,32 @@
+module T26588 ( getOptionSettingFromText ) where
+
+import Control.Applicative ( Const(..) )
+import Data.Map (Map)
+import qualified Data.Map.Strict as Map
+
+------------------------------------------------------------------------
+-- ConfigState
+
+data ConfigLeaf
+data ConfigTrie = ConfigTrie !(Maybe ConfigLeaf) !ConfigMap
+
+type ConfigMap = Map Int ConfigTrie
+
+freshLeaf :: [Int] -> ConfigLeaf -> ConfigTrie
+freshLeaf [] l = ConfigTrie (Just l) mempty
+freshLeaf (a:as) l = ConfigTrie Nothing (Map.singleton a (freshLeaf as l))
+
+adjustConfigTrie :: Functor t => [Int] -> (Maybe ConfigLeaf -> t (Maybe ConfigLeaf)) -> Maybe (ConfigTrie) -> t (Maybe ConfigTrie)
+adjustConfigTrie as f Nothing = fmap (freshLeaf as) <$> f Nothing
+adjustConfigTrie (a:as) f (Just (ConfigTrie x m)) = Just . ConfigTrie x <$> adjustConfigMap a as f m
+adjustConfigTrie [] f (Just (ConfigTrie x m)) = g <$> f x
+ where g Nothing | Map.null m = Nothing
+ g x' = Just (ConfigTrie x' m)
+
+adjustConfigMap :: Functor t => Int -> [Int] -> (Maybe ConfigLeaf -> t (Maybe ConfigLeaf)) -> ConfigMap -> t ConfigMap
+adjustConfigMap a as f = Map.alterF (adjustConfigTrie as f) a
+
+getOptionSettingFromText :: Int -> [Int] -> ConfigMap -> IO ()
+getOptionSettingFromText p ps = getConst . adjustConfigMap p ps f
+ where
+ f _ = Const (return ())
=====================================
testsuite/tests/simplCore/should_compile/T26589.hs
=====================================
@@ -0,0 +1,44 @@
+module T26589 ( executeTest ) where
+
+-- base
+import Data.Coerce ( coerce )
+import Data.Foldable ( foldMap )
+
+--------------------------------------------------------------------------------
+
+newtype Traversal f = Traversal { getTraversal :: f () }
+
+instance Applicative f => Semigroup (Traversal f) where
+ Traversal f1 <> Traversal f2 = Traversal $ f1 *> f2
+instance Applicative f => Monoid (Traversal f) where
+ mempty = Traversal $ pure ()
+
+newtype Seq a = Seq (FingerTree (Elem a))
+newtype Elem a = Elem { getElem :: a }
+
+data FingerTree a
+ = EmptyT
+ | Deep !a (FingerTree a) !a
+
+executeTest :: Seq () -> IO ()
+executeTest fins = destroyResources
+ where
+ destroyResources :: IO ()
+ destroyResources =
+ getTraversal $
+ flip foldMap1 fins $ \ _ ->
+ Traversal $ return ()
+
+foldMap1 :: forall m a. Monoid m => (a -> m) -> Seq a -> m
+foldMap1 = coerce (foldMap2 :: (Elem a -> m) -> FingerTree (Elem a) -> m)
+
+foldMap2 :: Monoid m => (Elem a -> m) -> FingerTree (Elem a) -> m
+foldMap2 _ EmptyT = mempty
+foldMap2 f' (Deep pr' m' sf') = f' pr' <> foldMapTree f' m' <> f' sf'
+ where
+ foldMapTree :: Monoid m => (a -> m) -> FingerTree a -> m
+ foldMapTree _ EmptyT = mempty
+ foldMapTree f (Deep pr m sf) =
+ f pr <>
+ foldMapTree f m <>
+ f sf
=====================================
testsuite/tests/simplCore/should_compile/T8331.stderr
=====================================
@@ -1,5 +1,148 @@
==================== Tidy Core rules ====================
+"SPEC $c*> @(ST s) @_"
+ forall (@s) (@r) ($dApplicative :: Applicative (ST s)).
+ $fApplicativeReaderT_$c*> @(ST s) @r $dApplicative
+ = ($fApplicativeReaderT2 @s @r)
+ `cast` (forall (a ::~ <*>_N) (b ::~ <*>_N).
+
participants (1)
-
Marge Bot (@marge-bot)