
Simon Peyton Jones pushed to branch wip/T23109a at Glasgow Haskell Compiler / GHC Commits: b10529e9 by Simon Peyton Jones at 2025-04-18T15:27:53+01:00 Comments only - - - - - e805aa2e by Simon Peyton Jones at 2025-04-18T15:35:54+01:00 Refator GHC.Core.Opt.SetLevels.notWorthFloating I refactored `notWorthFloating` while I was doing something else. I don't think there's a change in behaviour, but if so it's very much a corner case. - - - - - d6c13d5d by Simon Peyton Jones at 2025-04-18T17:16:13+01:00 Always float bottoming expressions to the top ...regardless of floatConsts - - - - - 2 changed files: - compiler/GHC/Core/Opt/SetLevels.hs - compiler/GHC/Core/Opt/Simplify/Iteration.hs Changes: ===================================== compiler/GHC/Core/Opt/SetLevels.hs ===================================== @@ -482,14 +482,14 @@ Consider this: f :: T Int -> blah f x vs = case x of { MkT y -> let f vs = ...(case y of I# w -> e)...f.. - in f vs + in f vs } Here we can float the (case y ...) out, because y is sure to be evaluated, to give f x vs = case x of { MkT y -> - case y of I# w -> + case y of { I# w -> let f vs = ...(e)...f.. - in f vs + in f vs }} That saves unboxing it every time round the loop. It's important in some DPH stuff where we really want to avoid that repeated unboxing in @@ -614,7 +614,8 @@ lvlMFE env strict_ctxt e@(_, AnnCase {}) = lvlExpr env e -- See Note [Case MFEs] lvlMFE env strict_ctxt ann_expr - | not float_me + | notWorthFloating expr abs_vars + || not float_me || floatTopLvlOnly env && not (isTopLvl dest_lvl) -- Only floating to the top level is allowed. || hasFreeJoin env fvs -- If there is a free join, don't float @@ -623,9 +624,6 @@ lvlMFE env strict_ctxt ann_expr -- We can't let-bind an expression if we don't know -- how it will be represented at runtime. -- See Note [Representation polymorphism invariants] in GHC.Core - || notWorthFloating expr abs_vars - -- Test notWorhtFloating last; - -- See Note [Large free-variable sets] = -- Don't float it out lvlExpr env ann_expr @@ -676,12 +674,11 @@ lvlMFE env strict_ctxt ann_expr is_function = isFunction ann_expr mb_bot_str = exprBotStrictness_maybe expr -- See Note [Bottoming floats] - -- esp Bottoming floats (2) + -- esp Bottoming floats (BF2) expr_ok_for_spec = exprOkForSpeculation expr abs_vars = abstractVars dest_lvl env fvs dest_lvl = destLevel env fvs fvs_ty is_function is_bot_lam - -- NB: is_bot_lam not is_bot; see (3) in - -- Note [Bottoming floats] + -- NB: is_bot_lam not is_bot; see (BF2) in Note [Bottoming floats] -- float_is_new_lam: the floated thing will be a new value lambda -- replacing, say (g (x+4)) by (lvl x). No work is saved, nor is @@ -698,20 +695,22 @@ lvlMFE env strict_ctxt ann_expr -- A decision to float entails let-binding this thing, and we only do -- that if we'll escape a value lambda, or will go to the top level. + -- Never float trivial expressions; + -- notably, save_work might be true of a lone evaluated variable. float_me = saves_work || saves_alloc || is_mk_static -- See Note [Saving work] + is_hnf = exprIsHNF expr saves_work = escapes_value_lam -- (a) - && not (exprIsHNF expr) -- (b) + && not is_hnf -- (b) && not float_is_new_lam -- (c) escapes_value_lam = dest_lvl `ltMajLvl` (le_ctxt_lvl env) - -- See Note [Saving allocation] and Note [Floating to the top] - saves_alloc = isTopLvl dest_lvl - && floatConsts env - && ( not strict_ctxt -- (a) - || exprIsHNF expr -- (b) - || (is_bot_lam && escapes_value_lam)) -- (c) + -- See Note [Floating to the top] + saves_alloc = isTopLvl dest_lvl + && ( (floatConsts env && + (not strict_ctxt || is_hnf)) -- (FT1) and (FT2) + || (is_bot_lam && escapes_value_lam)) -- (FT3) hasFreeJoin :: LevelEnv -> DVarSet -> Bool -- Has a free join point which is not being floated to top level. @@ -726,7 +725,7 @@ hasFreeJoin env fvs The key idea in let-floating is to * float a redex out of a (value) lambda Doing so can save an unbounded amount of work. -But see also Note [Saving allocation]. +But see also Note [Floating to the top]. So we definitely float an expression out if (a) It will escape a value lambda (escapes_value_lam) @@ -771,10 +770,12 @@ Wrinkles: we have saved nothing: one pair will still be allocated for each call of `f`. Hence the (not float_is_new_lam) in saves_work. -Note [Saving allocation] -~~~~~~~~~~~~~~~~~~~~~~~~ +Note [Floating to the top] +~~~~~~~~~~~~~~~~~~~~~~~~~~ Even if `saves_work` is false, we we may want to float even cheap/HNF -expressions out of value lambdas, for several reasons: +expressions out of value lambdas. Data suggests, however, that it is better +/only/ to do so, /if/ they can go to top level. If the expression goes to top +level we don't pay the cost of allocating cold-path thunks described in (SW2). * Doing so may save allocation. Consider f = \x. .. (\y.e) ... @@ -782,6 +783,11 @@ expressions out of value lambdas, for several reasons: (assuming e does not mention x). An example where this really makes a difference is simplrun009. +* In principle this would be true even if the (\y.e) didn't go to top level; but + in practice we only float a HNF if it goes all way to the top. We don't pay + /any/ allocation cost for a top-level floated expression; it just becomes + static data. + * It may allow SpecContr to fire on functions. Consider f = \x. ....(f (\y.e)).... After floating we get @@ -793,21 +799,7 @@ expressions out of value lambdas, for several reasons: a big difference for string literals and bottoming expressions: see Note [Floating to the top] -Data suggests, however, that it is better /only/ to float HNFS, /if/ they can go -to top level. See (SW2) of Note [Saving work]. If the expression goes to top -level we don't pay the cost of allocating cold-path thunks described in (SW2). - -Hence `isTopLvl dest_lvl` in `saves_alloc`. - -Note [Floating to the top] -~~~~~~~~~~~~~~~~~~~~~~~~~~ -Even though Note [Saving allocation] suggests that we should not, in -general, float HNFs, the balance change if it goes to the top: - -* We don't pay an allocation cost for the floated expression; it - just becomes static data. - -* Floating string literal is valuable -- no point in duplicating the +* Floating string literals is valuable -- no point in duplicating the at each call site! * Floating bottoming expressions is valuable: they are always cold @@ -815,32 +807,32 @@ general, float HNFs, the balance change if it goes to the top: can be quite big, inhibiting inlining. See Note [Bottoming floats] So we float an expression to the top if: - (a) the context is lazy (so we get allocation), or - (b) the expression is a HNF (so we get allocation), or - (c) the expression is bottoming and floating would escape a - value lambda (NB: if the expression itself is a lambda, (b) - will apply; so this case only catches bottoming thunks) + (FT1) the context is lazy (so we get allocation), or + (FT2) the expression is a HNF (so we get allocation), or + (FT3) the expression is bottoming and floating would escape a + value lambda (NB: if the expression itself is a lambda, (b) + will apply; so this case only catches bottoming thunks) Examples: -* (a) Strict. Case scrutinee +* (FT1) Strict. Case scrutinee f = case g True of .... Don't float (g True) to top level; then we have the admin of a top-level thunk to worry about, with zero gain. -* (a) Strict. Case alternative +* (FT1) Strict. Case alternative h = case y of True -> g True False -> False Don't float (g True) to the top level -* (b) HNF +* (FT2) HNF f = case y of True -> p:q False -> blah We may as well float the (p:q) so it becomes a static data structure. -* (c) Bottoming expressions; see also Note [Bottoming floats] +* (FT3) Bottoming expressions; see also Note [Bottoming floats] f x = case x of 0 -> error <big thing> _ -> x+1 @@ -853,7 +845,7 @@ Examples: 'foo' anyway. So float bottoming things only if they escape a lambda. -* Arguments +* (FT4) Arguments t = f (g True) Prior to Apr 22 we didn't float (g True) to the top if f was strict. But (a) this only affected CAFs, because if it escapes a value lambda @@ -868,28 +860,6 @@ early loses opportunities for RULES which (needless to say) are important in some nofib programs (gcd is an example). [SPJ note: I think this is obsolete; the flag seems always on.] -Note [Large free-variable sets] -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In #24471 we had something like - x1 = I# 1 - ... - x1000 = I# 1000 - foo = f x1 (f x2 (f x3 ....)) -So every sub-expression in `foo` has lots and lots of free variables. But -none of these sub-expressions float anywhere; the entire float-out pass is a -no-op. - -In lvlMFE, we want to find out quickly if the MFE is not-floatable; that is -the common case. In #24471 it turned out that we were testing `abs_vars` (a -relatively complicated calculation that takes at least O(n-free-vars) time to -compute) for every sub-expression. - -Better instead to test `float_me` early. That still involves looking at -dest_lvl, which means looking at every free variable, but the constant factor -is a lot better. - -ToDo: find a way to fix the bad asymptotic complexity. - Note [Floating join point bindings] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Mostly we don't float join points at all -- we want them to /stay/ join points. @@ -1053,30 +1023,36 @@ we'd like to float the call to error, to get But, as ever, we need to be careful: -(1) We want to float a bottoming +(BF1) We want to float a bottoming expression even if it has free variables: f = \x. g (let v = h x in error ("urk" ++ v)) Then we'd like to abstract over 'x', and float the whole arg of g: lvl = \x. let v = h x in error ("urk" ++ v) f = \x. g (lvl x) - To achieve this we pass is_bot to destLevel - -(2) We do not do this for lambdas that return - bottom. Instead we treat the /body/ of such a function specially, - via point (1). For example: + To achieve this we pass `is_bot` to destLevel + +(BF2) We do the same for /lambdas/ that return bottom. + Suppose the original lambda had /no/ free vars: + f = \x. ....(\y z. error (y++z))... + then we'd like to float that whole lambda + lvl = \y z. error (y++z) + f = \x. ....lvl.... + If we just floated its bottom-valued body, we might abstract the arguments in + the "wrong" order and end up with this bad result + lvl = \z y. error (y++z) + f = \x. ....(\y z. lvl z y).... + + If the lambda does have free vars, this will happen: f = \x. ....(\y z. if x then error y else error z).... - If we float the whole lambda thus + We float the whole lambda thus lvl = \x. \y z. if x then error y else error z f = \x. ...(lvl x)... - we may well end up eta-expanding that PAP to + And we may well end up eta-expanding that PAP to + lvl = \x. \y z. if b then error y else error z f = \x. ...(\y z. lvl x y z)... + so we get a (small) closure. So be it. - ===> - lvl = \x z y. if b then error y else error z - f = \x. ...(\y z. lvl x z y)... - (There is no guarantee that we'll choose the perfect argument order.) - -(3) If we have a /binding/ that returns bottom, we want to float it to top +(BF3) If we have a /binding/ that returns bottom, we want to float it to top level, even if it has free vars (point (1)), and even it has lambdas. Example: ... let { v = \y. error (show x ++ show y) } in ... @@ -1092,7 +1068,6 @@ But, as ever, we need to be careful: join points (#24768), and floating to the top would abstract over those join points, which we should never do. - See Maessen's paper 1999 "Bottom extraction: factoring error handling out of functional programs" (unpublished I think). @@ -1135,7 +1110,6 @@ float the case (as advocated here) we won't float the (build ...y..) either, so fusion will happen. It can be a big effect, esp in some artificial benchmarks (e.g. integer, queens), but there is no perfect answer. - -} annotateBotStr :: Id -> Arity -> Maybe (Arity, DmdSig, CprSig) -> Id @@ -1152,69 +1126,124 @@ annotateBotStr id n_extra mb_bot_str = id notWorthFloating :: CoreExpr -> [Var] -> Bool --- Returns True if the expression would be replaced by --- something bigger than it is now. For example: --- abs_vars = tvars only: return True if e is trivial, --- but False for anything bigger --- abs_vars = [x] (an Id): return True for trivial, or an application (f x) --- but False for (f x x) --- --- One big goal is that floating should be idempotent. Eg if --- we replace e with (lvl79 x y) and then run FloatOut again, don't want --- to replace (lvl79 x y) with (lvl83 x y)! - +-- See Note [notWorthFloating] notWorthFloating e abs_vars - = go e (count isId abs_vars) + = go e 0 where - go (Var {}) n = n >= 0 - go (Lit lit) n = assert (n==0) $ - litIsTrivial lit -- Note [Floating literals] - go (Type {}) _ = True - go (Coercion {}) _ = True + n_abs_vars = count isId abs_vars -- See (NWF5) + + go :: CoreExpr -> Int -> Bool + -- (go e n) return True if (e x1 .. xn) is not worth floating + -- where `e` has n trivial value arguments x1..xn + -- See (NWF4) + go (Lit lit) n = assert (n==0) $ + litIsTrivial lit -- See (NWF1) + go (Type {}) _ = True + go (Tick t e) n = not (tickishIsCode t) && go e n + go (Cast e _) n = n==0 || go e n -- See (NWF3) + go (Coercion {}) _ = True go (App e arg) n - -- See Note [Floating applications to coercions] - | not (isRuntimeArg arg) = go e n - | n==0 = False - | exprIsTrivial arg = go e (n-1) -- NB: exprIsTrivial arg = go arg 0 - | otherwise = False - go (Tick t e) n = not (tickishIsCode t) && go e n - go (Cast e _) n = go e n - go (Case e b _ as) n + | Type {} <- arg = go e n -- Just types, not coercions (NWF2) + | exprIsTrivial arg = go e (n+1) + | otherwise = False -- (f non-triv) is worth floating + + go (Case e b _ as) _ + -- Do not float the `case` part of trivial cases (NWF3) + -- We'll have a look at the RHS when we get there | null as - = go e n -- See Note [Empty case is trivial] - | Just rhs <- isUnsafeEqualityCase e b as - = go rhs n -- See (U2) of Note [Implementing unsafeCoerce] in base:Unsafe.Coerce - go _ _ = False + = True -- See Note [Empty case is trivial] + | Just {} <- isUnsafeEqualityCase e b as + = True -- See (U2) of Note [Implementing unsafeCoerce] in base:Unsafe.Coerce + | otherwise + = False -{- -Note [Floating literals] -~~~~~~~~~~~~~~~~~~~~~~~~ -It's important to float Integer literals, so that they get shared, -rather than being allocated every time round the loop. -Hence the litIsTrivial. + go (Var _) n + | n==0 = True -- Naked variable + | n <= n_abs_vars = True -- (f a b c) is not worth floating if + | otherwise = False -- a,b,c are all abstracted; see (NWF5) -Ditto literal strings (LitString), which we'd like to float to top -level, which is now possible. + go _ _ = False -- Let etc is worth floating -Note [Floating applications to coercions] -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -We don’t float out variables applied only to type arguments, since the -extra binding would be pointless: type arguments are completely erased. -But *coercion* arguments aren’t (see Note [Coercion tokens] in -"GHC.CoreToStg" and Note [inlineBoringOk] in"GHC.Core.Unfold"), -so we still want to float out variables applied only to -coercion arguments. +{- Note [notWorthFloating] +~~~~~~~~~~~~~~~~~~~~~~~~~~ +`notWorthFloating` returns True if the expression would be replaced by something +bigger than it is now. One big goal is that floating should be idempotent. Eg +if we replace e with (lvl79 x y) and then run FloatOut again, don't want to +replace (lvl79 x y) with (lvl83 x y)! +For example: + abs_vars = tvars only: return True if e is trivial, + but False for anything bigger + abs_vars = [x] (an Id): return True for trivial, or an application (f x) + but False for (f x x) + +(NWF1) It's important to float Integer literals, so that they get shared, rather + than being allocated every time round the loop. Hence the litIsTrivial. + + Ditto literal strings (LitString), which we'd like to float to top + level, which is now possible. + +(NWF2) We don’t float out variables applied only to type arguments, since the + extra binding would be pointless: type arguments are completely erased. + But *coercion* arguments aren’t (see Note [Coercion tokens] in + "GHC.CoreToStg" and Note [inlineBoringOk] in"GHC.Core.Unfold"), + so we still want to float out variables applied only to + coercion arguments. + +(NWF3) Some expressions have trivial wrappers: + - Casts (e |> co) + - Unary-class applications: + - Dictionary applications (MkC meth) + - Class-op applictions (op dict) + - Case of empty alts + - Unsafe-equality case + In all these cases we say "not worth floating", and we do so /regardless/ + of the wrapped expression. The SetLevels stuff may subsequently float the + components of the expression. + + Example: is it worth floating (f x |> co)? No! If we did we'd get + lvl = f x |> co + ...lvl.... + Then we'd do cast worker/wrapper and end up with. + lvl' = f x + ...(lvl' |> co)... + Silly! Better not to float it in the first place. If we say "no" here, + we'll subsequently say "yes" for (f x) and get + lvl = f x + ....(lvl |> co)... + which is what we want. In short: don't float trivial wrappers. + +(NWF4) The only non-trivial expression that we say "not worth floating" for + is an application + f x y z + where the number of value arguments is <= the number of abstracted Ids. + This is what makes floating idempotent. Hence counting the number of + value arguments in `go` + +(NWF5) In #24471 we had something like + x1 = I# 1 + ... + x1000 = I# 1000 + foo = f x1 (f x2 (f x3 ....)) + So every sub-expression in `foo` has lots and lots of free variables. But + none of these sub-expressions float anywhere; the entire float-out pass is a + no-op. -************************************************************************ -* * -\subsection{Bindings} -* * -************************************************************************ + So `notWorthFloating` tries to avoid evaluating `n_abs_vars`, in cases where + it obviously /is/ worth floating. (In #24471 it turned out that we were + testing `abs_vars` (a relatively complicated calculation that takes at least + O(n-free-vars) time to compute) for every sub-expression.) -The binding stuff works for top level too. + Hence testing `n_abs_vars only` at the very end. -} +{- ********************************************************************* +* * + Bindings + This binding stuff works for top level too. +* * +********************************************************************* -} + lvlBind :: LevelEnv -> CoreBindWithFVs -> LvlM (LevelledBind, LevelEnv) @@ -1261,7 +1290,7 @@ lvlBind env (AnnNonRec bndr rhs) -- is_bot_lam: looks like (\xy. bot), maybe zero lams -- NB: not isBottomThunk! -- NB: not is_join: don't send bottoming join points to the top. - -- See Note [Bottoming floats] point (3) + -- See Note [Bottoming floats] (BF3) is_top_bindable = exprIsTopLevelBindable deann_rhs bndr_ty n_extra = count isId abs_vars @@ -1552,9 +1581,8 @@ destLevel env fvs fvs_ty is_function is_bot -- See Note [Floating join point bindings] = tOP_LEVEL - | is_bot -- Send bottoming bindings to the top - = as_far_as_poss -- regardless; see Note [Bottoming floats] - -- Esp Bottoming floats (1) and (3) + | is_bot -- Send bottoming bindings to the top regardless; + = as_far_as_poss -- see (BF1) and (BF2) in Note [Bottoming floats] | Just n_args <- floatLams env , n_args > 0 -- n=0 case handled uniformly by the 'otherwise' case @@ -1568,8 +1596,13 @@ destLevel env fvs fvs_ty is_function is_bot max_fv_id_level = maxFvLevel isId env fvs -- Max over Ids only; the -- tyvars will be abstracted + -- as_far_as_poss: destination level depends only on the free Ids (more + -- precisely, free CoVars) of the /type/, not the free Ids of the /term/. + -- Why worry about the free CoVars? See Note [Floating and kind casts] + -- + -- There may be free Ids in the term, but then we'll just + -- lambda-abstract over them as_far_as_poss = maxFvLevel' isId env fvs_ty - -- See Note [Floating and kind casts] {- Note [Floating and kind casts] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -1732,10 +1765,9 @@ maxFvLevel max_me env var_set -- It's OK to use a non-deterministic fold here because maxIn commutes. maxFvLevel' :: (Var -> Bool) -> LevelEnv -> TyCoVarSet -> Level --- Same but for TyCoVarSet +-- Precisely the same as `maxFvLevel` but for TyCoVarSet rather than DVarSet maxFvLevel' max_me env var_set = nonDetStrictFoldUniqSet (maxIn max_me env) tOP_LEVEL var_set - -- It's OK to use a non-deterministic fold here because maxIn commutes. maxIn :: (Var -> Bool) -> LevelEnv -> InVar -> Level -> Level maxIn max_me (LE { le_lvl_env = lvl_env, le_env = id_env }) in_var lvl ===================================== compiler/GHC/Core/Opt/Simplify/Iteration.hs ===================================== @@ -801,9 +801,9 @@ makeTrivial env top_lvl dmd occ_fs expr = return (emptyLetFloats, expr) | not (bindingOk top_lvl expr expr_ty) -- Cannot trivialise - = return (emptyLetFloats, expr) -- See Note [Cannot trivialise] + = return (emptyLetFloats, expr) -- See Note [Cannot trivialise] - | otherwise -- 'expr' is not of form (Cast e co) + | otherwise = do { (floats, expr1) <- prepareRhs env top_lvl occ_fs expr ; uniq <- getUniqueM ; let name = mkSystemVarName uniq occ_fs View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/bc27b6c9b536a8200cd2b8750e4744f... -- View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/bc27b6c9b536a8200cd2b8750e4744f... You're receiving this email because of your account on gitlab.haskell.org.