Simon Peyton Jones pushed to branch wip/T26425 at Glasgow Haskell Compiler / GHC

Commits:

5 changed files:

Changes:

  • compiler/GHC/Core/Opt/OccurAnal.hs
    ... ... @@ -664,14 +664,14 @@ through A, so it should have ManyOcc. Bear this case in mind!
    664 664
         * maps a OneOcc to OneOcc{ occ_n_br = 0 }
    
    665 665
       In our example, occ_join_points will be extended with
    
    666 666
           [j :-> [v :-> OneOcc{occ_n_br=0}]]
    
    667
    -  See addJoinPoint.
    
    667
    +  See `addJoinPoint` and (W5) below.
    
    668 668
     
    
    669 669
     * At an occurrence of a join point, we do everything as normal, but add in the
    
    670 670
       UsageDetails from the occ_join_points.  See mkOneOcc.
    
    671 671
     
    
    672 672
     * Crucially, at the NonRec binding of the join point, in `occAnalBind`, we use
    
    673
    -  `orUDs`, not `andUDs` to combine the usage from the RHS with the usage from
    
    674
    -  the body.
    
    673
    +  `combineJoinPointUDs`, not `andUDs` to combine the usage from the RHS with
    
    674
    +  the usage from the body.
    
    675 675
     
    
    676 676
     Here are the consequences
    
    677 677
     
    
    ... ... @@ -688,7 +688,7 @@ Here are the consequences
    688 688
       These are `andUDs` together in `addOccInfo`, and hence
    
    689 689
       `v` gets ManyOccs, just as it should.  Clever!
    
    690 690
     
    
    691
    -There are a couple of tricky wrinkles
    
    691
    +There are, of course, some tricky wrinkles
    
    692 692
     
    
    693 693
     (W1) Consider this example which shadows `j`:
    
    694 694
               join j = rhs in
    
    ... ... @@ -718,6 +718,8 @@ There are a couple of tricky wrinkles
    718 718
          * In `postprcess_uds`, we add the chucked-out join points to the
    
    719 719
            returned UsageDetails, with `andUDs`.
    
    720 720
     
    
    721
    +Wrinkles (W1) and (W2) are very similar to Note [Binder swap] (BS3).
    
    722
    +
    
    721 723
     (W3) Consider this example, which shadows `j`, but this time in an argument
    
    722 724
                   join j = rhs
    
    723 725
                   in f (case x of { K j -> ...; ... })
    
    ... ... @@ -734,10 +736,38 @@ There are a couple of tricky wrinkles
    734 736
     
    
    735 737
     (W4) What if the join point binding has a stable unfolding, or RULES?
    
    736 738
          They are just alternative right-hand sides, and at each call site we
    
    737
    -     will use only one of them. So again, we can use `orUDs` to combine
    
    738
    -     usage info from all these alternatives RHSs.
    
    739
    -
    
    740
    -Wrinkles (W1) and (W2) are very similar to Note [Binder swap] (BS3).
    
    739
    +     will use only one of them. So again, we can use `combineJoinPointUDs`
    
    740
    +     to combine usage info from all these alternatives RHSs.
    
    741
    +
    
    742
    +(W5) Other things being equal, we want keep the OccInfoEnv in the range of
    
    743
    +  `occ_join_points` as small as possible, because it is /duplicated/ at
    
    744
    +  /every occurrence/ of the join point.  We really only want to include
    
    745
    +  OccInfo for
    
    746
    +       * Local, non-recursive let-bound Ids
    
    747
    +       * that occur just once in the RHS of the join point
    
    748
    +  particularly including
    
    749
    +       * thunks (that's the original point) and
    
    750
    +       * join points (so that the trick works recursively).
    
    751
    +  We call these the "tracked Ids of j".
    
    752
    +
    
    753
    +  Including lambda binders is pointless, and slows down the occurrence analyser.
    
    754
    +
    
    755
    +  e.g.    \x. let y = x+1 in
    
    756
    +              join j v = ..x..y..(f z z)..
    
    757
    +              in ...
    
    758
    +  In the `occ_join_points` binding for `j`, we want to track `y`, but
    
    759
    +  not `x` (lambda bound) nor `z` (occurs many times).
    
    760
    +
    
    761
    +  To exploit this:
    
    762
    +     * `occ_local_lets` tracks which Ids are local, non-recursive lets
    
    763
    +     * `addJoinPoint` only populates j's entry with occ-info for the "tracked Ids"
    
    764
    +       of `j`; that is, that are (a) in occ_local_lets and (b) have OneOcc.
    
    765
    +     * `combineJoinPointUDs` uses
    
    766
    +          orLocalOcc  for local-let Ids
    
    767
    +          andLocalOcc for non-local-let Ids
    
    768
    +
    
    769
    +  This fancy footwork can matter in extreme cases: it gave a 25% reduction in
    
    770
    +  total compiler allocation in #26425..
    
    741 771
     
    
    742 772
     Note [Finding join points]
    
    743 773
     ~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    ... ... @@ -759,62 +789,62 @@ rest of 'OccInfo' until it goes on the binder.
    759 789
     
    
    760 790
     Note [Join arity prediction based on joinRhsArity]
    
    761 791
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    762
    -In general, the join arity from tail occurrences of a join point (O) may be
    
    763
    -higher or lower than the manifest join arity of the join body (M). E.g.,
    
    792
    +In general, the join arity from tail occurrences of a join point (OAr) may be
    
    793
    +higher or lower than the manifest join arity of the join body (MAr). E.g.,
    
    764 794
     
    
    765
    -  -- M > O:
    
    766
    -  let f x y = x + y              -- M = 2
    
    767
    -  in if b then f 1 else f 2      -- O = 1
    
    795
    +  -- MAr > Oar:
    
    796
    +  let f x y = x + y              -- MAr = 2
    
    797
    +  in if b then f 1 else f 2      -- OAr = 1
    
    768 798
       ==> { Contify for join arity 1 }
    
    769 799
       join f x = \y -> x + y
    
    770 800
       in if b then jump f 1 else jump f 2
    
    771 801
     
    
    772
    -  -- M < O
    
    773
    -  let f = id                     -- M = 0
    
    774
    -  in if ... then f 12 else f 13  -- O = 1
    
    802
    +  -- MAr < Oar
    
    803
    +  let f = id                     -- MAr = 0
    
    804
    +  in if ... then f 12 else f 13  -- OAr = 1
    
    775 805
       ==> { Contify for join arity 1, eta-expand f }
    
    776 806
       join f x = id x
    
    777 807
       in if b then jump f 12 else jump f 13
    
    778 808
     
    
    779
    -But for *recursive* let, it is crucial that both arities match up, consider
    
    809
    +But for *recursive* let, it is crucial MAr=OAr.  Consider:
    
    780 810
     
    
    781 811
       letrec f x y = if ... then f x else True
    
    782 812
       in f 42
    
    783 813
     
    
    784
    -Here, M=2 but O=1. If we settled for a joinrec arity of 1, the recursive jump
    
    814
    +Here, MAr=2 but OAr=1. If we settled for a joinrec arity of 1, the recursive jump
    
    785 815
     would not happen in a tail context! Contification is invalid here.
    
    786
    -So indeed it is crucial to demand that M=O.
    
    816
    +So indeed it is crucial to demand that MAr=OAr.
    
    787 817
     
    
    788
    -(Side note: Actually, we could be more specific: Let O1 be the join arity of
    
    789
    -occurrences from the letrec RHS and O2 the join arity from the let body. Then
    
    790
    -we need M=O1 and M<=O2 and could simply eta-expand the RHS to match O2 later.
    
    791
    -M=O is the specific case where we don't want to eta-expand. Neither the join
    
    818
    +(Side note: Actually, we could be more specific: Let OAr1 be the join arity of
    
    819
    +occurrences from the letrec RHS and OAr2 the join arity from the let body. Then
    
    820
    +we need MAr=OAr1 and MAr<=OAr2 and could simply eta-expand the RHS to match OAr2 later.
    
    821
    +MAr=OAr is the specific case where we don't want to eta-expand. Neither the join
    
    792 822
     points paper nor GHC does this at the moment.)
    
    793 823
     
    
    794 824
     We can capitalise on this observation and conclude that *if* f could become a
    
    795
    -joinrec (without eta-expansion), it will have join arity M.
    
    796
    -Now, M is just the result of 'joinRhsArity', a rather simple, local analysis.
    
    825
    +joinrec (without eta-expansion), it will have join arity MAr.
    
    826
    +Now, MAr is just the result of 'joinRhsArity', a rather simple, local analysis.
    
    797 827
     It is also the join arity inside the 'TailUsageDetails' returned by
    
    798 828
     'occAnalLamTail', so we can predict join arity without doing any fixed-point
    
    799 829
     iteration or really doing any deep traversal of let body or RHS at all.
    
    800
    -We check for M in the 'adjustTailUsage' call inside 'tagRecBinders'.
    
    830
    +We check for MAr in the 'adjustTailUsage' call inside 'tagRecBinders'.
    
    801 831
     
    
    802 832
     All this is quite apparent if you look at the contification transformation in
    
    803 833
     Fig. 5 of "Compiling without Continuations" (which does not account for
    
    804 834
     eta-expansion at all, mind you). The letrec case looks like this
    
    805
    -
    
    835
    +n
    
    806 836
       letrec f = /\as.\xs. L[us] in L'[es]
    
    807 837
         ... and a bunch of conditions establishing that f only occurs
    
    808 838
             in app heads of join arity (len as + len xs) inside us and es ...
    
    809 839
     
    
    810
    -The syntactic form `/\as.\xs. L[us]` forces M=O iff `f` occurs in `us`. However,
    
    840
    +The syntactic form `/\as.\xs. L[us]` forces MAr=OAr iff `f` occurs in `us`. However,
    
    811 841
     for non-recursive functions, this is the definition of contification from the
    
    812 842
     paper:
    
    813 843
     
    
    814 844
       let f = /\as.\xs.u in L[es]     ... conditions ...
    
    815 845
     
    
    816
    -Note that u could be a lambda itself, as we have seen. No relationship between M
    
    817
    -and O to exploit here.
    
    846
    +Note that u could be a lambda itself, as we have seen. No relationship between MAr
    
    847
    +and OAr to exploit here.
    
    818 848
     
    
    819 849
     Note [Join points and unfoldings/rules]
    
    820 850
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    ... ... @@ -992,23 +1022,29 @@ occAnalBind !env lvl ire (NonRec bndr rhs) thing_inside combine
    992 1022
       = -- Analyse the RHS and /then/ the body
    
    993 1023
         let -- Analyse the rhs first, generating rhs_uds
    
    994 1024
             !(rhs_uds_s, bndr', rhs') = occAnalNonRecRhs env lvl ire mb_join bndr rhs
    
    995
    -        rhs_uds = foldl1' orUDs rhs_uds_s   -- NB: orUDs.  See (W4) of
    
    996
    -                                           -- Note [Occurrence analysis for join points]
    
    1025
    +        rhs_uds = foldl1' (combineJoinPointUDs env)
    
    1026
    +                          rhs_uds_s   -- NB: combineJoinPointUDs.  See (W4) of
    
    1027
    +                                      -- Note [Occurrence analysis for join points]
    
    997 1028
     
    
    998 1029
             -- Now analyse the body, adding the join point
    
    999 1030
             -- into the environment with addJoinPoint
    
    1000
    -        !(WUD body_uds (occ, body)) = occAnalNonRecBody env bndr' $ \env ->
    
    1031
    +        env_body = addLocalLet env lvl bndr
    
    1032
    +        !(WUD body_uds (occ, body)) = occAnalNonRecBody env_body bndr' $ \env ->
    
    1001 1033
                                           thing_inside (addJoinPoint env bndr' rhs_uds)
    
    1002 1034
         in
    
    1003 1035
         if isDeadOcc occ     -- Drop dead code; see Note [Dead code]
    
    1004 1036
         then WUD body_uds body
    
    1005
    -    else WUD (rhs_uds `orUDs` body_uds)    -- Note `orUDs`
    
    1037
    +    else -- pprTrace "occAnal-nonrec" (vcat [ ppr bndr <+> ppr occ
    
    1038
    +         --                                , text "rhs_uds" <+> ppr rhs_uds
    
    1039
    +         --                                , text "body_uds" <+> ppr body_uds ]) $
    
    1040
    +         WUD (combineJoinPointUDs env rhs_uds body_uds)    -- Note `orUDs`
    
    1006 1041
                  (combine [NonRec (fst (tagNonRecBinder lvl occ bndr')) rhs']
    
    1007 1042
                           body)
    
    1008 1043
     
    
    1009 1044
       -- The normal case, including newly-discovered join points
    
    1010 1045
       -- Analyse the body and /then/ the RHS
    
    1011
    -  | WUD body_uds (occ,body) <- occAnalNonRecBody env bndr thing_inside
    
    1046
    +  | let env_body = addLocalLet env lvl bndr
    
    1047
    +  , WUD body_uds (occ,body) <- occAnalNonRecBody env_body bndr thing_inside
    
    1012 1048
       = if isDeadOcc occ   -- Drop dead code; see Note [Dead code]
    
    1013 1049
         then WUD body_uds body
    
    1014 1050
         else let
    
    ... ... @@ -1054,7 +1090,7 @@ occAnalNonRecRhs !env lvl imp_rule_edges mb_join bndr rhs
    1054 1090
         rhs_ctxt = mkNonRecRhsCtxt lvl bndr unf
    
    1055 1091
     
    
    1056 1092
         -- See Note [Join arity prediction based on joinRhsArity]
    
    1057
    -    -- Match join arity O from mb_join_arity with manifest join arity M as
    
    1093
    +    -- Match join arity OAr from mb_join_arity with manifest join arity MAr as
    
    1058 1094
         -- returned by of occAnalLamTail. It's totally OK for them to mismatch;
    
    1059 1095
         -- hence adjust the UDs from the RHS
    
    1060 1096
     
    
    ... ... @@ -1764,7 +1800,7 @@ makeNode !env imp_rule_edges bndr_set (bndr, rhs)
    1764 1800
                                    -- here because that is what we are setting!
    
    1765 1801
         WTUD unf_tuds unf' = occAnalUnfolding rhs_env unf
    
    1766 1802
         adj_unf_uds = adjustTailArity (JoinPoint rhs_ja) unf_tuds
    
    1767
    -      -- `rhs_ja` is `joinRhsArity rhs` and is the prediction for source M
    
    1803
    +      -- `rhs_ja` is `joinRhsArity rhs` and is the prediction for source MAr
    
    1768 1804
           -- of Note [Join arity prediction based on joinRhsArity]
    
    1769 1805
     
    
    1770 1806
         --------- IMP-RULES --------
    
    ... ... @@ -1775,7 +1811,7 @@ makeNode !env imp_rule_edges bndr_set (bndr, rhs)
    1775 1811
     
    
    1776 1812
         --------- All rules --------
    
    1777 1813
         -- See Note [Join points and unfoldings/rules]
    
    1778
    -    -- `rhs_ja` is `joinRhsArity rhs'` and is the prediction for source M
    
    1814
    +    -- `rhs_ja` is `joinRhsArity rhs'` and is the prediction for source MAr
    
    1779 1815
         -- of Note [Join arity prediction based on joinRhsArity]
    
    1780 1816
         rules_w_uds :: [(CoreRule, UsageDetails, UsageDetails)]
    
    1781 1817
         rules_w_uds = [ (r,l,adjustTailArity (JoinPoint rhs_ja) rhs_wuds)
    
    ... ... @@ -2177,7 +2213,9 @@ occAnalLamTail :: OccEnv -> CoreExpr -> WithTailUsageDetails CoreExpr
    2177 2213
     -- See Note [Adjusting right-hand sides]
    
    2178 2214
     occAnalLamTail env expr
    
    2179 2215
       = let !(WUD usage expr') = occ_anal_lam_tail env expr
    
    2180
    -    in WTUD (TUD (joinRhsArity expr) usage) expr'
    
    2216
    +    in WTUD (TUD (joinRhsArity expr') usage) expr'
    
    2217
    +       -- If expr looks like (\x. let dead = e in \y. blah), where `dead` is dead
    
    2218
    +       -- then joinRhsArity expr' might exceed joinRhsArity expr
    
    2181 2219
     
    
    2182 2220
     occ_anal_lam_tail :: OccEnv -> CoreExpr -> WithUsageDetails CoreExpr
    
    2183 2221
     -- Does not markInsideLam etc for the outmost batch of lambdas
    
    ... ... @@ -2598,7 +2636,7 @@ occAnalArgs :: OccEnv -> CoreExpr -> [CoreExpr]
    2598 2636
                 -> WithUsageDetails CoreExpr
    
    2599 2637
     -- The `fun` argument is just an accumulating parameter,
    
    2600 2638
     -- the base for building the application we return
    
    2601
    -occAnalArgs !env fun args !one_shots
    
    2639
    +occAnalArgs env fun args one_shots
    
    2602 2640
       = go emptyDetails fun args one_shots
    
    2603 2641
       where
    
    2604 2642
         env_args = setNonTailCtxt encl env
    
    ... ... @@ -2657,8 +2695,19 @@ Constructors are rather like lambdas in this way.
    2657 2695
     occAnalApp :: OccEnv
    
    2658 2696
                -> (Expr CoreBndr, [Arg CoreBndr], [CoreTickish])
    
    2659 2697
                -> WithUsageDetails (Expr CoreBndr)
    
    2660
    --- Naked variables (not applied) end up here too
    
    2661
    -occAnalApp !env (Var fun, args, ticks)
    
    2698
    +occAnalApp !env (Var fun_id, [], ticks)
    
    2699
    +  = -- Naked variables (not applied) end up here too, and it's worth giving
    
    2700
    +    -- this common case special treatment, because there is so much less to do.
    
    2701
    +    -- This is just a specialised copy of the (Var fun_id) case below
    
    2702
    +    WUD fun_uds (mkTicks ticks fun')
    
    2703
    +  where
    
    2704
    +    !(fun', fun_id')  = lookupBndrSwap env fun_id
    
    2705
    +    !fun_uds = mkOneOcc env fun_id' int_cxt 0
    
    2706
    +    !int_cxt = case occ_encl env of
    
    2707
    +                   OccScrut -> IsInteresting
    
    2708
    +                   _other   -> NotInteresting
    
    2709
    +
    
    2710
    +occAnalApp env (Var fun, args, ticks)
    
    2662 2711
       -- Account for join arity of runRW# continuation
    
    2663 2712
       -- See Note [Simplification of runRW#]
    
    2664 2713
       --
    
    ... ... @@ -2863,7 +2912,11 @@ data OccEnv
    2863 2912
                  -- Invariant: no Id maps to an empty OccInfoEnv
    
    2864 2913
                  -- See Note [Occurrence analysis for join points]
    
    2865 2914
                , occ_join_points :: !JoinPointInfo
    
    2866
    -    }
    
    2915
    +
    
    2916
    +           , occ_local_lets :: IdSet    -- Non-top-level non-rec-bound lets
    
    2917
    +                -- I tried making this field strict, but
    
    2918
    +                -- doing so slightly increased allocation
    
    2919
    +           }
    
    2867 2920
     
    
    2868 2921
     type JoinPointInfo = IdEnv OccInfoEnv
    
    2869 2922
     
    
    ... ... @@ -2914,7 +2967,8 @@ initOccEnv
    2914 2967
     
    
    2915 2968
                , occ_join_points = emptyVarEnv
    
    2916 2969
                , occ_bs_env = emptyVarEnv
    
    2917
    -           , occ_bs_rng = emptyVarSet }
    
    2970
    +           , occ_bs_rng = emptyVarSet
    
    2971
    +           , occ_local_lets = emptyVarSet }
    
    2918 2972
     
    
    2919 2973
     noBinderSwaps :: OccEnv -> Bool
    
    2920 2974
     noBinderSwaps (OccEnv { occ_bs_env = bs_env }) = isEmptyVarEnv bs_env
    
    ... ... @@ -3154,23 +3208,26 @@ postprocess_uds bndrs bad_joins uds
    3154 3208
           | uniq `elemVarEnvByKey` env = plusVarEnv_C andLocalOcc env join_env
    
    3155 3209
           | otherwise                  = env
    
    3156 3210
     
    
    3211
    +addLocalLet :: OccEnv -> TopLevelFlag -> Id -> OccEnv
    
    3212
    +addLocalLet env@(OccEnv { occ_local_lets = ids }) top_lvl id
    
    3213
    +  | isTopLevel top_lvl = env
    
    3214
    +  | otherwise          = env { occ_local_lets = ids `extendVarSet` id }
    
    3215
    +
    
    3157 3216
     addJoinPoint :: OccEnv -> Id -> UsageDetails -> OccEnv
    
    3158
    -addJoinPoint env bndr rhs_uds
    
    3217
    +addJoinPoint env@(OccEnv { occ_join_points = join_points, occ_local_lets = local_lets })
    
    3218
    +             join_bndr (UD { ud_env = rhs_occs })
    
    3159 3219
       | isEmptyVarEnv zeroed_form
    
    3160 3220
       = env
    
    3161 3221
       | otherwise
    
    3162
    -  = env { occ_join_points = extendVarEnv (occ_join_points env) bndr zeroed_form }
    
    3222
    +  = env { occ_join_points = extendVarEnv join_points join_bndr zeroed_form }
    
    3163 3223
       where
    
    3164
    -    zeroed_form = mkZeroedForm rhs_uds
    
    3224
    +    zeroed_form = mapMaybeUniqSetToUFM do_one local_lets
    
    3225
    +     -- See Note [Occurrence analysis for join points] for "zeroed form"
    
    3165 3226
     
    
    3166
    -mkZeroedForm :: UsageDetails -> OccInfoEnv
    
    3167
    --- See Note [Occurrence analysis for join points] for "zeroed form"
    
    3168
    -mkZeroedForm (UD { ud_env = rhs_occs })
    
    3169
    -  = mapMaybeUFM do_one rhs_occs
    
    3170
    -  where
    
    3171
    -    do_one :: LocalOcc -> Maybe LocalOcc
    
    3172
    -    do_one (ManyOccL {})    = Nothing
    
    3173
    -    do_one occ@(OneOccL {}) = Just (occ { lo_n_br = 0 })
    
    3227
    +    do_one :: Var -> Maybe LocalOcc
    
    3228
    +    do_one bndr = case lookupVarEnv rhs_occs bndr of
    
    3229
    +                    Just occ@(OneOccL {}) -> Just (occ { lo_n_br = 0 })
    
    3230
    +                    _                     -> Nothing
    
    3174 3231
     
    
    3175 3232
     --------------------
    
    3176 3233
     transClosureFV :: VarEnv VarSet -> VarEnv VarSet
    
    ... ... @@ -3628,7 +3685,12 @@ data LocalOcc -- See Note [LocalOcc]
    3628 3685
                        -- Combining (AlwaysTailCalled 2) and (AlwaysTailCalled 3)
    
    3629 3686
                        -- gives NoTailCallInfo
    
    3630 3687
                   , lo_int_cxt :: !InterestingCxt }
    
    3688
    +
    
    3631 3689
         | ManyOccL !TailCallInfo
    
    3690
    +       -- Why do we need TailCallInfo on ManyOccL?
    
    3691
    +       -- Answer: recursive bindings are entered many times:
    
    3692
    +       --    rec { j x = ...j x'... } in j y
    
    3693
    +       -- See the uses of `andUDs` in `tagRecBinders`
    
    3632 3694
     
    
    3633 3695
     instance Outputable LocalOcc where
    
    3634 3696
       ppr (OneOccL { lo_n_br = n, lo_tail = tci })
    
    ... ... @@ -3663,7 +3725,7 @@ instance Outputable UsageDetails where
    3663 3725
     -- | TailUsageDetails captures the result of applying 'occAnalLamTail'
    
    3664 3726
     --   to a function `\xyz.body`. The TailUsageDetails pairs together
    
    3665 3727
     --   * the number of lambdas (including type lambdas: a JoinArity)
    
    3666
    ---   * UsageDetails for the `body` of the lambda, unadjusted by `adjustTailUsage`.
    
    3728
    +--   * UsageDetails for the `body` of the lambda, /unadjusted/ by `adjustTailUsage`.
    
    3667 3729
     -- If the binding turns out to be a join point with the indicated join
    
    3668 3730
     -- arity, this unadjusted usage details is just what we need; otherwise we
    
    3669 3731
     -- need to discard tail calls. That's what `adjustTailUsage` does.
    
    ... ... @@ -3681,8 +3743,17 @@ data WithTailUsageDetails a = WTUD !TailUsageDetails !a
    3681 3743
     
    
    3682 3744
     andUDs:: UsageDetails -> UsageDetails -> UsageDetails
    
    3683 3745
     orUDs :: UsageDetails -> UsageDetails -> UsageDetails
    
    3684
    -andUDs = combineUsageDetailsWith andLocalOcc
    
    3685
    -orUDs  = combineUsageDetailsWith orLocalOcc
    
    3746
    +andUDs = combineUsageDetailsWith (\_uniq -> andLocalOcc)
    
    3747
    +orUDs  = combineUsageDetailsWith (\_uniq -> orLocalOcc)
    
    3748
    +
    
    3749
    +combineJoinPointUDs :: OccEnv -> UsageDetails -> UsageDetails -> UsageDetails
    
    3750
    +-- See (W5) in Note [Occurrence analysis for join points]
    
    3751
    +combineJoinPointUDs (OccEnv { occ_local_lets = local_lets }) uds1 uds2
    
    3752
    +  = combineUsageDetailsWith combine uds1 uds2
    
    3753
    +  where
    
    3754
    +    combine uniq occ1 occ2
    
    3755
    +      | uniq `elemVarSetByKey` local_lets = orLocalOcc  occ1 occ2
    
    3756
    +      | otherwise                         = andLocalOcc occ1 occ2
    
    3686 3757
     
    
    3687 3758
     mkOneOcc :: OccEnv -> Id -> InterestingCxt -> JoinArity -> UsageDetails
    
    3688 3759
     mkOneOcc !env id int_cxt arity
    
    ... ... @@ -3699,7 +3770,8 @@ mkOneOcc !env id int_cxt arity
    3699 3770
       = mkSimpleDetails (unitVarEnv id occ)
    
    3700 3771
     
    
    3701 3772
       where
    
    3702
    -    occ = OneOccL { lo_n_br = 1, lo_int_cxt = int_cxt
    
    3773
    +    occ = OneOccL { lo_n_br = 1
    
    3774
    +                  , lo_int_cxt = int_cxt
    
    3703 3775
                       , lo_tail = AlwaysTailCalled arity }
    
    3704 3776
     
    
    3705 3777
     -- Add several occurrences, assumed not to be tail calls
    
    ... ... @@ -3786,7 +3858,7 @@ restrictFreeVars bndrs fvs = restrictUniqSetToUFM bndrs fvs
    3786 3858
     -------------------
    
    3787 3859
     -- Auxiliary functions for UsageDetails implementation
    
    3788 3860
     
    
    3789
    -combineUsageDetailsWith :: (LocalOcc -> LocalOcc -> LocalOcc)
    
    3861
    +combineUsageDetailsWith :: (Unique -> LocalOcc -> LocalOcc -> LocalOcc)
    
    3790 3862
                             -> UsageDetails -> UsageDetails -> UsageDetails
    
    3791 3863
     {-# INLINE combineUsageDetailsWith #-}
    
    3792 3864
     combineUsageDetailsWith plus_occ_info
    
    ... ... @@ -3796,9 +3868,9 @@ combineUsageDetailsWith plus_occ_info
    3796 3868
       | isEmptyVarEnv env2 = uds1
    
    3797 3869
       | otherwise
    
    3798 3870
       -- See Note [Strictness in the occurrence analyser]
    
    3799
    -  -- Using strictPlusVarEnv here speeds up the test T26425 by about 10% by avoiding
    
    3800
    -  -- intermediate thunks.
    
    3801
    -  = UD { ud_env       = strictPlusVarEnv_C plus_occ_info env1 env2
    
    3871
    +  -- Using strictPlusVarEnv here speeds up the test T26425
    
    3872
    +  -- by about 10% by avoiding intermediate thunks.
    
    3873
    +  = UD { ud_env       = strictPlusVarEnv_C_Directly plus_occ_info env1 env2
    
    3802 3874
            , ud_z_many    = strictPlusVarEnv z_many1   z_many2
    
    3803 3875
            , ud_z_in_lam  = plusVarEnv z_in_lam1 z_in_lam2
    
    3804 3876
            , ud_z_tail    = strictPlusVarEnv z_tail1   z_tail2 }
    
    ... ... @@ -3842,8 +3914,6 @@ lookupOccInfoByUnique (UD { ud_env = env
    3842 3914
             | uniq `elemVarEnvByKey` z_tail = NoTailCallInfo
    
    3843 3915
             | otherwise                     = ti
    
    3844 3916
     
    
    3845
    -
    
    3846
    -
    
    3847 3917
     -------------------
    
    3848 3918
     -- See Note [Adjusting right-hand sides]
    
    3849 3919
     
    
    ... ... @@ -3853,21 +3923,22 @@ adjustNonRecRhs :: JoinPointHood
    3853 3923
     -- ^ This function concentrates shared logic between occAnalNonRecBind and the
    
    3854 3924
     -- AcyclicSCC case of occAnalRec.
    
    3855 3925
     -- It returns the adjusted rhs UsageDetails combined with the body usage
    
    3856
    -adjustNonRecRhs mb_join_arity rhs_wuds@(WTUD _ rhs)
    
    3857
    -  = WUD (adjustTailUsage mb_join_arity rhs_wuds) rhs
    
    3858
    -
    
    3926
    +adjustNonRecRhs mb_join_arity (WTUD (TUD rhs_ja uds) rhs)
    
    3927
    +  = WUD (adjustTailUsage exact_join rhs uds) rhs
    
    3928
    +  where
    
    3929
    +    exact_join = mb_join_arity == JoinPoint rhs_ja
    
    3859 3930
     
    
    3860
    -adjustTailUsage :: JoinPointHood
    
    3861
    -                -> WithTailUsageDetails CoreExpr    -- Rhs usage, AFTER occAnalLamTail
    
    3931
    +adjustTailUsage :: Bool        -- True <=> Exactly-matching join point; don't do markNonTail
    
    3932
    +                -> CoreExpr    -- Rhs usage, AFTER occAnalLamTail
    
    3862 3933
                     -> UsageDetails
    
    3863
    -adjustTailUsage mb_join_arity (WTUD (TUD rhs_ja uds) rhs)
    
    3934
    +                -> UsageDetails
    
    3935
    +adjustTailUsage exact_join rhs uds
    
    3864 3936
       = -- c.f. occAnal (Lam {})
    
    3865 3937
         markAllInsideLamIf (not one_shot) $
    
    3866 3938
         markAllNonTailIf (not exact_join) $
    
    3867 3939
         uds
    
    3868 3940
       where
    
    3869 3941
         one_shot   = isOneShotFun rhs
    
    3870
    -    exact_join = mb_join_arity == JoinPoint rhs_ja
    
    3871 3942
     
    
    3872 3943
     adjustTailArity :: JoinPointHood -> TailUsageDetails -> UsageDetails
    
    3873 3944
     adjustTailArity mb_rhs_ja (TUD ja usage)
    
    ... ... @@ -3914,8 +3985,9 @@ tagNonRecBinder lvl occ bndr
    3914 3985
     tagRecBinders :: TopLevelFlag           -- At top level?
    
    3915 3986
                   -> UsageDetails           -- Of body of let ONLY
    
    3916 3987
                   -> [NodeDetails]
    
    3917
    -              -> WithUsageDetails       -- Adjusted details for whole scope,
    
    3918
    -                                        -- with binders removed
    
    3988
    +              -> WithUsageDetails       -- Adjusted details for whole scope
    
    3989
    +                                        -- still including the binders;
    
    3990
    +                                        -- (they are removed by `addInScope`)
    
    3919 3991
                       [IdWithOccInfo]       -- Tagged binders
    
    3920 3992
     -- Substantially more complicated than non-recursive case. Need to adjust RHS
    
    3921 3993
     -- details *before* tagging binders (because the tags depend on the RHSes).
    
    ... ... @@ -3925,32 +3997,21 @@ tagRecBinders lvl body_uds details_s
    3925 3997
     
    
    3926 3998
          -- 1. See Note [Join arity prediction based on joinRhsArity]
    
    3927 3999
          --    Determine possible join-point-hood of whole group, by testing for
    
    3928
    -     --    manifest join arity M.
    
    3929
    -     --    This (re-)asserts that makeNode had made tuds for that same arity M!
    
    4000
    +     --    manifest join arity MAr.
    
    4001
    +     --    This (re-)asserts that makeNode had made tuds for that same arity MAr!
    
    3930 4002
          unadj_uds = foldr (andUDs . test_manifest_arity) body_uds details_s
    
    3931
    -     test_manifest_arity ND{nd_rhs = WTUD tuds rhs}
    
    3932
    -       = adjustTailArity (JoinPoint (joinRhsArity rhs)) tuds
    
    4003
    +     test_manifest_arity ND{nd_rhs = WTUD (TUD rhs_ja uds) rhs}
    
    4004
    +       = assertPpr (rhs_ja == joinRhsArity rhs) (ppr rhs_ja $$ ppr uds $$ ppr rhs) $
    
    4005
    +         uds
    
    3933 4006
     
    
    4007
    +     will_be_joins :: Bool
    
    3934 4008
          will_be_joins = decideRecJoinPointHood lvl unadj_uds bndrs
    
    3935 4009
     
    
    3936
    -     mb_join_arity :: Id -> JoinPointHood
    
    3937
    -     -- mb_join_arity: See Note [Join arity prediction based on joinRhsArity]
    
    3938
    -     -- This is the source O
    
    3939
    -     mb_join_arity bndr
    
    3940
    -         -- Can't use willBeJoinId_maybe here because we haven't tagged
    
    3941
    -         -- the binder yet (the tag depends on these adjustments!)
    
    3942
    -       | will_be_joins
    
    3943
    -       , AlwaysTailCalled arity <- lookupTailCallInfo unadj_uds bndr
    
    3944
    -       = JoinPoint arity
    
    3945
    -       | otherwise
    
    3946
    -       = assert (not will_be_joins) -- Should be AlwaysTailCalled if
    
    3947
    -         NotJoinPoint               -- we are making join points!
    
    3948
    -
    
    3949 4010
          -- 2. Adjust usage details of each RHS, taking into account the
    
    3950 4011
          --    join-point-hood decision
    
    3951
    -     rhs_udss' = [ adjustTailUsage (mb_join_arity bndr) rhs_wuds
    
    4012
    +     rhs_udss' = [ adjustTailUsage will_be_joins rhs rhs_uds
    
    3952 4013
                          -- Matching occAnalLamTail in makeNode
    
    3953
    -                 | ND { nd_bndr = bndr, nd_rhs = rhs_wuds } <- details_s ]
    
    4014
    +                 | ND { nd_rhs = WTUD (TUD _ rhs_uds) rhs } <- details_s ]
    
    3954 4015
     
    
    3955 4016
          -- 3. Compute final usage details from adjusted RHS details
    
    3956 4017
          adj_uds = foldr andUDs body_uds rhs_udss'
    
    ... ... @@ -3969,9 +4030,9 @@ setBinderOcc occ_info bndr
    3969 4030
       | otherwise                  = setIdOccInfo bndr occ_info
    
    3970 4031
     
    
    3971 4032
     -- | Decide whether some bindings should be made into join points or not, based
    
    3972
    --- on its occurrences. This is
    
    4033
    +-- on its occurrences.
    
    3973 4034
     -- Returns `False` if they can't be join points. Note that it's an
    
    4035
    +-- all-or-nothing decision: if multiple binders are given, they are
    
    3974 4036
     -- assumed to be mutually recursive.
    
    3975 4037
     --
    
    3976 4038
     -- It must, however, be a final decision. If we say `True` for 'f',
    

  • compiler/GHC/Core/Opt/Simplify/Iteration.hs
    ... ... @@ -4595,13 +4595,21 @@ mkLetUnfolding :: SimplEnv -> TopLevelFlag -> UnfoldingSource
    4595 4595
                    -> InId -> Bool    -- True <=> this is a join point
    
    4596 4596
                    -> OutExpr -> SimplM Unfolding
    
    4597 4597
     mkLetUnfolding env top_lvl src id is_join new_rhs
    
    4598
    -  = return (mkUnfolding uf_opts src is_top_lvl is_bottoming is_join new_rhs Nothing)
    
    4599
    -            -- We make an  unfolding *even for loop-breakers*.
    
    4600
    -            -- Reason: (a) It might be useful to know that they are WHNF
    
    4601
    -            --         (b) In GHC.Iface.Tidy we currently assume that, if we want to
    
    4602
    -            --             expose the unfolding then indeed we *have* an unfolding
    
    4603
    -            --             to expose.  (We could instead use the RHS, but currently
    
    4604
    -            --             we don't.)  The simple thing is always to have one.
    
    4598
    +--  | is_join
    
    4599
    +--  , UnfNever <- guidance
    
    4600
    +--  = -- For large join points, don't keep an unfolding at all if it is large
    
    4601
    +--    -- This is just an attempt to keep residency under control in
    
    4602
    +--    -- deeply-nested join-point such as those arising in #26425
    
    4603
    +--    return NoUnfolding
    
    4604
    +
    
    4605
    +  | otherwise
    
    4606
    +  = return (mkCoreUnfolding src is_top_lvl new_rhs Nothing guidance)
    
    4607
    +    -- We make an  unfolding *even for loop-breakers*.
    
    4608
    +    -- Reason: (a) It might be useful to know that they are WHNF
    
    4609
    +    --         (b) In GHC.Iface.Tidy we currently assume that, if we want to
    
    4610
    +    --             expose the unfolding then indeed we *have* an unfolding
    
    4611
    +    --             to expose.  (We could instead use the RHS, but currently
    
    4612
    +    --             we don't.)  The simple thing is always to have one.
    
    4605 4613
       where
    
    4606 4614
         -- !opts: otherwise, we end up retaining all the SimpleEnv
    
    4607 4615
         !uf_opts = seUnfoldingOpts env
    
    ... ... @@ -4612,6 +4620,9 @@ mkLetUnfolding env top_lvl src id is_join new_rhs
    4612 4620
         -- See Note [Force bottoming field]
    
    4613 4621
         !is_bottoming = isDeadEndId id
    
    4614 4622
     
    
    4623
    +    is_top_bottoming = is_top_lvl && is_bottoming
    
    4624
    +    guidance         = calcUnfoldingGuidance uf_opts is_top_bottoming is_join new_rhs
    
    4625
    +
    
    4615 4626
     -------------------
    
    4616 4627
     simplStableUnfolding :: SimplEnv -> BindContext
    
    4617 4628
                          -> InId
    

  • compiler/GHC/Types/Unique/FM.hs
    ... ... @@ -53,7 +53,7 @@ module GHC.Types.Unique.FM (
    53 53
             plusUFM,
    
    54 54
             strictPlusUFM,
    
    55 55
             plusUFM_C,
    
    56
    -        strictPlusUFM_C,
    
    56
    +        strictPlusUFM_C, strictPlusUFM_C_Directly,
    
    57 57
             plusUFM_CD,
    
    58 58
             plusUFM_CD2,
    
    59 59
             mergeUFM,
    
    ... ... @@ -281,6 +281,9 @@ plusUFM_C f (UFM x) (UFM y) = UFM (M.unionWith f x y)
    281 281
     strictPlusUFM_C :: (elt -> elt -> elt) -> UniqFM key elt -> UniqFM key elt -> UniqFM key elt
    
    282 282
     strictPlusUFM_C f (UFM x) (UFM y) = UFM (MS.unionWith f x y)
    
    283 283
     
    
    284
    +strictPlusUFM_C_Directly :: (Unique -> elt -> elt -> elt) -> UniqFM key elt -> UniqFM key elt -> UniqFM key elt
    
    285
    +strictPlusUFM_C_Directly f (UFM x) (UFM y) = UFM (MS.unionWithKey (f . mkUniqueGrimily) x y)
    
    286
    +
    
    284 287
     -- | `plusUFM_CD f m1 d1 m2 d2` merges the maps using `f` as the
    
    285 288
     -- combinding function and `d1` resp. `d2` as the default value if
    
    286 289
     -- there is no entry in `m1` reps. `m2`. The domain is the union of
    

  • compiler/GHC/Types/Unique/Set.hs
    ... ... @@ -40,6 +40,7 @@ module GHC.Types.Unique.Set (
    40 40
             lookupUniqSet_Directly,
    
    41 41
             partitionUniqSet,
    
    42 42
             mapUniqSet,
    
    43
    +        mapUniqSetToUFM, mapMaybeUniqSetToUFM,
    
    43 44
             unsafeUFMToUniqSet,
    
    44 45
             nonDetEltsUniqSet,
    
    45 46
             nonDetKeysUniqSet,
    
    ... ... @@ -211,6 +212,14 @@ mapUniqSet f = mkUniqSet . map f . nonDetEltsUniqSet
    211 212
     mapMaybeUniqSet_sameUnique :: (a -> Maybe b) -> UniqSet a -> UniqSet b
    
    212 213
     mapMaybeUniqSet_sameUnique f (UniqSet a) = UniqSet $ mapMaybeUFM_sameUnique f a
    
    213 214
     
    
    215
    +mapUniqSetToUFM :: (a -> b) -> UniqSet a -> UniqFM a b
    
    216
    +-- Same keys, new values
    
    217
    +mapUniqSetToUFM f (UniqSet ufm) = mapUFM f ufm
    
    218
    +
    
    219
    +mapMaybeUniqSetToUFM :: (a -> Maybe b) -> UniqSet a -> UniqFM a b
    
    220
    +-- Same keys, new values
    
    221
    +mapMaybeUniqSetToUFM f (UniqSet ufm) = mapMaybeUFM f ufm
    
    222
    +
    
    214 223
     -- Two 'UniqSet's are considered equal if they contain the same
    
    215 224
     -- uniques.
    
    216 225
     instance Eq (UniqSet a) where
    

  • compiler/GHC/Types/Var/Env.hs
    ... ... @@ -12,7 +12,8 @@ module GHC.Types.Var.Env (
    12 12
             elemVarEnv, disjointVarEnv, anyVarEnv,
    
    13 13
             extendVarEnv, extendVarEnv_C, extendVarEnv_Acc,
    
    14 14
             extendVarEnvList,
    
    15
    -        strictPlusVarEnv, plusVarEnv, plusVarEnv_C, strictPlusVarEnv_C,
    
    15
    +        strictPlusVarEnv, plusVarEnv, plusVarEnv_C,
    
    16
    +        strictPlusVarEnv_C, strictPlusVarEnv_C_Directly,
    
    16 17
             plusVarEnv_CD, plusMaybeVarEnv_C,
    
    17 18
             plusVarEnvList, alterVarEnv,
    
    18 19
             delVarEnvList, delVarEnv,
    
    ... ... @@ -525,6 +526,7 @@ delVarEnv :: VarEnv a -> Var -> VarEnv a
    525 526
     minusVarEnv       :: VarEnv a -> VarEnv b -> VarEnv a
    
    526 527
     plusVarEnv_C      :: (a -> a -> a) -> VarEnv a -> VarEnv a -> VarEnv a
    
    527 528
     strictPlusVarEnv_C :: (a -> a -> a) -> VarEnv a -> VarEnv a -> VarEnv a
    
    529
    +strictPlusVarEnv_C_Directly :: (Unique -> a -> a -> a) -> VarEnv a -> VarEnv a -> VarEnv a
    
    528 530
     plusVarEnv_CD     :: (a -> a -> a) -> VarEnv a -> a -> VarEnv a -> a -> VarEnv a
    
    529 531
     plusMaybeVarEnv_C :: (a -> a -> Maybe a) -> VarEnv a -> VarEnv a -> VarEnv a
    
    530 532
     mapVarEnv         :: (a -> b) -> VarEnv a -> VarEnv b
    
    ... ... @@ -552,6 +554,7 @@ extendVarEnv_Acc = addToUFM_Acc
    552 554
     extendVarEnvList = addListToUFM
    
    553 555
     plusVarEnv_C     = plusUFM_C
    
    554 556
     strictPlusVarEnv_C = strictPlusUFM_C
    
    557
    +strictPlusVarEnv_C_Directly = strictPlusUFM_C_Directly
    
    555 558
     plusVarEnv_CD    = plusUFM_CD
    
    556 559
     plusMaybeVarEnv_C = plusMaybeUFM_C
    
    557 560
     delVarEnvList    = delListFromUFM