[Git][ghc/ghc][wip/computed-goto] 7 commits: build: Relax ghc/ghc-boot Cabal bound to 3.16
Cheng Shao pushed to branch wip/computed-goto at Glasgow Haskell Compiler / GHC Commits: b5508f2c by Rodrigo Mesquita at 2025-11-04T14:10:56+00:00 build: Relax ghc/ghc-boot Cabal bound to 3.16 Fixes #26202 - - - - - c5b3541f by Rodrigo Mesquita at 2025-11-04T14:10:56+00:00 cabal-reinstall: Use haddock-api +in-tree-ghc Fixes #26202 - - - - - c6d4b945 by Rodrigo Mesquita at 2025-11-04T14:10:56+00:00 cabal-reinstall: Pass --strict to Happy This is necessary to make the generated Parser build successfully This mimics Hadrian, which always passes --strict to happy. Fixes #26202 - - - - - 79df1e0e by Rodrigo Mesquita at 2025-11-04T14:10:56+00:00 genprimopcode: Require higher happy version I've bumped the happy version to forbid deprecated Happy versions which don't successfully compile. - - - - - fa5d33de by Simon Peyton Jones at 2025-11-05T08:35:40-05:00 Add a HsWrapper optimiser This MR addresses #26349, by introduceing optSubTypeHsWrapper. There is a long Note [Deep subsumption and WpSubType] in GHC.Tc.Types.Evidence that explains what is going on. - - - - - ea58cae5 by Simon Peyton Jones at 2025-11-05T08:35:40-05:00 Improve mkWpFun_FRR This commit ensures that `mkWpFun_FRR` directly produces a `FunCo` in the cases where it can. (Previously called `mkWpFun` which in turn optimised to a `FunCo`, but that made the smarts in `mkWpFun` /essential/ rather than (as they should be) optional. - - - - - 65770b9a by Cheng Shao at 2025-11-05T15:28:06+01:00 rts: use computed goto for instruction dispatch in the bytecode interpreter This patch uses computed goto for instruction dispatch in the bytecode interpreter. Previously instruction dispatch is done by a classic switch loop, so executing the next instruction requires two jumps: one to the start of the switch loop and another to the case block based on the instruction tag. 
By using computed goto, we can build a jump table consisted of code addresses indexed by the instruction tags themselves, so executing the next instruction requires only one jump, to the destination directly fetched from the jump table. Closes #12953. - - - - - 25 changed files: - cabal.project-reinstall - compiler/GHC/Core/Coercion.hs - compiler/GHC/Hs/Syn/Type.hs - compiler/GHC/HsToCore/Binds.hs - compiler/GHC/HsToCore/Match.hs - compiler/GHC/Iface/Ext/Ast.hs - compiler/GHC/Tc/Errors/Hole.hs - compiler/GHC/Tc/Gen/App.hs - compiler/GHC/Tc/Gen/Head.hs - compiler/GHC/Tc/Gen/Pat.hs - compiler/GHC/Tc/Types/Evidence.hs - compiler/GHC/Tc/Utils/Concrete.hs - compiler/GHC/Tc/Utils/Unify.hs - compiler/GHC/Tc/Zonk/Type.hs - compiler/Setup.hs - compiler/ghc.cabal.in - libraries/ghc-boot/Setup.hs - libraries/ghc-boot/ghc-boot.cabal.in - rts/Interpreter.c - rts/include/rts/Bytecodes.h - + testsuite/tests/simplCore/should_compile/T26349.hs - + testsuite/tests/simplCore/should_compile/T26349.stderr - testsuite/tests/simplCore/should_compile/all.T - testsuite/tests/simplCore/should_compile/rule2.stderr - utils/genprimopcode/genprimopcode.cabal Changes: ===================================== cabal.project-reinstall ===================================== @@ -59,6 +59,7 @@ constraints: ghc +internal-interpreter +dynamic-system-linke, ghc-bin +internal-interpreter +threaded, ghci +internal-interpreter, haddock +in-ghc-tree, + haddock-api +in-ghc-tree, any.array installed, any.base installed, any.deepseq installed, @@ -68,6 +69,8 @@ constraints: ghc +internal-interpreter +dynamic-system-linke, any.pretty installed, any.template-haskell installed +package * + happy-options: --strict benchmarks: False tests: False ===================================== compiler/GHC/Core/Coercion.hs ===================================== @@ -41,7 +41,8 @@ module GHC.Core.Coercion ( mkInstCo, mkAppCo, mkAppCos, mkTyConAppCo, mkFunCo, mkFunCo2, mkFunCoNoFTF, mkFunResCo, mkNakedFunCo, - mkNakedForAllCo, 
mkForAllCo, mkForAllVisCos, mkHomoForAllCos, + mkNakedForAllCo, mkForAllCo, mkForAllVisCos, + mkHomoForAllCo, mkHomoForAllCos, mkPhantomCo, mkAxiomCo, mkHoleCo, mkUnivCo, mkSubCo, mkProofIrrelCo, @@ -980,7 +981,7 @@ mkForAllCo v visL visR kind_co co = mkReflCo r (mkTyCoForAllTy v visL ty) | otherwise - = mkForAllCo_NoRefl v visL visR kind_co co + = mk_forall_co v visL visR kind_co co -- mkForAllVisCos [tv{vis}] constructs a cast -- forall tv. res ~R# forall tv{vis} res`. @@ -1000,14 +1001,26 @@ mkHomoForAllCos vs orig_co = foldr go orig_co vs where go :: ForAllTyBinder -> Coercion -> Coercion - go (Bndr var vis) = mkForAllCo_NoRefl var vis vis MRefl - --- | Like 'mkForAllCo', but there is no need to check that the inner coercion isn't Refl; --- the caller has done that. (For example, it is guaranteed in 'mkHomoForAllCos'.) --- The kind of the tycovar should be the left-hand kind of the kind coercion. -mkForAllCo_NoRefl :: TyCoVar -> ForAllTyFlag -> ForAllTyFlag - -> KindMCoercion -> Coercion -> Coercion -mkForAllCo_NoRefl tcv visL visR kind_co co + go (Bndr var vis) co = mk_forall_co var vis vis MRefl co + +mkHomoForAllCo :: TyVar -> Coercion -> Coercion +-- Specialised for a single TyVar, +-- and visibility of coreTyLamForAllTyFlag +mkHomoForAllCo tv orig_co + | Just (ty, r) <- isReflCo_maybe orig_co + = mkReflCo r (mkForAllTy (Bndr tv vis) ty) + | otherwise + = mk_forall_co tv vis vis MRefl orig_co + where + vis = coreTyLamForAllTyFlag + +-- | `mk_forall_co` just builds a ForAllCo. +-- With debug on, it checks invariants (e.g. the kind of the tycovar should +-- be the left-hand kind of the kind coercion). +-- Callers should have done any isReflCo short-cutting. 
+mk_forall_co :: TyCoVar -> ForAllTyFlag -> ForAllTyFlag + -> KindMCoercion -> Coercion -> Coercion +mk_forall_co tcv visL visR kind_co co = assertGoodForAllCo tcv visL visR kind_co co $ assertPpr (not (isReflCo co && isReflMCo kind_co && visL == visR)) (ppr co) $ ForAllCo { fco_tcv = tcv, fco_visL = visL, fco_visR = visR @@ -1769,7 +1782,7 @@ mkPiCos r vs co = foldr (mkPiCo r) co vs -- | Make a forall 'Coercion', where both types related by the coercion -- are quantified over the same variable. mkPiCo :: Role -> Var -> Coercion -> Coercion -mkPiCo r v co | isTyVar v = mkHomoForAllCos [Bndr v coreTyLamForAllTyFlag] co +mkPiCo r v co | isTyVar v = mkHomoForAllCo v co | isCoVar v = assert (not (v `elemVarSet` tyCoVarsOfCo co)) $ -- We didn't call mkForAllCo here because if v does not appear -- in co, the argument coercion will be nominal. But here we ===================================== compiler/GHC/Hs/Syn/Type.hs ===================================== @@ -187,11 +187,13 @@ liftPRType :: (Type -> Type) -> PRType -> PRType liftPRType f pty = (f (prTypeType pty), []) hsWrapperType :: HsWrapper -> Type -> Type +-- Return the type of (WrapExpr wrap e), given that e :: ty hsWrapperType wrap ty = prTypeType $ go wrap (ty,[]) where go WpHole = id + go (WpSubType w) = go w go (w1 `WpCompose` w2) = go w1 . 
go w2 - go (WpFun _ w2 (Scaled m exp_arg)) = liftPRType $ \t -> + go (WpFun _ w2 (Scaled m exp_arg) _) = liftPRType $ \t -> let act_res = funResultTy t exp_res = hsWrapperType w2 act_res in mkFunctionType m exp_arg exp_res ===================================== compiler/GHC/HsToCore/Binds.hs ===================================== @@ -1597,9 +1597,13 @@ dsHsWrapper hs_wrap thing_inside ds_hs_wrapper :: HsWrapper -> ((CoreExpr -> CoreExpr) -> DsM a) -> DsM a -ds_hs_wrapper wrap = go wrap +ds_hs_wrapper hs_wrap + = go hs_wrap where go WpHole k = k $ \e -> e + go (WpSubType w) k = go (optSubTypeHsWrapper w) k + -- See (DSST3) in Note [Deep subsumption and WpSubType] + -- in GHC.Tc.Types.Evidence go (WpTyApp ty) k = k $ \e -> App e (Type ty) go (WpEvLam ev) k = k $ Lam ev go (WpTyLam tv) k = k $ Lam tv @@ -1612,13 +1616,13 @@ ds_hs_wrapper wrap = go wrap go (WpCompose c1 c2) k = go c1 $ \w1 -> go c2 $ \w2 -> k (w1 . w2) - go (WpFun c1 c2 st) k = -- See Note [Desugaring WpFun] - do { x <- newSysLocalDs st - ; go c1 $ \w1 -> - go c2 $ \w2 -> - let app f a = mkCoreApp (text "dsHsWrapper") f a - arg = w1 (Var x) - in k (\e -> (Lam x (w2 (app e arg)))) } + go (WpFun c1 c2 st _) k = -- See Note [Desugaring WpFun] + do { x <- newSysLocalDs st + ; go c1 $ \w1 -> + go c2 $ \w2 -> + let app f a = mkCoreApp (text "dsHsWrapper") f a + arg = w1 (Var x) + in k (\e -> (Lam x (w2 (app e arg)))) } -------------------------------------- dsTcEvBinds_s :: [TcEvBinds] -> ([CoreBind] -> DsM a) -> DsM a ===================================== compiler/GHC/HsToCore/Match.hs ===================================== @@ -1240,7 +1240,7 @@ viewLExprEq (e1,_) (e2,_) = lexp e1 e2 -- equating different ways of writing a coercion) wrap WpHole WpHole = True wrap (WpCompose w1 w2) (WpCompose w1' w2') = wrap w1 w1' && wrap w2 w2' - wrap (WpFun w1 w2 _) (WpFun w1' w2' _) = wrap w1 w1' && wrap w2 w2' + wrap (WpFun w1 w2 _ _) (WpFun w1' w2' _ _) = wrap w1 w1' && wrap w2 w2' wrap (WpCast co) (WpCast co') = co 
`eqCoercion` co' wrap (WpEvApp et1) (WpEvApp et2) = et1 `ev_term` et2 wrap (WpTyApp t) (WpTyApp t') = eqType t t' ===================================== compiler/GHC/Iface/Ext/Ast.hs ===================================== @@ -696,7 +696,7 @@ instance ToHie (LocatedA HsWrapper) where (WpLet bs) -> toHie $ EvBindContext (mkScope osp) (getRealSpanA osp) (L osp bs) (WpCompose a b) -> concatM $ [toHie (L osp a), toHie (L osp b)] - (WpFun a b _) -> concatM $ + (WpFun a b _ _) -> concatM $ [toHie (L osp a), toHie (L osp b)] (WpEvLam a) -> toHie $ C (EvidenceVarBind EvWrapperBind (mkScope osp) (getRealSpanA osp)) ===================================== compiler/GHC/Tc/Errors/Hole.hs ===================================== @@ -823,9 +823,11 @@ tcFilterHoleFits limit typed_hole ht@(hole_ty, _) candidates = unfoldWrapper :: HsWrapper -> [Type] unfoldWrapper = reverse . unfWrp' - where unfWrp' (WpTyApp ty) = [ty] - unfWrp' (WpCompose w1 w2) = unfWrp' w1 ++ unfWrp' w2 - unfWrp' _ = [] + where + unfWrp' (WpTyApp ty) = [ty] + unfWrp' (WpSubType w) = unfWrp' w + unfWrp' (WpCompose w1 w2) = unfWrp' w1 ++ unfWrp' w2 + unfWrp' _ = [] -- The real work happens here, where we invoke the type checker using ===================================== compiler/GHC/Tc/Gen/App.hs ===================================== @@ -794,7 +794,7 @@ tcInstFun do_ql inst_final (tc_fun, fun_ctxt) fun_sigma rn_args = do { let herald = case fun_ctxt of VAExpansion (OrigStmt{}) _ _ -> ExpectedFunTySyntaxOp DoOrigin tc_fun _ -> ExpectedFunTyArg (HsExprTcThing tc_fun) (unLoc arg) - ; (wrap, arg_ty, res_ty) <- + ; (fun_co, arg_ty, res_ty) <- -- NB: matchActualFunTy does the rep-poly check. -- For example, suppose we have f :: forall r (a::TYPE r). 
a -> Int -- In an application (f x), we need 'x' to have a fixed runtime @@ -805,7 +805,7 @@ tcInstFun do_ql inst_final (tc_fun, fun_ctxt) fun_sigma rn_args (n_val_args, fun_sigma) fun_ty ; arg' <- quickLookArg do_ql ctxt arg arg_ty - ; let acc' = arg' : addArgWrap wrap acc + ; let acc' = arg' : addArgWrap (mkWpCastN fun_co) acc ; go (pos+1) acc' res_ty rest_args } new_arg_ty :: LHsExpr GhcRn -> Int -> TcM (Scaled TcType) ===================================== compiler/GHC/Tc/Gen/Head.hs ===================================== @@ -765,13 +765,13 @@ tcInferOverLit lit@(OverLit { ol_val = val thing = NameThing from_name mb_thing = Just thing herald = ExpectedFunTyArg thing (HsLit noExtField hs_lit) - ; (wrap2, sarg_ty, res_ty) <- matchActualFunTy herald mb_thing (1, from_ty) from_ty + ; (co2, sarg_ty, res_ty) <- matchActualFunTy herald mb_thing (1, from_ty) from_ty ; co <- unifyType mb_thing (hsLitType hs_lit) (scaledThing sarg_ty) -- See Note [Source locations for implicit function calls] in GHC.Iface.Ext.Ast ; let lit_expr = L (l2l loc) $ mkHsWrapCo co $ HsLit noExtField hs_lit - from_expr = mkHsWrap (wrap2 <.> wrap1) $ + from_expr = mkHsWrap (mkWpCastN co2 <.> wrap1) $ mkHsVar (L loc from_id) witness = HsApp noExtField (L (l2l loc) from_expr) lit_expr lit' = OverLit { ol_val = val ===================================== compiler/GHC/Tc/Gen/Pat.hs ===================================== @@ -699,7 +699,7 @@ tc_pat pat_ty penv ps_pat thing_inside = case ps_pat of -- Expression must be a function ; let herald = ExpectedFunTyViewPat $ unLoc expr - ; (expr_wrap1, Scaled _mult inf_arg_ty, inf_res_sigma) + ; (expr_co1, Scaled _mult inf_arg_ty, inf_res_sigma) <- matchActualFunTy herald (Just . 
HsExprRnThing $ unLoc expr) (1,expr_rho) expr_rho -- See Note [View patterns and polymorphism] -- expr_wrap1 :: expr_rho "->" (inf_arg_ty -> inf_res_sigma) @@ -720,7 +720,7 @@ tc_pat pat_ty penv ps_pat thing_inside = case ps_pat of -- NB: pat_ty comes from matchActualFunTy, so it has a -- fixed RuntimeRep, as needed to call mkWpFun. - expr_wrap = expr_wrap2' <.> expr_wrap1 + expr_wrap = expr_wrap2' <.> mkWpCastN expr_co1 ; return $ (ViewPat pat_ty (mkLHsWrap expr_wrap expr') pat', res) } ===================================== compiler/GHC/Tc/Types/Evidence.hs ===================================== @@ -8,10 +8,11 @@ module GHC.Tc.Types.Evidence ( -- * HsWrapper HsWrapper(..), (<.>), mkWpTyApps, mkWpEvApps, mkWpEvVarApps, mkWpTyLams, mkWpForAllCast, - mkWpEvLams, mkWpLet, mkWpFun, mkWpCastN, mkWpCastR, mkWpEta, + mkWpEvLams, mkWpLet, mkWpFun, mkWpCastN, mkWpCastR, mkWpEta, mkWpSubType, collectHsWrapBinders, idHsWrapper, isIdHsWrapper, pprHsWrapper, hsWrapDictBinders, + optSubTypeHsWrapper, -- * Evidence bindings TcEvBinds(..), EvBindsVar(..), @@ -73,7 +74,7 @@ import GHC.Types.Unique.DFM import GHC.Types.Unique.FM import GHC.Types.Name( isInternalName ) import GHC.Types.Var -import GHC.Types.Id( idScaledType ) +import GHC.Types.Id( idScaledType, idType ) import GHC.Types.Var.Env import GHC.Types.Var.Set import GHC.Types.Basic @@ -134,35 +135,128 @@ maybeSymCo NotSwapped co = co ************************************************************************ -} --- We write wrap :: t1 ~> t2 --- if wrap[ e::t1 ] :: t2 +{- Note [Deep subsumption and WpSubType] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +When making DeepSubsumption checks, we may end up with hard-to-spot identity wrappers. +For example (#26349) suppose we have + (forall a. Eq a => a->a) -> Int <= (forall a. Eq a => a->a) -> Int +The two types are equal so we should certainly get an identity wrapper. 
But we'll get +tihs wrapper from `tcSubType`: + WpFun (WpTyLam a <.> WpEvLam dg <.> WpLet (dw=dg) <.> WpEvApp dw <.> WpTyApp a) + WpHole +That elaborate wrapper is really just a no-op, but it's far from obvious. If we just +desugar (HsWrap f wp) straightforwardly we'll get + \(g:forall a. Eq a => a -> a). + f (/\a. \(dg:Eq a). let dw=dg in g a dw) + +To recognise that as just `f`, we'd have to eta-reduce twice. But eta-reduction +is not sound in general, so we'll end up retaining the lambdas. Two bad results: + +* Adding DeepSubsumption gratuitiously makes programs less efficient. + +* When the subsumption is on the LHS of a rule, or in a SPECIALISE pragma, we + may not be able to make a decent RULE at all, and will fail with "LHS of rule + is too complicated to desugar" (#26255) + +It'd be ideal to solve the problem at the source, by never generating those +gruesome wrappers in the first place, but we can't do that because: + +* The WpTyLam and WpTyApp are introduced independently, not together, in `tcSubType`, + so we can't easily cancel them out. For example, even if we have + forall a. t1 <= forall a. t2 + there is no guarantee that these are the "same" a. E.g. + forall a b. a -> b -> b <= forall x y. y -> x -> x + Similarly WpEvLam and WpEvApp + +* We have not yet done constraint solving so we don't know what evidence will + end up in those WpLet bindings. + +TL;DR we must generate the wrapper and then optimise it way if it turns out +that it is a no-op. Here's our solution: + +(DSST1) Tag the wrappers generated from a subtype check with WpSubType. In normal + wrappers the binders of a WpTyLam or WpEvLam can scope over the "hole" of the + wrapper -- that is how we introduce type-lambdas and dictionary-lambda into the + terms! But in /subtype/ wrappers, these type/dictionary lambdas only scope over + the WpTyApp and WpEvApp nodes in the /same/ wrapper. That is what justifies us + eta-reducing the type/dictionary lambdas. 
+ + In short, (WpSubType wp) means the same as `wp`, but with the added promise that + the binders in `wp` do not scope over the hole. + +(DSST2) Avoid creating a WpSubType in the common WpHole case, using `mkWpSubType`. + +(DSST3) When desugaring, try eta-reduction on the payload of a WpSubType. + This is done in `GHC.HsToCore.Binds.dsHsWrapper` by the call to `optSubTypeHsWrapper`. + + We don't attempt to optimise HsWrappers /other than/ subtype wrappers. Why not? + Because there aren't any useful optimisations we can do. (We could collapse + adjacent `WpCast`s perhaps, but that'll happen later automatically via `mkCast`.) + + TL;DR: + * we /must/ optimise subtype-HsWrappers (that's the point of this Note!) + * there is little point in attempting to optimise any other HsWrappers + +Note [WpFun-RR-INVARIANT] +~~~~~~~~~~~~~~~~~~~~~~~~~ +Given + wrap = WpFun wrap1 wrap2 sty1 ty2 + where: wrap1 :: exp_arg ~~> act_arg + wrap2 :: act_res ~~> exp_res + wrap :: (act_arg -> act_res) ~~> (exp_arg -> exp_res) +we have + WpFun-RR-INVARIANT: + the input (exp_arg) and output (act_arg) types of `wrap1` + both have a fixed runtime-rep + +Reason: We desugar wrap[e] into + \(x:exp_arg). wrap2[ e wrap1[x] ] +And then, because of Note [Representation polymorphism invariants], we need: + + * `exp_arg` must have a fixed runtime rep, + so that lambda obeys the FRR rules + + * `act_arg` must have a fixed runtime rep, + so that the application (e wrap1[x]) obeys the FRR rules + +Hence WpFun-RR-INVARIANT. +-} + data HsWrapper + -- NOTATION (~~>): + -- We write wrap :: t1 ~~> t2 + -- if wrap[ e::t1 ] :: t2 = WpHole -- The identity coercion + | WpSubType HsWrapper + -- (WpSubType wp) is the same as `wp`, but with extra invariants + -- See Note [Deep subsumption and WpSubType] (DSST1) + | WpCompose HsWrapper HsWrapper -- (wrap1 `WpCompose` wrap2)[e] = wrap1[ wrap2[ e ]] -- -- Hence (\a. []) `WpCompose` (\b. []) = (\a b. 
[]) -- But ([] a) `WpCompose` ([] b) = ([] b a) -- - -- If wrap1 :: t2 ~> t3 - -- wrap2 :: t1 ~> t2 - --- Then (wrap1 `WpCompose` wrap2) :: t1 ~> t3 - - | WpFun HsWrapper HsWrapper (Scaled TcTypeFRR) - -- (WpFun wrap1 wrap2 (w, t1))[e] = \(x:_w exp_arg). wrap2[ e wrap1[x] ] - -- So note that if e :: act_arg -> act_res - -- wrap1 :: exp_arg ~> act_arg - -- wrap2 :: act_res ~> exp_res - -- then WpFun wrap1 wrap2 : (act_arg -> arg_res) ~> (exp_arg -> exp_res) + -- If wrap1 :: t2 ~~> t3 + -- wrap2 :: t1 ~~> t2 + --- Then (wrap1 `WpCompose` wrap2) :: t1 ~~> t3 + + | WpFun HsWrapper HsWrapper (Scaled TcTypeFRR) TcType + -- (WpFun wrap1 wrap2 (w, t1) t2)[e] = \(x:_w exp_arg). wrap2[ e wrap1[x] ] + -- + -- INVARIANT: both input and output types of `wrap1` have a fixed runtime-rep + -- See Note [WpFun-RR-INVARIANT] + -- + -- Typing rules: + -- If e :: act_arg -> act_res + -- wrap1 :: exp_arg ~~> act_arg + -- wrap2 :: act_res ~~> exp_res + -- then WpFun wrap1 wrap2 :: (act_arg -> act_res) ~~> (exp_arg -> exp_res) -- This isn't the same as for mkFunCo, but it has to be this way -- because we can't use 'sym' to flip around these HsWrappers - -- The TcType is the "from" type of the first wrapper; - -- it always a Type, not a Constraint -- - -- NB: a WpFun is always for a (->) function arrow - -- - -- Use 'mkWpFun' to construct such a wrapper. + -- NB: a WpFun is always for a (->) function arrow, never (=>) | WpCast TcCoercionR -- A cast: [] `cast` co -- Guaranteed not the identity coercion @@ -212,50 +306,48 @@ WpCast c1 <.> WpCast c2 = WpCast (c2 `mkTransCo` c1) -- -- NB: <.> behaves like function composition: -- - -- WpCast c1 <.> WpCast c2 :: coercionLKind c2 ~> coercionRKind c1 + -- WpCast c1 <.> WpCast c2 :: coercionLKind c2 ~~> coercionRKind c1 -- -- This is thus the same as WpCast (c2 ; c1) and not WpCast (c1 ; c2). 
c1 <.> c2 = c1 `WpCompose` c2 --- | Smart constructor to create a 'WpFun' 'HsWrapper', which avoids introducing --- a lambda abstraction if the two supplied wrappers are either identities or --- casts. --- --- PRECONDITION: either: --- --- 1. both of the 'HsWrapper's are identities or casts, or --- 2. both the "from" and "to" types of the first wrapper have a syntactically --- fixed RuntimeRep (see Note [Fixed RuntimeRep] in GHC.Tc.Utils.Concrete). mkWpFun :: HsWrapper -> HsWrapper -> Scaled TcTypeFRR -- ^ the "from" type of the first wrapper -> TcType -- ^ Either "from" type or "to" type of the second wrapper -- (used only when the second wrapper is the identity) -> HsWrapper -mkWpFun WpHole WpHole _ _ = WpHole -mkWpFun WpHole (WpCast co2) (Scaled w t1) _ = WpCast (mk_wp_fun_co w (mkRepReflCo t1) co2) -mkWpFun (WpCast co1) WpHole (Scaled w _) t2 = WpCast (mk_wp_fun_co w (mkSymCo co1) (mkRepReflCo t2)) -mkWpFun (WpCast co1) (WpCast co2) (Scaled w _) _ = WpCast (mk_wp_fun_co w (mkSymCo co1) co2) -mkWpFun w_arg w_res t1 _ = - -- In this case, we will desugar to a lambda - -- - -- \x. w_res[ e w_arg[x] ] - -- - -- To satisfy Note [Representation polymorphism invariants] in GHC.Core, - -- it must be the case that both the lambda bound variable x and the function - -- argument w_arg[x] have a fixed runtime representation, i.e. that both the - -- "from" and "to" types of the first wrapper "w_arg" have a fixed runtime representation. - -- - -- Unfortunately, we can't check this with an assertion here, because of - -- [Wrinkle: Typed Template Haskell] in Note [hasFixedRuntimeRep] in GHC.Tc.Utils.Concrete. - WpFun w_arg w_res t1 - -mkWpEta :: [Id] -> HsWrapper -> HsWrapper +-- ^ Smart constructor for `WpFun` +-- Just removes clutter and optimises some common cases. 
+-- +-- PRECONDITION: same as Note [WpFun-RR-INVARIANT] +-- +-- Unfortunately, we can't check PRECONDITION with an assertion here, because of +-- [Wrinkle: Typed Template Haskell] in Note [hasFixedRuntimeRep] in GHC.Tc.Utils.Concrete. +mkWpFun w1 w2 st1@(Scaled m1 t1) t2 + = case (w1,w2) of + (WpHole, WpHole) -> WpHole + (WpHole, WpCast co2) -> WpCast (mk_wp_fun_co m1 (mkRepReflCo t1) co2) + (WpCast co1, WpHole) -> WpCast (mk_wp_fun_co m1 (mkSymCo co1) (mkRepReflCo t2)) + (WpCast co1, WpCast co2) -> WpCast (mk_wp_fun_co m1 (mkSymCo co1) co2) + (_, _) -> WpFun w1 w2 st1 t2 + +mkWpSubType :: HsWrapper -> HsWrapper +-- See (DSST2) in Note [Deep subsumption and WpSubType] +mkWpSubType WpHole = WpHole +mkWpSubType (WpCast co) = WpCast co +mkWpSubType w = WpSubType w + +mkWpEta :: Type -> [Id] -> HsWrapper -> HsWrapper -- (mkWpEta [x1, x2] wrap) [e] -- = \x1. \x2. wrap[e x1 x2] -- Just generates a bunch of WpFuns -mkWpEta xs wrap = foldr eta_one wrap xs +-- The incoming type is the type of the entire expression +mkWpEta orig_fun_ty xs wrap = go orig_fun_ty xs where - eta_one x wrap = WpFun idHsWrapper wrap (idScaledType x) + go _ [] = wrap + go fun_ty (id:ids) = WpFun idHsWrapper (go res_ty ids) (idScaledType id) res_ty + where + res_ty = funResultTy fun_ty mk_wp_fun_co :: Mult -> TcCoercionR -> TcCoercionR -> TcCoercionR mk_wp_fun_co mult arg_co res_co @@ -333,8 +425,9 @@ hsWrapDictBinders wrap = go wrap where go (WpEvLam dict_id) = unitBag dict_id go (w1 `WpCompose` w2) = go w1 `unionBags` go w2 - go (WpFun _ w _) = go w + go (WpFun _ w _ _) = go w go WpHole = emptyBag + go (WpSubType {}) = emptyBag -- See Note [Deep subsumption and WpSubType] go (WpCast {}) = emptyBag go (WpEvApp {}) = emptyBag go (WpTyLam {}) = emptyBag @@ -350,6 +443,7 @@ collectHsWrapBinders wrap = go wrap [] go :: HsWrapper -> [HsWrapper] -> ([Var], HsWrapper) go (WpEvLam v) wraps = add_lam v (gos wraps) go (WpTyLam v) wraps = add_lam v (gos wraps) + go (WpSubType w) wraps = go w wraps go 
(WpCompose w1 w2) wraps = go w1 (w2:wraps) go wrap wraps = ([], foldl' (<.>) wrap wraps) @@ -358,6 +452,162 @@ collectHsWrapBinders wrap = go wrap [] add_lam v (vs,w) = (v:vs, w) + +optSubTypeHsWrapper :: HsWrapper -> HsWrapper +-- This optimiser is used only on the payload of WpSubType +-- It finds cases where the entire wrapper is a no-op +-- See (DSST3) in Note [Deep subsumption and WpSubType] +optSubTypeHsWrapper wrap + = opt wrap + where + opt :: HsWrapper -> HsWrapper + opt w = foldr (<.>) WpHole (opt1 w []) + + opt1 :: HsWrapper -> [HsWrapper] -> [HsWrapper] + -- opt1 w ws = w <.> (foldr <.> WpHole ws) + -- INVARIANT: ws::[HsWrapper] is optimised + opt1 WpHole ws = ws + opt1 (WpSubType w) ws = opt1 w ws + opt1 (w1 `WpCompose` w2) ws = opt1 w1 (opt1 w2 ws) + opt1 (WpCast co) ws = opt_co co ws + opt1 (WpEvLam ev) ws = opt_ev_lam ev ws + opt1 (WpTyLam tv) ws = opt_ty_lam tv ws + opt1 (WpLet binds) ws = pushWpLet binds ws + opt1 (WpFun w1 w2 sty1 ty2) ws = opt_fun w1 w2 sty1 ty2 ws + opt1 w@(WpTyApp {}) ws = w : ws + opt1 w@(WpEvApp {}) ws = w : ws + + ----------------- + -- (WpTyLam a <.> WpTyApp a <.> w) = w + -- i.e. /\a. 
<hole> a --> <hole> + -- This is only valid if whatever fills the hole does not mention 'a' + -- But that's guaranteed in subtype-wrappers; + -- see (DSST1) in Note [Deep subsumption and WpSubType] + opt_ty_lam tv (WpTyApp ty : ws) + | Just tv' <- getTyVar_maybe ty + , tv==tv' + , all (tv `not_in`) ws + = ws + + -- (WpTyLam a <.> WpCastCo co <.> w) + -- = WpCast (ForAllCo a co) (WpTyLam <.> w) + opt_ty_lam tv (WpCast co : ws) + = opt_co (mkHomoForAllCo tv co) (opt_ty_lam tv ws) + + opt_ty_lam tv ws + = WpTyLam tv : ws + + ----------------- + -- (WpEvLam ev <.> WpEvAp ev <.> w) = w + -- Similar notes to WpTyLam + opt_ev_lam ev (WpEvApp ev_tm : ws) + | EvExpr (Var ev') <- ev_tm + , ev == ev' + , all (ev `not_in`) ws + = ws + + -- (WpEvLam ev <.> WpCast co <.> w) + -- = WpCast (FunCo ev co) (WpEvLam <.> w) + opt_ev_lam ev (WpCast co : ws) + = opt_co fun_co (opt_ev_lam ev ws) + where + fun_co = mkFunCo Representational FTF_C_T + (mkNomReflCo ManyTy) + (mkRepReflCo (idType ev)) + co + + opt_ev_lam ev ws + = WpEvLam ev : ws + + ----------------- + -- WpCast co <.> WpCast co' <.> ws = WpCast (co;co') ws + opt_co co (WpCast co' : ws) = opt_co (co `mkTransCo` co') ws + opt_co co ws | isReflexiveCo co = ws + | otherwise = WpCast co : ws + + ------------------ + opt_fun w1 w2 sty1 ty2 ws + = case mkWpFun (opt w1) (opt w2) sty1 ty2 of + WpHole -> ws + WpCast co -> opt_co co ws + w -> w : ws + + ------------------ + -- Tiresome check that the lambda-bound type/evidence variable that we + -- want to eta-reduce isn't free in the rest of the wrapper + not_in :: TyVar -> HsWrapper -> Bool + not_in _ WpHole = True + not_in v (WpCast co) = not (anyFreeVarsOfCo (== v) co) + not_in v (WpTyApp ty) = not (anyFreeVarsOfType (== v) ty) + not_in v (WpFun w1 w2 _ _) = not_in v w1 && not_in v w2 + not_in v (WpSubType w) = not_in v w + not_in v (WpCompose w1 w2) = not_in v w1 && not_in v w2 + not_in v (WpEvApp (EvExpr e)) = not (v `elemVarSet` exprFreeVars e) + not_in _ (WpEvApp (EvTypeable 
{})) = False -- Giving up; conservative + not_in _ (WpEvApp (EvFun {})) = False -- Giving up; conservative + not_in _ (WpTyLam {}) = False -- Give up; conservative + not_in _ (WpEvLam {}) = False -- Ditto + not_in _ (WpLet {}) = False -- Ditto + +pushWpLet :: TcEvBinds -> [HsWrapper] -> [HsWrapper] +-- See if we can transform +-- WpLet binds <.> w1 <.> .. <.> wn --> w1' <.> .. <.> wn' +-- by substitution. +-- We do this just for the narrow case when +-- - the `binds` are all just v=w, variables only +-- - the wi are all WpTyApp, WpEvApp, or WpCast +-- This is just enough to get us the eta-reductions that we seek +pushWpLet tc_ev_binds ws + = case tc_ev_binds of + TcEvBinds {} -> pprPanic "pushWpLet" (ppr tc_ev_binds) + EvBinds binds + | isEmptyBag binds + -> ws + | Just env <- ev_bind_swizzle binds + -> case go env ws of + Just ws' -> ws' + Nothing -> bale_out + | otherwise + -> bale_out + where + bale_out = WpLet tc_ev_binds : ws + + go :: IdEnv Id -> [HsWrapper] -> Maybe [HsWrapper] + go env (WpCast co : ws) = do { ws' <- go env ws + ; return (WpCast co : ws') } + go env (WpTyApp ty : ws) = do { ws' <- go env ws + ; return (WpTyApp ty : ws') } + go env (WpEvApp (EvExpr (Var v)) : ws) + = do { v' <- swizzle_id env v + ; ws' <- go env ws + ; return (WpEvApp (EvExpr (Var v')) : ws') } + + go _ ws = case ws of + [] -> Just [] + (_:_) -> Nothing -- Could not fully eliminate the WpLet + + swizzle_id :: IdEnv Id -> Id -> Maybe Id + -- Nothing <=> ran out of fuel + -- This is just belt and braces; we should never build bottom evidence + swizzle_id env v = go 100 v + where + go :: Int -> EvId -> Maybe EvId + go fuel v + | fuel == 0 = Nothing + | Just v' <- lookupVarEnv env v = go (fuel-1) v' + | otherwise = Just v + + ev_bind_swizzle :: Bag EvBind -> Maybe (IdEnv Id) + -- Succeeds only if the bindings are all var-to-var bindings + ev_bind_swizzle evbs = foldl' do_one (Just emptyVarEnv) evbs + where + do_one :: Maybe (IdEnv Id) -> EvBind -> Maybe (IdEnv Id) + do_one 
Nothing _ = Nothing + do_one (Just swizzle) (EvBind {eb_lhs = bndr, eb_rhs = rhs}) + = case rhs of + EvExpr (Var v) -> Just (extendVarEnv swizzle bndr v) + _ -> Nothing + {- ************************************************************************ * * @@ -1018,8 +1268,9 @@ pprHsWrapper wrap pp_thing_inside -- True <=> appears in function application position -- False <=> appears as body of let or lambda help it WpHole = it - help it (WpCompose f1 f2) = help (help it f2) f1 - help it (WpFun f1 f2 (Scaled w t1)) = add_parens $ text "\\(x" <> dcolon <> brackets (ppr w) <> ppr t1 <> text ")." <+> + help it (WpCompose w1 w2) = help (help it w2) w1 + help it (WpSubType w) = no_parens $ text "subtype" <> braces (help it w False) + help it (WpFun f1 f2 (Scaled w t1) _) = add_parens $ text "\\(x" <> dcolon <> brackets (ppr w) <> ppr t1 <> text ")." <+> help (\_ -> it True <+> help (\_ -> text "x") f1 True) f2 False help it (WpCast co) = add_parens $ sep [it False, nest 2 (text "|>" <+> pprParendCo co)] ===================================== compiler/GHC/Tc/Utils/Concrete.hs ===================================== @@ -626,8 +626,12 @@ hasFixedRuntimeRep :: HasDebugCallStack -- @ki@ is concrete, and @co :: ty ~# ty'@. -- That is, @ty'@ has a syntactically fixed RuntimeRep -- in the sense of Note [Fixed RuntimeRep]. -hasFixedRuntimeRep frr_ctxt ty = - checkFRR_with (fmap (fmap coToMCo) . unifyConcrete_kind (fsLit "cx") . ConcreteFRR) frr_ctxt ty +hasFixedRuntimeRep frr_ctxt ty + = checkFRR_with unify_conc frr_ctxt ty + where + unify_conc frr_orig ki + = do { co <- unifyConcrete_kind (fsLit "cx") (ConcreteFRR frr_orig) ki + ; return (coToMCo co) } -- | Like 'hasFixedRuntimeRep', but we perform an eager syntactic check. 
-- ===================================== compiler/GHC/Tc/Utils/Unify.hs ===================================== @@ -148,7 +148,7 @@ matchActualFunTy -- (Both are used only for error messages) -> TcRhoType -- ^ Type to analyse: a TcRhoType - -> TcM (HsWrapper, Scaled TcSigmaTypeFRR, TcSigmaType) + -> TcM (TcCoercion, Scaled TcSigmaTypeFRR, TcSigmaType) -- This function takes in a type to analyse (a RhoType) and returns -- an argument type and a result type (splitting apart a function arrow). -- The returned argument type is a SigmaType with a fixed RuntimeRep; @@ -157,7 +157,7 @@ matchActualFunTy -- See Note [matchActualFunTy error handling] for the first three arguments -- If (wrap, arg_ty, res_ty) = matchActualFunTy ... fun_ty --- then wrap :: fun_ty ~> (arg_ty -> res_ty) +-- then wrap :: fun_ty ~~> (arg_ty -> res_ty) -- and NB: res_ty is an (uninstantiated) SigmaType matchActualFunTy herald mb_thing err_info fun_ty @@ -172,13 +172,13 @@ matchActualFunTy herald mb_thing err_info fun_ty -- hide the forall inside a meta-variable go :: TcRhoType -- The type we're processing, perhaps after -- expanding type synonyms - -> TcM (HsWrapper, Scaled TcSigmaTypeFRR, TcSigmaType) + -> TcM (TcCoercion, Scaled TcSigmaTypeFRR, TcSigmaType) go ty | Just ty' <- coreView ty = go ty' go (FunTy { ft_af = af, ft_mult = w, ft_arg = arg_ty, ft_res = res_ty }) = assert (isVisibleFunArg af) $ do { hasFixedRuntimeRep_syntactic (FRRExpectedFunTy herald 1) arg_ty - ; return (idHsWrapper, Scaled w arg_ty, res_ty) } + ; return (mkNomReflCo fun_ty, Scaled w arg_ty, res_ty) } go ty@(TyVarTy tv) | isMetaTyVar tv @@ -210,7 +210,7 @@ matchActualFunTy herald mb_thing err_info fun_ty ; res_ty <- newOpenFlexiTyVarTy ; let unif_fun_ty = mkScaledFunTys [arg_ty] res_ty ; co <- unifyType mb_thing fun_ty unif_fun_ty - ; return (mkWpCastN co, arg_ty, res_ty) } + ; return (co, arg_ty, res_ty) } ------------ mk_ctxt :: TcType -> TidyEnv -> ZonkM (TidyEnv, ErrCtxtMsg) @@ -249,8 +249,10 @@ matchActualFunTys :: 
ExpectedFunTyOrigin -- ^ See Note [Herald for matchExpected -> Arity -> TcSigmaType -> TcM (HsWrapper, [Scaled TcSigmaTypeFRR], TcRhoType) --- If matchActualFunTys n ty = (wrap, [t1,..,tn], res_ty) --- then wrap : ty ~> (t1 -> ... -> tn -> res_ty) +-- NB: Called only from `tcSynArgA`, and hence scheduled for destruction +-- +-- If matchActualFunTys n fun_ty = (wrap, [t1,..,tn], res_ty) +-- then wrap : fun_ty ~~> (t1 -> ... -> tn -> res_ty) -- and res_ty is a RhoType -- NB: the returned type is top-instantiated; it's a RhoType matchActualFunTys herald ct_orig n_val_args_wanted top_ty @@ -265,15 +267,13 @@ matchActualFunTys herald ct_orig n_val_args_wanted top_ty go 0 _ fun_ty = return (idHsWrapper, [], fun_ty) go n so_far fun_ty - = do { (wrap_fun1, arg_ty1, res_ty1) <- matchActualFunTy - herald Nothing - (n_val_args_wanted, top_ty) - fun_ty - ; (wrap_res, arg_tys, res_ty) <- go (n-1) (arg_ty1:so_far) res_ty1 + = do { (co1, arg_ty1, res_ty1) <- matchActualFunTy herald Nothing + (n_val_args_wanted, top_ty) fun_ty + ; (wrap_res, arg_tys, res_ty) <- go (n-1) (arg_ty1:so_far) res_ty1 ; let wrap_fun2 = mkWpFun idHsWrapper wrap_res arg_ty1 res_ty -- NB: arg_ty1 comes from matchActualFunTy, so it has - -- a syntactically fixed RuntimeRep as needed to call mkWpFun. - ; return (wrap_fun2 <.> wrap_fun1, arg_ty1:arg_tys, res_ty) } + -- a syntactically fixed RuntimeRep + ; return (wrap_fun2 <.> mkWpCastN co1, arg_ty1:arg_tys, res_ty) } {- ************************************************************************ @@ -459,7 +459,7 @@ tcSkolemiseGeneral ds_flag ctxt top_ty expected_ty thing_inside tcSkolemiseCompleteSig :: TcCompleteSig -> ([ExpPatType] -> TcRhoType -> TcM result) -> TcM (HsWrapper, result) --- ^ The wrapper has type: spec_ty ~> expected_ty +-- ^ The wrapper has type: spec_ty ~~> expected_ty -- See Note [Skolemisation] for the differences between -- tcSkolemiseCompleteSig and tcTopSkolemise @@ -790,7 +790,7 @@ matchExpectedFunTys :: forall a. 
-> ([ExpPatType] -> ExpRhoType -> TcM a) -> TcM (HsWrapper, a) -- If matchExpectedFunTys n ty = (wrap, _) --- then wrap : (t1 -> ... -> tn -> ty_r) ~> ty, +-- then wrap : (t1 -> ... -> tn -> ty_r) ~~> ty, -- where [t1, ..., tn], ty_r are passed to the thing_inside -- -- Unconditionally concludes by skolemising any trailing invisible @@ -865,12 +865,13 @@ matchExpectedFunTys herald ctx arity (Check top_ty) thing_inside , ft_arg = arg_ty, ft_res = res_ty }) = assert (isVisibleFunArg af) $ do { let arg_pos = arity - n_req + 1 -- 1 for the first argument etc - ; (arg_co, arg_ty) <- hasFixedRuntimeRep (FRRExpectedFunTy herald arg_pos) arg_ty + ; (arg_co, arg_ty_frr) <- hasFixedRuntimeRep (FRRExpectedFunTy herald arg_pos) arg_ty + ; let arg_sty_frr = Scaled mult arg_ty_frr ; (wrap_res, result) <- check (n_req - 1) - (mkCheckExpFunPatTy (Scaled mult arg_ty) : rev_pat_tys) + (mkCheckExpFunPatTy arg_sty_frr : rev_pat_tys) res_ty ; let wrap_arg = mkWpCastN arg_co - fun_wrap = mkWpFun wrap_arg wrap_res (Scaled mult arg_ty) res_ty + fun_wrap = mkWpFun wrap_arg wrap_res arg_sty_frr res_ty ; return (fun_wrap, result) } ---------------------------- @@ -1407,7 +1408,7 @@ tcSubTypePat :: CtOrigin -> UserTypeCtxt -- Used in patterns; polarity is backwards compared -- to tcSubType -- If wrap = tc_sub_type_et t1 t2 --- => wrap :: t1 ~> t2 +-- => wrap :: t1 ~~> t2 tcSubTypePat inst_orig ctxt (Check ty_actual) ty_expected = tc_sub_type unifyTypeET inst_orig ctxt ty_actual ty_expected @@ -1427,11 +1428,12 @@ tcSubTypeDS :: HsExpr GhcRn -- DeepSubsumption <=> when checking, this type -- is deeply skolemised -> TcM HsWrapper --- Only one call site, in GHC.Tc.Gen.App.tcApp +-- Only one call site, in GHC.Tc.Gen.App.checkResultTy tcSubTypeDS rn_expr act_rho exp_rho - = tc_sub_type_deep Top (unifyExprType rn_expr) orig GenSigCtxt act_rho exp_rho - where - orig = exprCtOrigin rn_expr + = do { wrap <- tc_sub_type_deep Top (unifyExprType rn_expr) + (exprCtOrigin rn_expr) + GenSigCtxt act_rho 
exp_rho + ; return (mkWpSubType wrap) } --------------- @@ -1456,7 +1458,7 @@ tcSubTypeSigma :: CtOrigin -- where did the actual type arise / why are we -> TcSigmaType -> TcSigmaType -> TcM HsWrapper -- External entry point, but no ExpTypes on either side -- Checks that actual <= expected --- Returns HsWrapper :: actual ~ expected +-- Returns HsWrapper :: actual ~~> expected tcSubTypeSigma orig ctxt ty_actual ty_expected = tc_sub_type (unifyType Nothing) orig ctxt ty_actual ty_expected @@ -1495,7 +1497,7 @@ tc_sub_type :: (TcType -> TcType -> TcM TcCoercionN) -- How to unify -> TcM HsWrapper -- Checks that actual_ty is more polymorphic than expected_ty -- If wrap = tc_sub_type t1 t2 --- => wrap :: t1 ~> t2 +-- => wrap :: t1 ~~> t2 -- -- The "how to unify argument" is always a call to `uType TypeLevel orig`, -- but with different ways of constructing the CtOrigin `orig` from @@ -1504,7 +1506,8 @@ tc_sub_type :: (TcType -> TcType -> TcM TcCoercionN) -- How to unify ---------------------- tc_sub_type unify inst_orig ctxt ty_actual ty_expected = do { ds_flag <- getDeepSubsumptionFlag - ; tc_sub_type_ds Top ds_flag unify inst_orig ctxt ty_actual ty_expected } + ; wrap <- tc_sub_type_ds Top ds_flag unify inst_orig ctxt ty_actual ty_expected + ; return (mkWpSubType wrap) } ---------------------- tc_sub_type_ds :: Position p -- ^ position in the type (for error messages only) @@ -1753,59 +1756,59 @@ we deal with function arrows. Suppose we have: ty_actual = act_arg -> act_res ty_expected = exp_arg -> exp_res -To produce fun_wrap :: (act_arg -> act_res) ~> (exp_arg -> exp_res), we use +To produce fun_wrap :: (act_arg -> act_res) ~~> (exp_arg -> exp_res), we use the fact that the function arrow is contravariant in its argument type and covariant in its result type. 
Thus we recursively perform subtype checks on the argument types (with actual/expected switched) and the result types, to get: - arg_wrap :: exp_arg ~> act_arg -- NB: expected/actual have switched sides - res_wrap :: act_res ~> exp_res + arg_wrap :: exp_arg ~~> act_arg -- NB: expected/actual have switched sides + res_wrap :: act_res ~~> exp_res Then fun_wrap = mkWpFun arg_wrap res_wrap. -Wrinkle [Representation-polymorphism checking during subtyping] +Note [Representation-polymorphism checking during subtyping] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +When doing deep subsumption in `tc_sub_type_deep`, looking under function arrows, +we would usually build a `WpFun` HsWrapper. When desugared, we get eta-expansion: - Inserting a WpFun HsWrapper amounts to impedance matching in deep subsumption - via eta-expansion: + f ==> \(x :: exp_arg). res_wrap [ f (arg_wrap [x]) ] - f ==> \ (x :: exp_arg) -> res_wrap [ f (arg_wrap [x]) ] +Since we produce a lambda, we must enforce the representation polymorphism +invariants described in Note [Representation polymorphism invariants] in GHC.Core. +That is, we must ensure that both + - x (the lambda binder), and + - (arg_wrap [x]) (the function argument) +have a fixed runtime representation. - As we produce a lambda, we must enforce the representation polymorphism - invariants described in Note [Representation polymorphism invariants] in GHC.Core. - That is, we must ensure that both x (the lambda binder) and (arg_wrap [x]) (the function argument) - have a fixed runtime representation. +But we don't /always/ need to produce a `WpFun`: if both argument and result wrappers +are merely coercions, we can produce a `WpCast co` instead of a `WpFun`. In that +case there is no eta-expansion, and hence no need for FRR checks. 
- Note however that desugaring mkWpFun does not always introduce a lambda: if - both the argument and result HsWrappers are casts, then a FunCo cast suffices, - in which case we should not perform representation-polymorphism checking. +Here's a contrived example (there are undoubtedly more natural examples) +(see testsuite/tests/rep-poly/NoEtaRequired): - This means that, in the FunTy/FunTy case of tc_sub_type_deep, we can skip - the representation-polymorphism checks if the produced argument and result - wrappers are identities or casts. - It is important to do so, otherwise we reject valid programs. + type Id :: k -> k + type family Id a where - Here's a contrived example (there are undoubtedly more natural examples) - (see testsuite/tests/rep-poly/NoEtaRequired): + type T :: TYPE r -> TYPE (Id r) + type family T a where - type Id :: k -> k - type family Id a where + test :: forall r (a :: TYPE r). a :~~: T a -> () + test HRefl = + let + f :: (a -> a) -> () + f _ = () + g :: T a -> T a + g = undefined + in f g - type T :: TYPE r -> TYPE (Id r) - type family T a where +We don't need to eta-expand `g` to make `f g` typecheck; a cast +suffices. Hence we should not perform representation-polymorphism +checks; they would fail here. - test :: forall r (a :: TYPE r). a :~~: T a -> () - test HRefl = - let - f :: (a -> a) -> () - f _ = () - g :: T a -> T a - g = undefined - in f g - - We don't need to eta-expand `g` to make `f g` typecheck; a cast suffices. - Hence we should not perform representation-polymorphism checks; they would - fail here. +All this is done by `mkWpFun_FRR`, which checks for the cast/cast case and +returns a `FunCo` if so. Note [Setting the argument context] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -1947,7 +1950,7 @@ getDeepSubsumptionFlag = do { ds <- xoptM LangExt.DeepSubsumption -- | 'tc_sub_type_deep' is where the actual work happens for deep subsumption. -- -- Given @ty_actual@ (a sigma-type) and @ty_expected@ (deeply skolemised, i.e. 
--- a deep rho type), it returns an 'HsWrapper' @wrap :: ty_actual ~> ty_expected@. +-- a deep rho type), it returns an 'HsWrapper' @wrap :: ty_actual ~~> ty_expected@. tc_sub_type_deep :: HasDebugCallStack => Position p -- ^ Position in the type (for error messages only) -> (TcType -> TcType -> TcM TcCoercionN) -- ^ How to unify @@ -1958,7 +1961,7 @@ tc_sub_type_deep :: HasDebugCallStack -> TcM HsWrapper -- If wrap = tc_sub_type_deep t1 t2 --- => wrap :: t1 ~> t2 +-- => wrap :: t1 ~~> t2 -- Here is where the work actually happens! -- Precondition: ty_expected is deeply skolemised @@ -2015,8 +2018,8 @@ tc_sub_type_deep pos unify inst_orig ctxt ty_actual ty_expected ; unify_wrap <- just_unify exp_funTy ty_e ; fun_wrap <- go_fun af1 act_mult act_arg act_res af1 exp_mult exp_arg exp_res ; return $ unify_wrap <.> fun_wrap - -- unify_wrap :: exp_funTy ~> ty_e - -- fun_wrap :: ty_a ~> exp_funTy + -- unify_wrap :: exp_funTy ~~> ty_e + -- fun_wrap :: ty_a ~~> exp_funTy } go1 ty_a (FunTy { ft_af = af2, ft_mult = exp_mult, ft_arg = exp_arg, ft_res = exp_res }) | isVisibleFunArg af2 @@ -2028,8 +2031,8 @@ tc_sub_type_deep pos unify inst_orig ctxt ty_actual ty_expected ; unify_wrap <- just_unify ty_a act_funTy ; fun_wrap <- go_fun af2 act_mult act_arg act_res af2 exp_mult exp_arg exp_res ; return $ fun_wrap <.> unify_wrap - -- unify_wrap :: ty_a ~> act_funTy - -- fun_wrap :: act_funTy ~> ty_e + -- unify_wrap :: ty_a ~~> act_funTy + -- fun_wrap :: act_funTy ~~> ty_e } -- Otherwise, revert to unification. 
@@ -2064,17 +2067,28 @@ mkWpFun_FRR -> Position p -> FunTyFlag -> Type -> TcType -> Type -- actual FunTy -> FunTyFlag -> Type -> TcType -> Type -- expected FunTy - -> HsWrapper -- ^ exp_arg ~> act_arg - -> HsWrapper -- ^ act_res ~> exp_res - -> TcM HsWrapper -- ^ act_funTy ~> exp_funTy + -> HsWrapper -- ^ exp_arg ~~> act_arg + -> HsWrapper -- ^ act_res ~~> exp_res + -> TcM HsWrapper -- ^ (act_arg->act_res) ~~> (exp_arg->exp_res) mkWpFun_FRR unify pos act_af act_mult act_arg act_res exp_af exp_mult exp_arg exp_res arg_wrap res_wrap - = do { ((exp_arg_co, exp_arg_frr), (act_arg_co, _act_arg_frr)) <- - if needs_frr_checks - -- See Wrinkle [Representation-polymorphism checking during subtyping] - then do { exp_frr_wrap <- hasFixedRuntimeRep (frr_ctxt True ) exp_arg - ; act_frr_wrap <- hasFixedRuntimeRep (frr_ctxt False) act_arg - ; return (exp_frr_wrap, act_frr_wrap) } - else return ((mkNomReflCo exp_arg, exp_arg), (mkNomReflCo act_arg, act_arg)) + | Just arg_co <- getWpCo_maybe arg_wrap act_arg -- arg_co :: exp_arg ~R# act_arg + , Just res_co <- getWpCo_maybe res_wrap act_res -- res_co :: act_res ~R# exp_res + = -- The argument and result wrappers are both hole or cast; + -- so we can make do with a FunCo + -- See Note [Representation-polymorphism checking during subtyping] + do { mult_co <- unify act_mult exp_mult + ; let the_co = mkFunCo2 Representational act_af exp_af mult_co (mkSymCo arg_co) res_co + ; return (mkWpCastR the_co) } + + | otherwise + = -- We need a full WpFun, with the eta-expansion that it entails + -- And hence we must add fixed-runtime-rep checks so that the eta-expansion is OK + -- See Note [Representation-polymorphism checking during subtyping] + do { (exp_arg_co, exp_arg_frr) <- hasFixedRuntimeRep (frr_ctxt True ) exp_arg + ; (act_arg_co, _act_arg_frr) <- hasFixedRuntimeRep (frr_ctxt False) act_arg + -- exp_arg_frr, act_arg_frr :: Type have fixed runtime-reps + -- exp_arg_co :: exp_arg ~ exp_arg_frr Usually Refl + -- act_arg_co :: act_arg ~ 
act_arg_frr Usually Refl -- Enforce equality of multiplicities (not the more natural sub-multiplicity). -- See Note [Multiplicity in deep subsumption] @@ -2083,46 +2097,36 @@ mkWpFun_FRR unify pos act_af act_mult act_arg act_res exp_af exp_mult exp_arg ex -- equality to be Refl, but it might well not be (#26332). ; let - exp_arg_fun_co = + exp_arg_fun_co = -- (exp_arg_frr -> exp_res) ~ (exp_arg -> exp_res) mkFunCo Nominal exp_af - (mkReflCo Nominal exp_mult) + (mkNomReflCo exp_mult) (mkSymCo exp_arg_co) - (mkReflCo Nominal exp_res) - act_arg_fun_co = + (mkNomReflCo exp_res) + act_arg_fun_co = -- (act_arg -> act_res) ~ (act_arg_frr -> act_res) mkFunCo Nominal act_af act_arg_mult_co act_arg_co - (mkReflCo Nominal act_res) - arg_wrap_frr = + (mkNomReflCo act_res) + arg_wrap_frr = -- exp_arg_frr ~~> act_arg_frr mkWpCastN (mkSymCo exp_arg_co) <.> arg_wrap <.> mkWpCastN act_arg_co - -- exp_arg_co :: exp_arg ~> exp_arg_frr - -- act_arg_co :: act_arg ~> act_arg_frr - -- arg_wrap :: exp_arg ~> act_arg - -- arg_wrap_frr :: exp_arg_frr ~> act_arg_frr - ; return $ - mkWpCastN exp_arg_fun_co + ; return $ -- Whole thing :: (act_arg->act_res) ~~> (exp_arg->exp_ress) + mkWpCastN exp_arg_fun_co -- (exp_ar_frr->exp_res) ~~> (exp_arg->exp_res) <.> mkWpFun arg_wrap_frr res_wrap (Scaled exp_mult exp_arg_frr) exp_res - <.> - mkWpCastN act_arg_fun_co + <.> -- (act_arg_frr->act_res) ~~> (exp_arg_frr->exp_res) + mkWpCastN act_arg_fun_co -- (act_arg->act_res) ~~> (act_arg_frr->act_res) } where - needs_frr_checks :: Bool - needs_frr_checks = - not (hole_or_cast arg_wrap) - || - not (hole_or_cast res_wrap) - hole_or_cast :: HsWrapper -> Bool - hole_or_cast WpHole = True - hole_or_cast (WpCast {}) = True - hole_or_cast _ = False + getWpCo_maybe :: HsWrapper -> Type -> Maybe CoercionR + -- See if a HsWrapper is just a coercion + getWpCo_maybe WpHole ty = Just (mkRepReflCo ty) + getWpCo_maybe (WpCast co) _ = Just co + getWpCo_maybe _ _ = Nothing + frr_ctxt :: Bool -> FixedRuntimeRepContext - 
frr_ctxt is_exp_ty = - FRRDeepSubsumption - { frrDSExpected = is_exp_ty - , frrDSPosition = pos - } + frr_ctxt is_exp_ty = FRRDeepSubsumption { frrDSExpected = is_exp_ty + , frrDSPosition = pos } ----------------------- deeplySkolemise :: SkolemInfo -> TcSigmaType @@ -2146,9 +2150,9 @@ deeplySkolemise skol_info ty ; let tvs = binderVars bndrs tvs1 = binderVars bndrs1 tv_prs1 = map tyVarName tvs `zip` bndrs1 - ; return ( mkWpEta ids1 (mkWpTyLams tvs1 - <.> mkWpEvLams ev_vars1 - <.> wrap) + ; return ( mkWpEta ty ids1 (mkWpTyLams tvs1 + <.> mkWpEvLams ev_vars1 + <.> wrap) , tv_prs1 ++ tvs_prs2 , ev_vars1 ++ ev_vars2 , mkScaledFunTys arg_tys' rho ) } @@ -2182,7 +2186,7 @@ deeplyInstantiate orig ty ; ids1 <- newSysLocalIds (fsLit "di") arg_tys' ; wrap1 <- instCall orig (mkTyVarTys tvs') theta' ; (wrap2, rho2) <- go subst' rho - ; return (mkWpEta ids1 (wrap2 <.> wrap1), + ; return (mkWpEta ty ids1 (wrap2 <.> wrap1), mkScaledFunTys arg_tys' rho2) } | otherwise ===================================== compiler/GHC/Tc/Zonk/Type.hs ===================================== @@ -1233,13 +1233,16 @@ zonk_cmd_top (HsCmdTop (CmdTopTc stack_tys ty ids) cmd) ------------------------------------------------------------------------- zonkCoFn :: HsWrapper -> ZonkBndrTcM HsWrapper zonkCoFn WpHole = return WpHole +zonkCoFn (WpSubType w) = do { w' <- zonkCoFn w + ; return (WpSubType w') } zonkCoFn (WpCompose c1 c2) = do { c1' <- zonkCoFn c1 ; c2' <- zonkCoFn c2 ; return (WpCompose c1' c2') } -zonkCoFn (WpFun c1 c2 t1) = do { c1' <- zonkCoFn c1 - ; c2' <- zonkCoFn c2 - ; t1' <- noBinders $ zonkScaledTcTypeToTypeX t1 - ; return (WpFun c1' c2' t1') } +zonkCoFn (WpFun c1 c2 t1 t2) = do { c1' <- zonkCoFn c1 + ; c2' <- zonkCoFn c2 + ; t1' <- noBinders $ zonkScaledTcTypeToTypeX t1 + ; t2' <- noBinders $ zonkTcTypeToTypeX t2 + ; return (WpFun c1' c2' t1' t2') } zonkCoFn (WpCast co) = WpCast <$> noBinders (zonkCoToCo co) zonkCoFn (WpEvLam ev) = WpEvLam <$> zonkEvBndrX ev zonkCoFn (WpEvApp arg) = WpEvApp 
<$> noBinders (zonkEvTerm arg) ===================================== compiler/Setup.hs ===================================== @@ -1,4 +1,5 @@ {-# LANGUAGE NamedFieldPuns #-} +{-# LANGUAGE CPP #-} module Main where import Distribution.Simple @@ -12,6 +13,8 @@ import Distribution.Simple.Program import Distribution.Simple.Utils import Distribution.Simple.Setup import Distribution.Simple.PackageIndex +import qualified Distribution.Simple.LocalBuildInfo as LBI + import System.IO import System.Process @@ -59,8 +62,9 @@ primopIncls = ghcAutogen :: Verbosity -> LocalBuildInfo -> IO () ghcAutogen verbosity lbi@LocalBuildInfo{pkgDescrFile,withPrograms,componentNameMap,installedPkgs} = do + let i = LBI.interpretSymbolicPathLBI lbi -- Get compiler/ root directory from the cabal file - let Just compilerRoot = takeDirectory <$> pkgDescrFile + let Just compilerRoot = takeDirectory . i <$> pkgDescrFile -- Require the necessary programs (gcc ,withPrograms) <- requireProgram normal gccProgram withPrograms @@ -80,15 +84,19 @@ ghcAutogen verbosity lbi@LocalBuildInfo{pkgDescrFile,withPrograms,componentNameM -- Call genprimopcode to generate *.hs-incl forM_ primopIncls $ \(file,command) -> do contents <- readProcess "genprimopcode" [command] primopsStr - rewriteFileEx verbosity (buildDir lbi > file) contents + rewriteFileEx verbosity (i (buildDir lbi) > file) contents -- Write GHC.Platform.Constants - let platformConstantsPath = autogenPackageModulesDir lbi > "GHC/Platform/Constants.hs" + let platformConstantsPath = i (autogenPackageModulesDir lbi) > "GHC/Platform/Constants.hs" targetOS = case lookup "target os" settings of Nothing -> error "no target os in settings" Just os -> os createDirectoryIfMissingVerbose verbosity True (takeDirectory platformConstantsPath) +#if MIN_VERSION_Cabal(3,14,0) + withTempFile "Constants_tmp.hs" $ \tmp h -> do +#else withTempFile (takeDirectory platformConstantsPath) "Constants_tmp.hs" $ \tmp h -> do +#endif hClose h callProcess "deriveConstants" 
["--gen-haskell-type","-o",tmp,"--target-os",targetOS] renameFile tmp platformConstantsPath @@ -103,7 +111,7 @@ ghcAutogen verbosity lbi@LocalBuildInfo{pkgDescrFile,withPrograms,componentNameM _ -> error "Couldn't find unique ghc-internal library when building ghc" -- Write GHC.Settings.Config - configHsPath = autogenPackageModulesDir lbi > "GHC/Settings/Config.hs" + configHsPath = i (autogenPackageModulesDir lbi) > "GHC/Settings/Config.hs" configHs = generateConfigHs cProjectUnitId cGhcInternalUnitId settings createDirectoryIfMissingVerbose verbosity True (takeDirectory configHsPath) rewriteFileEx verbosity configHsPath configHs ===================================== compiler/ghc.cabal.in ===================================== @@ -50,7 +50,7 @@ extra-source-files: custom-setup - setup-depends: base >= 3 && < 5, Cabal >= 1.6 && <3.14, directory, process, filepath, containers + setup-depends: base >= 3 && < 5, Cabal >= 1.6 && <3.16, directory, process, filepath, containers Flag internal-interpreter Description: Build with internal interpreter support. ===================================== libraries/ghc-boot/Setup.hs ===================================== @@ -10,6 +10,7 @@ import Distribution.Verbosity import Distribution.Simple.Program import Distribution.Simple.Utils import Distribution.Simple.Setup +import qualified Distribution.Simple.LocalBuildInfo as LBI import System.IO import System.Directory @@ -32,12 +33,13 @@ main = defaultMainWithHooks ghcHooks ghcAutogen :: Verbosity -> LocalBuildInfo -> IO () ghcAutogen verbosity lbi@LocalBuildInfo{..} = do -- Get compiler/ root directory from the cabal file - let Just compilerRoot = takeDirectory <$> pkgDescrFile + let Just compilerRoot = takeDirectory . 
i <$> pkgDescrFile - let platformHostFile = "GHC/Platform/Host.hs" - platformHostPath = autogenPackageModulesDir lbi > platformHostFile + i = LBI.interpretSymbolicPathLBI lbi + platformHostFile = "GHC/Platform/Host.hs" + platformHostPath = i (autogenPackageModulesDir lbi) > platformHostFile ghcVersionFile = "GHC/Version.hs" - ghcVersionPath = autogenPackageModulesDir lbi > ghcVersionFile + ghcVersionPath = i (autogenPackageModulesDir lbi) > ghcVersionFile -- Get compiler settings settings <- lookupEnv "HADRIAN_SETTINGS" >>= \case ===================================== libraries/ghc-boot/ghc-boot.cabal.in ===================================== @@ -28,7 +28,7 @@ build-type: Custom extra-source-files: changelog.md custom-setup - setup-depends: base >= 3 && < 5, Cabal >= 1.6 && <3.14, directory, filepath + setup-depends: base >= 3 && < 5, Cabal >= 1.6 && <3.16, directory, filepath source-repository head type: git ===================================== rts/Interpreter.c ===================================== @@ -91,6 +91,80 @@ See also Note [Width of parameters] for some more motivation. /* #define INTERP_STATS */ +// Note [Instruction dispatch in the bytecode interpreter] +// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +// Like all bytecode interpreters out there, instruction dispatch is +// the backbone of our bytecode interpreter: +// +// - Each instruction starts with a unique integer tag +// - Each instruction has a piece of code to handle it +// - Fetch next instruction's tag, interpret, repeat +// +// There are two classical approaches to organize the interpreter loop +// and implement instruction dispatch: +// +// 1. switch-case: fetch the instruction tag, then a switch statement +// contains each instruction's handler code as a case within it. 
+// This is the simplest and most portable approach, but the +// compiler often generates suboptimal code that involves two jumps +// per instruction: the first one that jumps back to the switch +// statement, followed by the second one that jumps to the handler +// case statement. +// 2. computed-goto (direct threaded code): GNU C has an extension +// (https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html) that +// allows storing a code label as a pointer and using the goto +// statement to jump to such a pointer. So we can organize the +// handler code as a code block under a label, have a pointer array +// that maps an instruction tag to its handler's code label, then +// instruction dispatch can happen with a single jump after a +// memory load. +// +// A classical paper "The Structure and Performance of Efficient +// Interpreters" by M. Anton Ertl and David Gregg in 2003 explains it +// in further details with profiling data: +// https://jilp.org/vol5/v5paper12.pdf. There exist more subtle issues +// like interaction with modern CPU's branch predictors, though in +// practice computed-goto does outperform switch-case, and I've +// observed around 10%-15% wall clock time speedup in simple +// benchmarks, so our bytecode interpreter now defaults to using +// computed-goto when applicable, and falls back to switch-case in +// other cases. +// +// The COMPUTED_GOTO macro is defined when we use computed-goto. We +// don't do autoconf feature detection since it works with all +// versions of gcc/clang on all platforms we currently support. +// Exceptions include: +// +// - When DEBUG or other macros are enabled so that there's extra +// logic per instruction: assertions, statistics, etc. To make +// computed-goto support those would need us to duplicate the extra +// code in every instruction's handler code block, not really worth +// it when speed is not the primary concern. 
+// - On wasm, because wasm prohibits goto anyway and LLVM has to lower +// goto in C to br_table, so there's no performance benefit of +// computed-goto, only slight penalty due to an extra load from the +// user-defined dispatch table in the linear memory. +// +// The source of truth for our bytecode definition is +// rts/include/rts/Bytecodes.h. For each bytecode `#define bci_FOO +// tag`, we have jumptable[tag] which stores the 32-bit offset +// `&&lbl_bci_FOO - &&lbl_bci_DEFAULT`, so the goto destination can +// always be computed by adding the jumptable[tag] offset to the base +// address `&&lbl_bci_DEFAULT`. Whenever you change the bytecode +// definitions, always remember to update `jumptable` as well! + +#if !defined(DEBUG) && !defined(ASSERTS_ENABLED) && !defined(INTERP_STATS) && !defined(wasm32_HOST_ARCH) +#define COMPUTED_GOTO +#endif + +#if defined(COMPUTED_GOTO) +#pragma GCC diagnostic ignored "-Wpointer-arith" +#define INSTRUCTION(name) lbl_##name +#define NEXT_INSTRUCTION goto *(&&lbl_bci_DEFAULT + jumptable[(bci = instrs[bciPtr++]) & 0xFF]) +#else +#define INSTRUCTION(name) case name +#define NEXT_INSTRUCTION goto nextInsn +#endif /* Sp points to the lowest live word on the stack. 
*/ @@ -1542,7 +1616,9 @@ run_BCO: it_lastopc = 0; /* no opcode */ #endif +#if !defined(COMPUTED_GOTO) nextInsn: +#endif ASSERT(bciPtr < bcoSize); IF_DEBUG(interpreter, //if (do_print_stack) { @@ -1572,15 +1648,263 @@ run_BCO: it_lastopc = (int)instrs[bciPtr]; #endif - bci = BCO_NEXT; +#if defined(COMPUTED_GOTO) + static const int32_t jumptable[] = { + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_STKCHECK - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_L - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_LL - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_LLL - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH8 - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH16 - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH32 - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH8_W - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH16_W - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH32_W - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_G - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_ALTS_P - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_ALTS_N - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_ALTS_F - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_ALTS_D - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_ALTS_L - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_ALTS_V - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_PAD8 - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_PAD16 - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_PAD32 - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_UBX8 - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_UBX16 - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_UBX32 - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_UBX - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_APPLY_N - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_APPLY_F - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_APPLY_D - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_APPLY_L - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_APPLY_V - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_APPLY_P - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_APPLY_PP - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_APPLY_PPP - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_APPLY_PPPP - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_APPLY_PPPPP - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_APPLY_PPPPPP - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - 
&&lbl_bci_DEFAULT, + &&lbl_bci_SLIDE - &&lbl_bci_DEFAULT, + &&lbl_bci_ALLOC_AP - &&lbl_bci_DEFAULT, + &&lbl_bci_ALLOC_AP_NOUPD - &&lbl_bci_DEFAULT, + &&lbl_bci_ALLOC_PAP - &&lbl_bci_DEFAULT, + &&lbl_bci_MKAP - &&lbl_bci_DEFAULT, + &&lbl_bci_MKPAP - &&lbl_bci_DEFAULT, + &&lbl_bci_UNPACK - &&lbl_bci_DEFAULT, + &&lbl_bci_PACK - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTLT_I - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_I - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTLT_F - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_F - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTLT_D - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_D - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTLT_P - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_P - &&lbl_bci_DEFAULT, + &&lbl_bci_CASEFAIL - &&lbl_bci_DEFAULT, + &&lbl_bci_JMP - &&lbl_bci_DEFAULT, + &&lbl_bci_CCALL - &&lbl_bci_DEFAULT, + &&lbl_bci_SWIZZLE - &&lbl_bci_DEFAULT, + &&lbl_bci_ENTER - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_RETURN_P - &&lbl_bci_DEFAULT, + &&lbl_bci_RETURN_N - &&lbl_bci_DEFAULT, + &&lbl_bci_RETURN_F - &&lbl_bci_DEFAULT, + &&lbl_bci_RETURN_D - &&lbl_bci_DEFAULT, + &&lbl_bci_RETURN_L - &&lbl_bci_DEFAULT, + &&lbl_bci_RETURN_V - &&lbl_bci_DEFAULT, + &&lbl_bci_BRK_FUN - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTLT_W - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_W - &&lbl_bci_DEFAULT, + &&lbl_bci_RETURN_T - &&lbl_bci_DEFAULT, + &&lbl_bci_PUSH_ALTS_T - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTLT_I64 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_I64 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTLT_I32 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_I32 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTLT_I16 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_I16 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTLT_I8 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_I8 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTLT_W64 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_W64 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTLT_W32 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_W32 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTLT_W16 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_W16 - &&lbl_bci_DEFAULT, 
+ &&lbl_bci_TESTLT_W8 - &&lbl_bci_DEFAULT, + &&lbl_bci_TESTEQ_W8 - &&lbl_bci_DEFAULT, + &&lbl_bci_PRIMCALL - &&lbl_bci_DEFAULT, + &&lbl_bci_BCO_NAME - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_ADD_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_SUB_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_AND_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_XOR_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_NOT_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_NEG_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_MUL_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_SHL_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_ASR_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_LSR_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_OR_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_NEQ_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_EQ_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_GE_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_GT_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_LT_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_LE_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_GE_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_GT_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_LT_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_LE_64 - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_ADD_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_SUB_32 - &&lbl_bci_DEFAULT, + 
&&lbl_bci_OP_AND_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_XOR_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_NOT_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_NEG_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_MUL_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_SHL_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_ASR_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_LSR_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_OR_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_NEQ_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_EQ_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_GE_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_GT_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_LT_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_LE_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_GE_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_GT_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_LT_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_LE_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_ADD_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_SUB_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_AND_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_XOR_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_NOT_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_NEG_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_MUL_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_SHL_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_ASR_16 - &&lbl_bci_DEFAULT, + 
&&lbl_bci_OP_LSR_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_OR_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_NEQ_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_EQ_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_GE_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_GT_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_LT_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_LE_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_GE_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_GT_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_LT_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_LE_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_ADD_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_SUB_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_AND_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_XOR_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_NOT_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_NEG_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_MUL_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_SHL_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_ASR_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_LSR_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_OR_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_NEQ_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_EQ_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_GE_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_GT_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_LT_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_U_LE_08 - &&lbl_bci_DEFAULT, + 
&&lbl_bci_OP_S_GE_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_GT_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_LT_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_S_LE_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_DEFAULT - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_INDEX_ADDR_08 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_INDEX_ADDR_16 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_INDEX_ADDR_32 - &&lbl_bci_DEFAULT, + &&lbl_bci_OP_INDEX_ADDR_64 - &&lbl_bci_DEFAULT}; + NEXT_INSTRUCTION; +#else + bci = BCO_NEXT; /* We use the high 8 bits for flags. The highest of which is * currently allocated to LARGE_ARGS */ ASSERT((bci & 0xFF00) == (bci & ( bci_FLAG_LARGE_ARGS ))); - switch (bci & 0xFF) { +#endif /* check for a breakpoint on the beginning of a BCO */ - case bci_BRK_FUN: + INSTRUCTION(bci_BRK_FUN): { W_ arg1_brk_array, arg2_info_mod_name, arg3_info_mod_id, arg4_info_index; #if defined(PROFILING) @@ -1779,10 +2103,10 @@ run_BCO: cap->r.rCurrentTSO->flags &= ~TSO_STOPPED_ON_BREAKPOINT; // continue normal execution of the byte code instructions - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_STKCHECK: { + INSTRUCTION(bci_STKCHECK): { // Explicit stack check at the beginning of a function // *only* (stack checks in case alternatives are // propagated to the enclosing function). 
@@ -1793,27 +2117,27 @@ run_BCO: SpW(0) = (W_)&stg_apply_interp_info; RETURN_TO_SCHEDULER(ThreadInterpret, StackOverflow); } else { - goto nextInsn; + NEXT_INSTRUCTION; } } - case bci_PUSH_L: { + INSTRUCTION(bci_PUSH_L): { W_ o1 = BCO_GET_LARGE_ARG; SpW(-1) = ReadSpW(o1); Sp_subW(1); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_LL: { + INSTRUCTION(bci_PUSH_LL): { W_ o1 = BCO_GET_LARGE_ARG; W_ o2 = BCO_GET_LARGE_ARG; SpW(-1) = ReadSpW(o1); SpW(-2) = ReadSpW(o2); Sp_subW(2); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_LLL: { + INSTRUCTION(bci_PUSH_LLL): { W_ o1 = BCO_GET_LARGE_ARG; W_ o2 = BCO_GET_LARGE_ARG; W_ o3 = BCO_GET_LARGE_ARG; @@ -1821,52 +2145,52 @@ run_BCO: SpW(-2) = ReadSpW(o2); SpW(-3) = ReadSpW(o3); Sp_subW(3); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH8: { + INSTRUCTION(bci_PUSH8): { W_ off = BCO_GET_LARGE_ARG; Sp_subB(1); *(StgWord8*)Sp = (StgWord8) (ReadSpB(off+1)); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH16: { + INSTRUCTION(bci_PUSH16): { W_ off = BCO_GET_LARGE_ARG; Sp_subB(2); *(StgWord16*)Sp = (StgWord16) (ReadSpB(off+2)); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH32: { + INSTRUCTION(bci_PUSH32): { W_ off = BCO_GET_LARGE_ARG; Sp_subB(4); *(StgWord32*)Sp = (StgWord32) (ReadSpB(off+4)); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH8_W: { + INSTRUCTION(bci_PUSH8_W): { W_ off = BCO_GET_LARGE_ARG; *(StgWord*)(Sp_minusW(1)) = (StgWord) ((StgWord8) (ReadSpB(off))); Sp_subW(1); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH16_W: { + INSTRUCTION(bci_PUSH16_W): { W_ off = BCO_GET_LARGE_ARG; *(StgWord*)(Sp_minusW(1)) = (StgWord) ((StgWord16) (ReadSpB(off))); Sp_subW(1); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH32_W: { + INSTRUCTION(bci_PUSH32_W): { W_ off = BCO_GET_LARGE_ARG; *(StgWord*)(Sp_minusW(1)) = (StgWord) ((StgWord32) (ReadSpB(off))); Sp_subW(1); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_G: { + INSTRUCTION(bci_PUSH_G): { W_ o1 = BCO_GET_LARGE_ARG; 
StgClosure *tagged_obj = (StgClosure*) BCO_PTR(o1); @@ -1905,10 +2229,10 @@ run_BCO: SpW(-1) = (W_) tagged_obj; Sp_subW(1); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_ALTS_P: { + INSTRUCTION(bci_PUSH_ALTS_P): { W_ o_bco = BCO_GET_LARGE_ARG; Sp_subW(2); SpW(1) = BCO_PTR(o_bco); @@ -1918,10 +2242,10 @@ run_BCO: SpW(1) = (W_)cap->r.rCCCS; SpW(0) = (W_)&stg_restore_cccs_d_info; #endif - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_ALTS_N: { + INSTRUCTION(bci_PUSH_ALTS_N): { W_ o_bco = BCO_GET_LARGE_ARG; SpW(-2) = (W_)&stg_ctoi_R1n_info; SpW(-1) = BCO_PTR(o_bco); @@ -1931,10 +2255,10 @@ run_BCO: SpW(1) = (W_)cap->r.rCCCS; SpW(0) = (W_)&stg_restore_cccs_d_info; #endif - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_ALTS_F: { + INSTRUCTION(bci_PUSH_ALTS_F): { W_ o_bco = BCO_GET_LARGE_ARG; SpW(-2) = (W_)&stg_ctoi_F1_info; SpW(-1) = BCO_PTR(o_bco); @@ -1944,10 +2268,10 @@ run_BCO: SpW(1) = (W_)cap->r.rCCCS; SpW(0) = (W_)&stg_restore_cccs_d_info; #endif - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_ALTS_D: { + INSTRUCTION(bci_PUSH_ALTS_D): { W_ o_bco = BCO_GET_LARGE_ARG; SpW(-2) = (W_)&stg_ctoi_D1_info; SpW(-1) = BCO_PTR(o_bco); @@ -1957,10 +2281,10 @@ run_BCO: SpW(1) = (W_)cap->r.rCCCS; SpW(0) = (W_)&stg_restore_cccs_d_info; #endif - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_ALTS_L: { + INSTRUCTION(bci_PUSH_ALTS_L): { W_ o_bco = BCO_GET_LARGE_ARG; SpW(-2) = (W_)&stg_ctoi_L1_info; SpW(-1) = BCO_PTR(o_bco); @@ -1970,10 +2294,10 @@ run_BCO: SpW(1) = (W_)cap->r.rCCCS; SpW(0) = (W_)&stg_restore_cccs_d_info; #endif - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_ALTS_V: { + INSTRUCTION(bci_PUSH_ALTS_V): { W_ o_bco = BCO_GET_LARGE_ARG; SpW(-2) = (W_)&stg_ctoi_V_info; SpW(-1) = BCO_PTR(o_bco); @@ -1983,10 +2307,10 @@ run_BCO: SpW(1) = (W_)cap->r.rCCCS; SpW(0) = (W_)&stg_restore_cccs_d_info; #endif - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_ALTS_T: { + INSTRUCTION(bci_PUSH_ALTS_T): { W_ o_bco = BCO_GET_LARGE_ARG; W_ 
tuple_info = (W_)BCO_LIT(BCO_GET_LARGE_ARG); W_ o_tuple_bco = BCO_GET_LARGE_ARG; @@ -2006,83 +2330,83 @@ run_BCO: W_ ctoi_t_offset = (W_) ctoi_tuple_infos[tuple_stack_words]; SpW(-4) = ctoi_t_offset; Sp_subW(4); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_APPLY_N: + INSTRUCTION(bci_PUSH_APPLY_N): Sp_subW(1); SpW(0) = (W_)&stg_ap_n_info; - goto nextInsn; - case bci_PUSH_APPLY_V: + NEXT_INSTRUCTION; + INSTRUCTION(bci_PUSH_APPLY_V): Sp_subW(1); SpW(0) = (W_)&stg_ap_v_info; - goto nextInsn; - case bci_PUSH_APPLY_F: + NEXT_INSTRUCTION; + INSTRUCTION(bci_PUSH_APPLY_F): Sp_subW(1); SpW(0) = (W_)&stg_ap_f_info; - goto nextInsn; - case bci_PUSH_APPLY_D: + NEXT_INSTRUCTION; + INSTRUCTION(bci_PUSH_APPLY_D): Sp_subW(1); SpW(0) = (W_)&stg_ap_d_info; - goto nextInsn; - case bci_PUSH_APPLY_L: + NEXT_INSTRUCTION; + INSTRUCTION(bci_PUSH_APPLY_L): Sp_subW(1); SpW(0) = (W_)&stg_ap_l_info; - goto nextInsn; - case bci_PUSH_APPLY_P: + NEXT_INSTRUCTION; + INSTRUCTION(bci_PUSH_APPLY_P): Sp_subW(1); SpW(0) = (W_)&stg_ap_p_info; - goto nextInsn; - case bci_PUSH_APPLY_PP: + NEXT_INSTRUCTION; + INSTRUCTION(bci_PUSH_APPLY_PP): Sp_subW(1); SpW(0) = (W_)&stg_ap_pp_info; - goto nextInsn; - case bci_PUSH_APPLY_PPP: + NEXT_INSTRUCTION; + INSTRUCTION(bci_PUSH_APPLY_PPP): Sp_subW(1); SpW(0) = (W_)&stg_ap_ppp_info; - goto nextInsn; - case bci_PUSH_APPLY_PPPP: + NEXT_INSTRUCTION; + INSTRUCTION(bci_PUSH_APPLY_PPPP): Sp_subW(1); SpW(0) = (W_)&stg_ap_pppp_info; - goto nextInsn; - case bci_PUSH_APPLY_PPPPP: + NEXT_INSTRUCTION; + INSTRUCTION(bci_PUSH_APPLY_PPPPP): Sp_subW(1); SpW(0) = (W_)&stg_ap_ppppp_info; - goto nextInsn; - case bci_PUSH_APPLY_PPPPPP: + NEXT_INSTRUCTION; + INSTRUCTION(bci_PUSH_APPLY_PPPPPP): Sp_subW(1); SpW(0) = (W_)&stg_ap_pppppp_info; - goto nextInsn; + NEXT_INSTRUCTION; - case bci_PUSH_PAD8: { + INSTRUCTION(bci_PUSH_PAD8): { Sp_subB(1); *(StgWord8*)Sp = 0; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_PAD16: { + INSTRUCTION(bci_PUSH_PAD16): { Sp_subB(2); 
*(StgWord16*)Sp = 0; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_PAD32: { + INSTRUCTION(bci_PUSH_PAD32): { Sp_subB(4); *(StgWord32*)Sp = 0; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_UBX8: { + INSTRUCTION(bci_PUSH_UBX8): { W_ o_lit = BCO_GET_LARGE_ARG; Sp_subB(1); *(StgWord8*)Sp = (StgWord8) BCO_LIT(o_lit); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_UBX16: { + INSTRUCTION(bci_PUSH_UBX16): { W_ o_lit = BCO_GET_LARGE_ARG; Sp_subB(2); *(StgWord16*)Sp = (StgWord16) BCO_LIT(o_lit); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_UBX32: { + INSTRUCTION(bci_PUSH_UBX32): { W_ o_lit = BCO_GET_LARGE_ARG; Sp_subB(4); *(StgWord32*)Sp = (StgWord32) BCO_LIT(o_lit); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PUSH_UBX: { + INSTRUCTION(bci_PUSH_UBX): { W_ i; W_ o_lits = BCO_GET_LARGE_ARG; W_ n_words = BCO_GET_LARGE_ARG; @@ -2090,10 +2414,10 @@ run_BCO: for (i = 0; i < n_words; i++) { SpW(i) = (W_)BCO_LIT(o_lits+i); } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_SLIDE: { + INSTRUCTION(bci_SLIDE): { W_ n = BCO_GET_LARGE_ARG; W_ by = BCO_GET_LARGE_ARG; /* @@ -2106,10 +2430,10 @@ run_BCO: } Sp_addW(by); INTERP_TICK(it_slides); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_ALLOC_AP: { + INSTRUCTION(bci_ALLOC_AP): { StgHalfWord n_payload = BCO_GET_LARGE_ARG; StgAP *ap = (StgAP*)allocate(cap, AP_sizeW(n_payload)); SpW(-1) = (W_)ap; @@ -2119,10 +2443,10 @@ run_BCO: // visible only from our stack SET_HDR(ap, &stg_AP_info, cap->r.rCCCS) Sp_subW(1); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_ALLOC_AP_NOUPD: { + INSTRUCTION(bci_ALLOC_AP_NOUPD): { StgHalfWord n_payload = BCO_GET_LARGE_ARG; StgAP *ap = (StgAP*)allocate(cap, AP_sizeW(n_payload)); SpW(-1) = (W_)ap; @@ -2132,10 +2456,10 @@ run_BCO: // visible only from our stack SET_HDR(ap, &stg_AP_NOUPD_info, cap->r.rCCCS) Sp_subW(1); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_ALLOC_PAP: { + INSTRUCTION(bci_ALLOC_PAP): { StgPAP* pap; StgHalfWord arity = BCO_GET_LARGE_ARG; 
StgHalfWord n_payload = BCO_GET_LARGE_ARG; @@ -2147,10 +2471,10 @@ run_BCO: // visible only from our stack SET_HDR(pap, &stg_PAP_info, cap->r.rCCCS) Sp_subW(1); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_MKAP: { + INSTRUCTION(bci_MKAP): { StgHalfWord i; W_ stkoff = BCO_GET_LARGE_ARG; StgHalfWord n_payload = BCO_GET_LARGE_ARG; @@ -2171,10 +2495,10 @@ run_BCO: debugBelch("\tBuilt "); printObj((StgClosure*)ap); ); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_MKPAP: { + INSTRUCTION(bci_MKPAP): { StgHalfWord i; W_ stkoff = BCO_GET_LARGE_ARG; StgHalfWord n_payload = BCO_GET_LARGE_ARG; @@ -2198,10 +2522,10 @@ run_BCO: debugBelch("\tBuilt "); printObj((StgClosure*)pap); ); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_UNPACK: { + INSTRUCTION(bci_UNPACK): { /* Unpack N ptr words from t.o.s constructor */ W_ i; W_ n_words = BCO_GET_LARGE_ARG; @@ -2210,10 +2534,10 @@ run_BCO: for (i = 0; i < n_words; i++) { SpW(i) = (W_)con->payload[i]; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PACK: { + INSTRUCTION(bci_PACK): { W_ o_itbl = BCO_GET_LARGE_ARG; W_ n_words = BCO_GET_LARGE_ARG; StgConInfoTable* itbl = CON_INFO_PTR_TO_STRUCT((StgInfoTable *)BCO_LIT(o_itbl)); @@ -2244,220 +2568,220 @@ run_BCO: debugBelch("\tBuilt "); printObj((StgClosure*)tagged_con); ); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_P: { + INSTRUCTION(bci_TESTLT_P): { unsigned int discr = BCO_NEXT; int failto = BCO_GET_LARGE_ARG; StgClosure* con = UNTAG_CLOSURE((StgClosure*)ReadSpW(0)); if (GET_TAG(con) >= discr) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_P: { + INSTRUCTION(bci_TESTEQ_P): { unsigned int discr = BCO_NEXT; int failto = BCO_GET_LARGE_ARG; StgClosure* con = UNTAG_CLOSURE((StgClosure*)ReadSpW(0)); if (GET_TAG(con) != discr) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_I: { + INSTRUCTION(bci_TESTLT_I): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; I_ stackInt = (I_)ReadSpW(0); if 
(stackInt >= (I_)BCO_LIT(discr)) bciPtr = failto; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_I64: { + INSTRUCTION(bci_TESTLT_I64): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgInt64 stackInt = ReadSpW64(0); if (stackInt >= BCO_LITI64(discr)) bciPtr = failto; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_I32: { + INSTRUCTION(bci_TESTLT_I32): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgInt32 stackInt = (StgInt32) ReadSpW(0); if (stackInt >= (StgInt32)BCO_LIT(discr)) bciPtr = failto; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_I16: { + INSTRUCTION(bci_TESTLT_I16): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgInt16 stackInt = (StgInt16) ReadSpW(0); if (stackInt >= (StgInt16)BCO_LIT(discr)) bciPtr = failto; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_I8: { + INSTRUCTION(bci_TESTLT_I8): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgInt8 stackInt = (StgInt8) ReadSpW(0); if (stackInt >= (StgInt8)BCO_LIT(discr)) bciPtr = failto; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_I: { + INSTRUCTION(bci_TESTEQ_I): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; I_ stackInt = (I_)ReadSpW(0); if (stackInt != (I_)BCO_LIT(discr)) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_I64: { + INSTRUCTION(bci_TESTEQ_I64): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgInt64 stackInt = ReadSpW64(0); if (stackInt != BCO_LITI64(discr)) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_I32: { + INSTRUCTION(bci_TESTEQ_I32): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgInt32 stackInt = (StgInt32) ReadSpW(0); if (stackInt != (StgInt32)BCO_LIT(discr)) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_I16: { + INSTRUCTION(bci_TESTEQ_I16): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; 
StgInt16 stackInt = (StgInt16) ReadSpW(0); if (stackInt != (StgInt16)BCO_LIT(discr)) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_I8: { + INSTRUCTION(bci_TESTEQ_I8): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgInt8 stackInt = (StgInt8) ReadSpW(0); if (stackInt != (StgInt8)BCO_LIT(discr)) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_W: { + INSTRUCTION(bci_TESTLT_W): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; W_ stackWord = (W_)ReadSpW(0); if (stackWord >= (W_)BCO_LIT(discr)) bciPtr = failto; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_W64: { + INSTRUCTION(bci_TESTLT_W64): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgWord64 stackWord = ReadSpW64(0); if (stackWord >= BCO_LITW64(discr)) bciPtr = failto; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_W32: { + INSTRUCTION(bci_TESTLT_W32): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgWord32 stackWord = (StgWord32) ReadSpW(0); if (stackWord >= (StgWord32)BCO_LIT(discr)) bciPtr = failto; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_W16: { + INSTRUCTION(bci_TESTLT_W16): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgWord16 stackWord = (StgInt16) ReadSpW(0); if (stackWord >= (StgWord16)BCO_LIT(discr)) bciPtr = failto; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_W8: { + INSTRUCTION(bci_TESTLT_W8): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgWord8 stackWord = (StgInt8) ReadSpW(0); if (stackWord >= (StgWord8)BCO_LIT(discr)) bciPtr = failto; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_W: { + INSTRUCTION(bci_TESTEQ_W): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; W_ stackWord = (W_)ReadSpW(0); if (stackWord != (W_)BCO_LIT(discr)) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_W64: { + INSTRUCTION(bci_TESTEQ_W64): { int 
discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgWord64 stackWord = ReadSpW64(0); if (stackWord != BCO_LITW64(discr)) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_W32: { + INSTRUCTION(bci_TESTEQ_W32): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgWord32 stackWord = (StgWord32) ReadSpW(0); if (stackWord != (StgWord32)BCO_LIT(discr)) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_W16: { + INSTRUCTION(bci_TESTEQ_W16): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgWord16 stackWord = (StgWord16) ReadSpW(0); if (stackWord != (StgWord16)BCO_LIT(discr)) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_W8: { + INSTRUCTION(bci_TESTEQ_W8): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgWord8 stackWord = (StgWord8) ReadSpW(0); if (stackWord != (StgWord8)BCO_LIT(discr)) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_D: { + INSTRUCTION(bci_TESTLT_D): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgDouble stackDbl, discrDbl; @@ -2466,10 +2790,10 @@ run_BCO: if (stackDbl >= discrDbl) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_D: { + INSTRUCTION(bci_TESTEQ_D): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgDouble stackDbl, discrDbl; @@ -2478,10 +2802,10 @@ run_BCO: if (stackDbl != discrDbl) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTLT_F: { + INSTRUCTION(bci_TESTLT_F): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgFloat stackFlt, discrFlt; @@ -2490,10 +2814,10 @@ run_BCO: if (stackFlt >= discrFlt) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_TESTEQ_F: { + INSTRUCTION(bci_TESTEQ_F): { int discr = BCO_GET_LARGE_ARG; int failto = BCO_GET_LARGE_ARG; StgFloat stackFlt, discrFlt; @@ -2502,11 +2826,11 @@ run_BCO: if (stackFlt != 
discrFlt) { bciPtr = failto; } - goto nextInsn; + NEXT_INSTRUCTION; } // Control-flow ish things - case bci_ENTER: + INSTRUCTION(bci_ENTER): // Context-switch check. We put it here to ensure that // the interpreter has done at least *some* work before // context switching: sometimes the scheduler can invoke @@ -2518,50 +2842,50 @@ run_BCO: } goto eval; - case bci_RETURN_P: + INSTRUCTION(bci_RETURN_P): tagged_obj = (StgClosure *)ReadSpW(0); Sp_addW(1); goto do_return_pointer; - case bci_RETURN_N: + INSTRUCTION(bci_RETURN_N): Sp_subW(1); SpW(0) = (W_)&stg_ret_n_info; goto do_return_nonpointer; - case bci_RETURN_F: + INSTRUCTION(bci_RETURN_F): Sp_subW(1); SpW(0) = (W_)&stg_ret_f_info; goto do_return_nonpointer; - case bci_RETURN_D: + INSTRUCTION(bci_RETURN_D): Sp_subW(1); SpW(0) = (W_)&stg_ret_d_info; goto do_return_nonpointer; - case bci_RETURN_L: + INSTRUCTION(bci_RETURN_L): Sp_subW(1); SpW(0) = (W_)&stg_ret_l_info; goto do_return_nonpointer; - case bci_RETURN_V: + INSTRUCTION(bci_RETURN_V): Sp_subW(1); SpW(0) = (W_)&stg_ret_v_info; goto do_return_nonpointer; - case bci_RETURN_T: { + INSTRUCTION(bci_RETURN_T): { /* tuple_info and tuple_bco must already be on the stack */ Sp_subW(1); SpW(0) = (W_)&stg_ret_t_info; goto do_return_nonpointer; } - case bci_BCO_NAME: + INSTRUCTION(bci_BCO_NAME): bciPtr++; - goto nextInsn; + NEXT_INSTRUCTION; - case bci_SWIZZLE: { + INSTRUCTION(bci_SWIZZLE): { W_ stkoff = BCO_GET_LARGE_ARG; StgInt n = BCO_GET_LARGE_ARG; (*(StgInt*)(SafeSpWP(stkoff))) += n; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_PRIMCALL: { + INSTRUCTION(bci_PRIMCALL): { Sp_subW(1); SpW(0) = (W_)&stg_primcall_info; RETURN_TO_SCHEDULER_NO_PAUSE(ThreadRunGHC, ThreadYielding); @@ -2577,7 +2901,7 @@ run_BCO: ty r = op ((ty) ReadSpW(0)); \ SpW(0) = (StgWord) r; \ } \ - goto nextInsn; \ + NEXT_INSTRUCTION; \ } // op :: ty -> ty -> ty @@ -2592,7 +2916,7 @@ run_BCO: Sp_addW(1); \ SpW(0) = (StgWord) r; \ }; \ - goto nextInsn; \ + NEXT_INSTRUCTION; \ } // op :: ty -> Int 
-> ty @@ -2607,7 +2931,7 @@ run_BCO: Sp_addW(1); \ SpW(0) = (StgWord) r; \ }; \ - goto nextInsn; \ + NEXT_INSTRUCTION; \ } // op :: ty -> ty -> Int @@ -2622,113 +2946,113 @@ run_BCO: Sp_addW(1); \ SpW(0) = (StgWord) r; \ }; \ - goto nextInsn; \ + NEXT_INSTRUCTION; \ } - case bci_OP_ADD_64: SIZED_BIN_OP(+, StgInt64) - case bci_OP_SUB_64: SIZED_BIN_OP(-, StgInt64) - case bci_OP_AND_64: SIZED_BIN_OP(&, StgInt64) - case bci_OP_XOR_64: SIZED_BIN_OP(^, StgInt64) - case bci_OP_OR_64: SIZED_BIN_OP(|, StgInt64) - case bci_OP_MUL_64: SIZED_BIN_OP(*, StgInt64) - case bci_OP_SHL_64: SIZED_BIN_OP_TY_INT(<<, StgWord64) - case bci_OP_LSR_64: SIZED_BIN_OP_TY_INT(>>, StgWord64) - case bci_OP_ASR_64: SIZED_BIN_OP_TY_INT(>>, StgInt64) - - case bci_OP_NEQ_64: SIZED_BIN_OP_TY_TY_INT(!=, StgWord64) - case bci_OP_EQ_64: SIZED_BIN_OP_TY_TY_INT(==, StgWord64) - case bci_OP_U_GT_64: SIZED_BIN_OP_TY_TY_INT(>, StgWord64) - case bci_OP_U_GE_64: SIZED_BIN_OP_TY_TY_INT(>=, StgWord64) - case bci_OP_U_LT_64: SIZED_BIN_OP_TY_TY_INT(<, StgWord64) - case bci_OP_U_LE_64: SIZED_BIN_OP_TY_TY_INT(<=, StgWord64) - - case bci_OP_S_GT_64: SIZED_BIN_OP_TY_TY_INT(>, StgInt64) - case bci_OP_S_GE_64: SIZED_BIN_OP_TY_TY_INT(>=, StgInt64) - case bci_OP_S_LT_64: SIZED_BIN_OP_TY_TY_INT(<, StgInt64) - case bci_OP_S_LE_64: SIZED_BIN_OP_TY_TY_INT(<=, StgInt64) - - case bci_OP_NOT_64: UN_SIZED_OP(~, StgWord64) - case bci_OP_NEG_64: UN_SIZED_OP(-, StgInt64) - - - case bci_OP_ADD_32: SIZED_BIN_OP(+, StgInt32) - case bci_OP_SUB_32: SIZED_BIN_OP(-, StgInt32) - case bci_OP_AND_32: SIZED_BIN_OP(&, StgInt32) - case bci_OP_XOR_32: SIZED_BIN_OP(^, StgInt32) - case bci_OP_OR_32: SIZED_BIN_OP(|, StgInt32) - case bci_OP_MUL_32: SIZED_BIN_OP(*, StgInt32) - case bci_OP_SHL_32: SIZED_BIN_OP_TY_INT(<<, StgWord32) - case bci_OP_LSR_32: SIZED_BIN_OP_TY_INT(>>, StgWord32) - case bci_OP_ASR_32: SIZED_BIN_OP_TY_INT(>>, StgInt32) - - case bci_OP_NEQ_32: SIZED_BIN_OP_TY_TY_INT(!=, StgWord32) - case bci_OP_EQ_32: SIZED_BIN_OP_TY_TY_INT(==, 
StgWord32) - case bci_OP_U_GT_32: SIZED_BIN_OP_TY_TY_INT(>, StgWord32) - case bci_OP_U_GE_32: SIZED_BIN_OP_TY_TY_INT(>=, StgWord32) - case bci_OP_U_LT_32: SIZED_BIN_OP_TY_TY_INT(<, StgWord32) - case bci_OP_U_LE_32: SIZED_BIN_OP_TY_TY_INT(<=, StgWord32) - - case bci_OP_S_GT_32: SIZED_BIN_OP_TY_TY_INT(>, StgInt32) - case bci_OP_S_GE_32: SIZED_BIN_OP_TY_TY_INT(>=, StgInt32) - case bci_OP_S_LT_32: SIZED_BIN_OP_TY_TY_INT(<, StgInt32) - case bci_OP_S_LE_32: SIZED_BIN_OP_TY_TY_INT(<=, StgInt32) - - case bci_OP_NOT_32: UN_SIZED_OP(~, StgWord32) - case bci_OP_NEG_32: UN_SIZED_OP(-, StgInt32) - - - case bci_OP_ADD_16: SIZED_BIN_OP(+, StgInt16) - case bci_OP_SUB_16: SIZED_BIN_OP(-, StgInt16) - case bci_OP_AND_16: SIZED_BIN_OP(&, StgInt16) - case bci_OP_XOR_16: SIZED_BIN_OP(^, StgInt16) - case bci_OP_OR_16: SIZED_BIN_OP(|, StgInt16) - case bci_OP_MUL_16: SIZED_BIN_OP(*, StgInt16) - case bci_OP_SHL_16: SIZED_BIN_OP_TY_INT(<<, StgWord16) - case bci_OP_LSR_16: SIZED_BIN_OP_TY_INT(>>, StgWord16) - case bci_OP_ASR_16: SIZED_BIN_OP_TY_INT(>>, StgInt16) - - case bci_OP_NEQ_16: SIZED_BIN_OP_TY_TY_INT(!=, StgWord16) - case bci_OP_EQ_16: SIZED_BIN_OP_TY_TY_INT(==, StgWord16) - case bci_OP_U_GT_16: SIZED_BIN_OP_TY_TY_INT(>, StgWord16) - case bci_OP_U_GE_16: SIZED_BIN_OP_TY_TY_INT(>=, StgWord16) - case bci_OP_U_LT_16: SIZED_BIN_OP_TY_TY_INT(<, StgWord16) - case bci_OP_U_LE_16: SIZED_BIN_OP_TY_TY_INT(<=, StgWord16) - - case bci_OP_S_GT_16: SIZED_BIN_OP(>, StgInt16) - case bci_OP_S_GE_16: SIZED_BIN_OP(>=, StgInt16) - case bci_OP_S_LT_16: SIZED_BIN_OP(<, StgInt16) - case bci_OP_S_LE_16: SIZED_BIN_OP(<=, StgInt16) - - case bci_OP_NOT_16: UN_SIZED_OP(~, StgWord16) - case bci_OP_NEG_16: UN_SIZED_OP(-, StgInt16) - - - case bci_OP_ADD_08: SIZED_BIN_OP(+, StgInt8) - case bci_OP_SUB_08: SIZED_BIN_OP(-, StgInt8) - case bci_OP_AND_08: SIZED_BIN_OP(&, StgInt8) - case bci_OP_XOR_08: SIZED_BIN_OP(^, StgInt8) - case bci_OP_OR_08: SIZED_BIN_OP(|, StgInt8) - case bci_OP_MUL_08: SIZED_BIN_OP(*, StgInt8) - 
case bci_OP_SHL_08: SIZED_BIN_OP_TY_INT(<<, StgWord8) - case bci_OP_LSR_08: SIZED_BIN_OP_TY_INT(>>, StgWord8) - case bci_OP_ASR_08: SIZED_BIN_OP_TY_INT(>>, StgInt8) - - case bci_OP_NEQ_08: SIZED_BIN_OP_TY_TY_INT(!=, StgWord8) - case bci_OP_EQ_08: SIZED_BIN_OP_TY_TY_INT(==, StgWord8) - case bci_OP_U_GT_08: SIZED_BIN_OP_TY_TY_INT(>, StgWord8) - case bci_OP_U_GE_08: SIZED_BIN_OP_TY_TY_INT(>=, StgWord8) - case bci_OP_U_LT_08: SIZED_BIN_OP_TY_TY_INT(<, StgWord8) - case bci_OP_U_LE_08: SIZED_BIN_OP_TY_TY_INT(<=, StgWord8) - - case bci_OP_S_GT_08: SIZED_BIN_OP_TY_TY_INT(>, StgInt8) - case bci_OP_S_GE_08: SIZED_BIN_OP_TY_TY_INT(>=, StgInt8) - case bci_OP_S_LT_08: SIZED_BIN_OP_TY_TY_INT(<, StgInt8) - case bci_OP_S_LE_08: SIZED_BIN_OP_TY_TY_INT(<=, StgInt8) - - case bci_OP_NOT_08: UN_SIZED_OP(~, StgWord8) - case bci_OP_NEG_08: UN_SIZED_OP(-, StgInt8) - - case bci_OP_INDEX_ADDR_64: + INSTRUCTION(bci_OP_ADD_64): SIZED_BIN_OP(+, StgInt64) + INSTRUCTION(bci_OP_SUB_64): SIZED_BIN_OP(-, StgInt64) + INSTRUCTION(bci_OP_AND_64): SIZED_BIN_OP(&, StgInt64) + INSTRUCTION(bci_OP_XOR_64): SIZED_BIN_OP(^, StgInt64) + INSTRUCTION(bci_OP_OR_64): SIZED_BIN_OP(|, StgInt64) + INSTRUCTION(bci_OP_MUL_64): SIZED_BIN_OP(*, StgInt64) + INSTRUCTION(bci_OP_SHL_64): SIZED_BIN_OP_TY_INT(<<, StgWord64) + INSTRUCTION(bci_OP_LSR_64): SIZED_BIN_OP_TY_INT(>>, StgWord64) + INSTRUCTION(bci_OP_ASR_64): SIZED_BIN_OP_TY_INT(>>, StgInt64) + + INSTRUCTION(bci_OP_NEQ_64): SIZED_BIN_OP_TY_TY_INT(!=, StgWord64) + INSTRUCTION(bci_OP_EQ_64): SIZED_BIN_OP_TY_TY_INT(==, StgWord64) + INSTRUCTION(bci_OP_U_GT_64): SIZED_BIN_OP_TY_TY_INT(>, StgWord64) + INSTRUCTION(bci_OP_U_GE_64): SIZED_BIN_OP_TY_TY_INT(>=, StgWord64) + INSTRUCTION(bci_OP_U_LT_64): SIZED_BIN_OP_TY_TY_INT(<, StgWord64) + INSTRUCTION(bci_OP_U_LE_64): SIZED_BIN_OP_TY_TY_INT(<=, StgWord64) + + INSTRUCTION(bci_OP_S_GT_64): SIZED_BIN_OP_TY_TY_INT(>, StgInt64) + INSTRUCTION(bci_OP_S_GE_64): SIZED_BIN_OP_TY_TY_INT(>=, StgInt64) + INSTRUCTION(bci_OP_S_LT_64): 
SIZED_BIN_OP_TY_TY_INT(<, StgInt64) + INSTRUCTION(bci_OP_S_LE_64): SIZED_BIN_OP_TY_TY_INT(<=, StgInt64) + + INSTRUCTION(bci_OP_NOT_64): UN_SIZED_OP(~, StgWord64) + INSTRUCTION(bci_OP_NEG_64): UN_SIZED_OP(-, StgInt64) + + + INSTRUCTION(bci_OP_ADD_32): SIZED_BIN_OP(+, StgInt32) + INSTRUCTION(bci_OP_SUB_32): SIZED_BIN_OP(-, StgInt32) + INSTRUCTION(bci_OP_AND_32): SIZED_BIN_OP(&, StgInt32) + INSTRUCTION(bci_OP_XOR_32): SIZED_BIN_OP(^, StgInt32) + INSTRUCTION(bci_OP_OR_32): SIZED_BIN_OP(|, StgInt32) + INSTRUCTION(bci_OP_MUL_32): SIZED_BIN_OP(*, StgInt32) + INSTRUCTION(bci_OP_SHL_32): SIZED_BIN_OP_TY_INT(<<, StgWord32) + INSTRUCTION(bci_OP_LSR_32): SIZED_BIN_OP_TY_INT(>>, StgWord32) + INSTRUCTION(bci_OP_ASR_32): SIZED_BIN_OP_TY_INT(>>, StgInt32) + + INSTRUCTION(bci_OP_NEQ_32): SIZED_BIN_OP_TY_TY_INT(!=, StgWord32) + INSTRUCTION(bci_OP_EQ_32): SIZED_BIN_OP_TY_TY_INT(==, StgWord32) + INSTRUCTION(bci_OP_U_GT_32): SIZED_BIN_OP_TY_TY_INT(>, StgWord32) + INSTRUCTION(bci_OP_U_GE_32): SIZED_BIN_OP_TY_TY_INT(>=, StgWord32) + INSTRUCTION(bci_OP_U_LT_32): SIZED_BIN_OP_TY_TY_INT(<, StgWord32) + INSTRUCTION(bci_OP_U_LE_32): SIZED_BIN_OP_TY_TY_INT(<=, StgWord32) + + INSTRUCTION(bci_OP_S_GT_32): SIZED_BIN_OP_TY_TY_INT(>, StgInt32) + INSTRUCTION(bci_OP_S_GE_32): SIZED_BIN_OP_TY_TY_INT(>=, StgInt32) + INSTRUCTION(bci_OP_S_LT_32): SIZED_BIN_OP_TY_TY_INT(<, StgInt32) + INSTRUCTION(bci_OP_S_LE_32): SIZED_BIN_OP_TY_TY_INT(<=, StgInt32) + + INSTRUCTION(bci_OP_NOT_32): UN_SIZED_OP(~, StgWord32) + INSTRUCTION(bci_OP_NEG_32): UN_SIZED_OP(-, StgInt32) + + + INSTRUCTION(bci_OP_ADD_16): SIZED_BIN_OP(+, StgInt16) + INSTRUCTION(bci_OP_SUB_16): SIZED_BIN_OP(-, StgInt16) + INSTRUCTION(bci_OP_AND_16): SIZED_BIN_OP(&, StgInt16) + INSTRUCTION(bci_OP_XOR_16): SIZED_BIN_OP(^, StgInt16) + INSTRUCTION(bci_OP_OR_16): SIZED_BIN_OP(|, StgInt16) + INSTRUCTION(bci_OP_MUL_16): SIZED_BIN_OP(*, StgInt16) + INSTRUCTION(bci_OP_SHL_16): SIZED_BIN_OP_TY_INT(<<, StgWord16) + INSTRUCTION(bci_OP_LSR_16): 
SIZED_BIN_OP_TY_INT(>>, StgWord16) + INSTRUCTION(bci_OP_ASR_16): SIZED_BIN_OP_TY_INT(>>, StgInt16) + + INSTRUCTION(bci_OP_NEQ_16): SIZED_BIN_OP_TY_TY_INT(!=, StgWord16) + INSTRUCTION(bci_OP_EQ_16): SIZED_BIN_OP_TY_TY_INT(==, StgWord16) + INSTRUCTION(bci_OP_U_GT_16): SIZED_BIN_OP_TY_TY_INT(>, StgWord16) + INSTRUCTION(bci_OP_U_GE_16): SIZED_BIN_OP_TY_TY_INT(>=, StgWord16) + INSTRUCTION(bci_OP_U_LT_16): SIZED_BIN_OP_TY_TY_INT(<, StgWord16) + INSTRUCTION(bci_OP_U_LE_16): SIZED_BIN_OP_TY_TY_INT(<=, StgWord16) + + INSTRUCTION(bci_OP_S_GT_16): SIZED_BIN_OP(>, StgInt16) + INSTRUCTION(bci_OP_S_GE_16): SIZED_BIN_OP(>=, StgInt16) + INSTRUCTION(bci_OP_S_LT_16): SIZED_BIN_OP(<, StgInt16) + INSTRUCTION(bci_OP_S_LE_16): SIZED_BIN_OP(<=, StgInt16) + + INSTRUCTION(bci_OP_NOT_16): UN_SIZED_OP(~, StgWord16) + INSTRUCTION(bci_OP_NEG_16): UN_SIZED_OP(-, StgInt16) + + + INSTRUCTION(bci_OP_ADD_08): SIZED_BIN_OP(+, StgInt8) + INSTRUCTION(bci_OP_SUB_08): SIZED_BIN_OP(-, StgInt8) + INSTRUCTION(bci_OP_AND_08): SIZED_BIN_OP(&, StgInt8) + INSTRUCTION(bci_OP_XOR_08): SIZED_BIN_OP(^, StgInt8) + INSTRUCTION(bci_OP_OR_08): SIZED_BIN_OP(|, StgInt8) + INSTRUCTION(bci_OP_MUL_08): SIZED_BIN_OP(*, StgInt8) + INSTRUCTION(bci_OP_SHL_08): SIZED_BIN_OP_TY_INT(<<, StgWord8) + INSTRUCTION(bci_OP_LSR_08): SIZED_BIN_OP_TY_INT(>>, StgWord8) + INSTRUCTION(bci_OP_ASR_08): SIZED_BIN_OP_TY_INT(>>, StgInt8) + + INSTRUCTION(bci_OP_NEQ_08): SIZED_BIN_OP_TY_TY_INT(!=, StgWord8) + INSTRUCTION(bci_OP_EQ_08): SIZED_BIN_OP_TY_TY_INT(==, StgWord8) + INSTRUCTION(bci_OP_U_GT_08): SIZED_BIN_OP_TY_TY_INT(>, StgWord8) + INSTRUCTION(bci_OP_U_GE_08): SIZED_BIN_OP_TY_TY_INT(>=, StgWord8) + INSTRUCTION(bci_OP_U_LT_08): SIZED_BIN_OP_TY_TY_INT(<, StgWord8) + INSTRUCTION(bci_OP_U_LE_08): SIZED_BIN_OP_TY_TY_INT(<=, StgWord8) + + INSTRUCTION(bci_OP_S_GT_08): SIZED_BIN_OP_TY_TY_INT(>, StgInt8) + INSTRUCTION(bci_OP_S_GE_08): SIZED_BIN_OP_TY_TY_INT(>=, StgInt8) + INSTRUCTION(bci_OP_S_LT_08): SIZED_BIN_OP_TY_TY_INT(<, StgInt8) + 
INSTRUCTION(bci_OP_S_LE_08): SIZED_BIN_OP_TY_TY_INT(<=, StgInt8) + + INSTRUCTION(bci_OP_NOT_08): UN_SIZED_OP(~, StgWord8) + INSTRUCTION(bci_OP_NEG_08): UN_SIZED_OP(-, StgInt8) + + INSTRUCTION(bci_OP_INDEX_ADDR_64): { StgWord64* addr = (StgWord64*) SpW(0); StgInt offset = (StgInt) SpW(1); @@ -2736,35 +3060,35 @@ run_BCO: Sp_addW(1); } SpW64(0) = *(addr+offset); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_OP_INDEX_ADDR_32: + INSTRUCTION(bci_OP_INDEX_ADDR_32): { StgWord32* addr = (StgWord32*) SpW(0); StgInt offset = (StgInt) SpW(1); Sp_addW(1); SpW(0) = (StgWord) *(addr+offset); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_OP_INDEX_ADDR_16: + INSTRUCTION(bci_OP_INDEX_ADDR_16): { StgWord16* addr = (StgWord16*) SpW(0); StgInt offset = (StgInt) SpW(1); Sp_addW(1); SpW(0) = (StgWord) *(addr+offset); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_OP_INDEX_ADDR_08: + INSTRUCTION(bci_OP_INDEX_ADDR_08): { StgWord8* addr = (StgWord8*) SpW(0); StgInt offset = (StgInt) SpW(1); Sp_addW(1); SpW(0) = (StgWord) *(addr+offset); - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_CCALL: { + INSTRUCTION(bci_CCALL): { void *tok; W_ stk_offset = BCO_GET_LARGE_ARG; int o_itbl = BCO_GET_LARGE_ARG; @@ -2921,25 +3245,33 @@ run_BCO: memcpy(Sp, ret, sizeof(W_) * ret_size); #endif - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_JMP: { + INSTRUCTION(bci_JMP): { /* BCO_NEXT modifies bciPtr, so be conservative. 
*/ int nextpc = BCO_GET_LARGE_ARG; bciPtr = nextpc; - goto nextInsn; + NEXT_INSTRUCTION; } - case bci_CASEFAIL: + INSTRUCTION(bci_CASEFAIL): barf("interpretBCO: hit a CASEFAIL"); - // Errors + + +#if defined(COMPUTED_GOTO) + INSTRUCTION(bci_DEFAULT): + barf("interpretBCO: unknown or unimplemented opcode %d", + (int)(bci & 0xFF)); +#else + // Errors default: barf("interpretBCO: unknown or unimplemented opcode %d", (int)(bci & 0xFF)); - } /* switch on opcode */ +#endif + } } ===================================== rts/include/rts/Bytecodes.h ===================================== @@ -23,6 +23,11 @@ I hope that's clear :-) */ +/* + Make sure to update jumptable in rts/Interpreter.c when modifying + bytecodes! See Note [Instruction dispatch in the bytecode interpreter] + for details. +*/ #define bci_STKCHECK 1 #define bci_PUSH_L 2 #define bci_PUSH_LL 3 ===================================== testsuite/tests/simplCore/should_compile/T26349.hs ===================================== @@ -0,0 +1,11 @@ +{-# LANGUAGE DeepSubsumption, RankNTypes #-} +module T26349 where + +{-# SPECIALIZE INLINE mapTCMT :: (forall b. IO b -> IO b) -> IO a -> IO a #-} +mapTCMT :: (forall b. m b -> n b) -> m a -> n a +mapTCMT f m = f m + +{- + We'll check + tcExpr (mapTCMT) (Check ((forall b. IO b -> IO b) -> IO a_sk -> IO a_sk)) +-} ===================================== testsuite/tests/simplCore/should_compile/T26349.stderr ===================================== @@ -0,0 +1,3 @@ +==================== Tidy Core rules ==================== +"USPEC mapTCMT @(*) @IO @IO @_" + forall (@a). 
mapTCMT @(*) @IO @IO @a = mapTCMT_$smapTCMT @a ===================================== testsuite/tests/simplCore/should_compile/all.T ===================================== @@ -559,3 +559,4 @@ test('T26051', [ grep_errmsg(r'\$wspecMe') test('T26115', [grep_errmsg(r'DFun')], compile, ['-O -ddump-simpl -dsuppress-uniques']) test('T26116', normal, compile, ['-O -ddump-rules']) test('T26117', [grep_errmsg(r'==')], compile, ['-O -ddump-simpl -dsuppress-uniques']) +test('T26349', normal, compile, ['-O -ddump-rules']) ===================================== testsuite/tests/simplCore/should_compile/rule2.stderr ===================================== @@ -10,18 +10,15 @@ ==================== Grand total simplifier statistics ==================== -Total ticks: 13 +Total ticks: 11 -2 PreInlineUnconditionally - 1 ds - 1 f +1 PreInlineUnconditionally 1 f 2 UnfoldingDone 1 GHC.Internal.Base.id 1 Roman.bar 1 RuleFired 1 foo/bar 1 LetFloatFromLet 1 -7 BetaReduction - 1 ds +6 BetaReduction 1 f 1 a 1 m ===================================== utils/genprimopcode/genprimopcode.cabal ===================================== @@ -32,4 +32,4 @@ Executable genprimopcode Build-Depends: base >= 4 && < 5, array if flag(build-tool-depends) - build-tool-depends: alex:alex >= 3.2.6, happy:happy >= 1.20.0 + build-tool-depends: alex:alex >= 3.2.6, happy:happy >= 2.1.5 || == 1.20.0 || == 1.20.1.1 View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/e8930abc9c44487e01b8d52d5db4674... -- View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/e8930abc9c44487e01b8d52d5db4674... You're receiving this email because of your account on gitlab.haskell.org.
participants (1)
-
Cheng Shao (@TerrorJack)