Marge Bot pushed to branch master at Glasgow Haskell Compiler / GHC

Commits:

fca9cd7c by sheaf at 2025-12-18T13:18:18-05:00
X86 CodeGen: fix assign_eax_sse_regs

We must set %al to the number of SSE2 registers that contain arguments
(in case we are dealing with a varargs function). The logic for counting
how many arguments reside in SSE2 registers was incorrect: it used
'isFloatFormat', which returns False for vector formats and therefore
fails to count arguments held in vector registers. We now instead do case
analysis on the register class:

    is_sse_reg r = case targetClassOfReg platform r of
      RcFloatOrVector -> True
      RcInteger       -> False

This change is necessary to prevent segfaults in T20030_test1j, because
subsequent commits change the format calculations, causing vector formats
to appear more often.

- - - - -
53150617 by sheaf at 2025-12-18T13:18:19-05:00
X86 regUsageOfInstr: fix format for IMUL

When used with 8-bit operands, the IMUL instruction returns the result in
the lower 16 bits of %rax (also known as %ax). This is different from the
other sizes, where a 16-, 32- or 64-bit input results in 16, 32 or 64 bits
of output in both %rax and %rdx.

This doesn't affect the behaviour of the compiler, because we don't allow
partial writes at sub-word sizes. The rationale is explained in
Wrinkle [Don't allow scalar partial writes] in
Note [Register formats in liveness analysis], in GHC.CmmToAsm.Reg.Liveness.

- - - - -
c7a56dd1 by sheaf at 2025-12-18T13:18:19-05:00
Liveness analysis: consider register formats

This commit updates the register allocator to be a bit more careful in
situations in which a single register is used at multiple different
formats, e.g. when xmm1 is used both to store a Double# and a DoubleX2#.

This is done by introducing the 'Regs' newtype around
'UniqSet RegWithFormat', whose combining operations take the larger of the
two formats instead of overriding the format. Operations on 'Regs' are
defined in 'GHC.CmmToAsm.Reg.Regs'.

There is a modest compile-time cost from the additional overhead of
tracking register formats, which causes the metric increases of this
commit.

The subtle aspects of the implementation are outlined in
Note [Register formats in liveness analysis] in GHC.CmmToAsm.Reg.Liveness.

Fixes #26411 #26611

-------------------------
Metric Increase:
    T12707
    T26425
    T3294
-------------------------

- - - - -
c2e83339 by sheaf at 2025-12-18T13:18:19-05:00
Register allocator: reload at same format as spill

This commit ensures that if we spill a register onto the stack at a given
format, we always reload the register at that same format. This ensures we
don't end up in a situation where we spill at F64x2 but reload only the
lower F64. Such a partial reload would make us believe the whole value is
back in a register, thus silently losing the upper 64 bits of the spilled
register's contents.

Fixes #26526

- - - - -
55ab583b by sheaf at 2025-12-18T13:18:19-05:00
Register allocation: writes redefine format

As explained in Note [Allocated register formats] in
GHC.CmmToAsm.Reg.Linear, we consider all writes to redefine the format of
the register. This ensures that in a situation such as

    movsd .Ln6m(%rip),%v1
    shufpd $0,%v1,%v1

we properly consider the broadcast operation to change the format of %v1
from F64 to F64x2.

This completes the fix for #26411 (test in T26411b).
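The commits above all hinge on one combining rule: when the same register is
seen at two different formats, keep the larger format. The following is a
minimal standalone sketch of that rule only, not the code added by these
commits; the real operations ('addRegMaxFmt', 'unionRegsMaxFmt', and friends
over 'UniqSet RegWithFormat') are introduced in
compiler/GHC/CmmToAsm/Reg/Regs.hs further down in this diff, and the
String-keyed map and two-constructor Format type here are illustrative
simplifications.

    -- Toy model (not GHC code): a register set that remembers the widest
    -- format each register has been seen at.
    module RegsSketch where

    import qualified Data.Map.Strict as M

    -- Illustrative formats only; GHC's real 'Format' type is richer.
    data Format = F64 | F64x2
      deriving (Eq, Ord, Show)

    newtype Regs = Regs (M.Map String Format)
      deriving Show

    emptyRegs :: Regs
    emptyRegs = Regs M.empty

    -- Adding a register keeps the larger of the old and new formats.
    addRegMaxFmt :: Regs -> (String, Format) -> Regs
    addRegMaxFmt (Regs m) (r, fmt) = Regs (M.insertWith max r fmt m)

    -- Union that, on collision, keeps the larger format instead of
    -- overriding it.
    unionRegsMaxFmt :: Regs -> Regs -> Regs
    unionRegsMaxFmt (Regs a) (Regs b) = Regs (M.unionWith max a b)

    -- Example: xmm1 read at F64 in one place and at F64x2 in another must
    -- be treated as live at F64x2, so spills/reloads keep all 128 bits.
    example :: Regs
    example =
      unionRegsMaxFmt
        (emptyRegs `addRegMaxFmt` ("xmm1", F64))
        (emptyRegs `addRegMaxFmt` ("xmm1", F64x2))

Here 'example' evaluates to a set in which xmm1 is recorded at F64x2,
mirroring how the liveness analysis below must treat a register that is read
at both F64 and F64x2.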
- - - - -

20 changed files:

- compiler/GHC/CmmToAsm/Reg/Graph.hs
- compiler/GHC/CmmToAsm/Reg/Graph/Coalesce.hs
- compiler/GHC/CmmToAsm/Reg/Graph/Spill.hs
- compiler/GHC/CmmToAsm/Reg/Graph/SpillCost.hs
- compiler/GHC/CmmToAsm/Reg/Linear.hs
- compiler/GHC/CmmToAsm/Reg/Linear/Base.hs
- compiler/GHC/CmmToAsm/Reg/Linear/JoinToTargets.hs
- compiler/GHC/CmmToAsm/Reg/Liveness.hs
- + compiler/GHC/CmmToAsm/Reg/Regs.hs
- compiler/GHC/CmmToAsm/Reg/Target.hs
- compiler/GHC/CmmToAsm/X86/CodeGen.hs
- compiler/GHC/CmmToAsm/X86/Instr.hs
- compiler/GHC/Types/Unique/FM.hs
- compiler/GHC/Types/Unique/Set.hs
- compiler/ghc.cabal.in
- + testsuite/tests/simd/should_run/T26411.hs
- + testsuite/tests/simd/should_run/T26411.stdout
- + testsuite/tests/simd/should_run/T26411b.hs
- + testsuite/tests/simd/should_run/T26411b.stdout
- testsuite/tests/simd/should_run/all.T

Changes:

===================================== compiler/GHC/CmmToAsm/Reg/Graph.hs =====================================

@@ -335,14 +335,14 @@ buildGraph platform code -- Conflicts between virtual and real regs are recorded as exclusions. graphAddConflictSet :: Platform - -> UniqSet RegWithFormat + -> Regs -> Color.Graph VirtualReg RegClass RealReg -> Color.Graph VirtualReg RegClass RealReg graphAddConflictSet platform regs graph = let arch = platformArch platform - virtuals = takeVirtualRegs regs - reals = takeRealRegs regs + virtuals = takeVirtualRegs $ getRegs regs + reals = takeRealRegs $ getRegs regs graph1 = Color.addConflicts virtuals (classOfVirtualReg arch) graph -- NB: we could add "arch" as argument to functions such as "addConflicts"

===================================== compiler/GHC/CmmToAsm/Reg/Graph/Coalesce.hs =====================================

@@ -13,10 +13,8 @@ import GHC.Cmm import GHC.Data.Bag import GHC.Data.Graph.Directed import GHC.Platform (Platform) -import GHC.Types.Unique (getUnique) import GHC.Types.Unique.FM import GHC.Types.Unique.Supply -import GHC.Types.Unique.Set -- | Do register coalescing on this top level thing --

@@ -88,8 +86,8 @@ slurpJoinMovs platform live slurpLI rs (LiveInstr _ Nothing) = rs slurpLI rs (LiveInstr instr (Just live)) | Just (r1, r2) <- takeRegRegMoveInstr platform instr - , elemUniqSet_Directly (getUnique r1) $ liveDieRead live - , elemUniqSet_Directly (getUnique r2) $ liveBorn live + , r1 `elemRegs` liveDieRead live + , r2 `elemRegs` liveBorn live -- only coalesce movs between two virtuals for now, -- else we end up with allocatable regs in the live

===================================== compiler/GHC/CmmToAsm/Reg/Graph/Spill.hs =====================================

@@ -144,7 +144,7 @@ regSpill_top platform regSlotMap cmm -- then record the fact that these slots are now live in those blocks -- in the given slotmap. patchLiveSlot - :: BlockMap IntSet -> BlockId -> UniqSet RegWithFormat-> BlockMap IntSet + :: BlockMap IntSet -> BlockId -> Regs -> BlockMap IntSet patchLiveSlot slotMap blockId regsLive = let

@@ -154,7 +154,8 @@ regSpill_top platform regSlotMap cmm moreSlotsLive = IntSet.fromList $ mapMaybe (lookupUFM regSlotMap .
regWithFormat_reg) - $ nonDetEltsUniqSet regsLive + $ nonDetEltsUniqSet + $ getRegs regsLive -- See Note [Unique Determinism and code generation] slotMap' ===================================== compiler/GHC/CmmToAsm/Reg/Graph/SpillCost.hs ===================================== @@ -98,7 +98,7 @@ slurpSpillCostInfo platform cfg cmm countBlock info freqMap (BasicBlock blockId instrs) | LiveInfo _ _ blockLive _ <- info , Just rsLiveEntry <- mapLookup blockId blockLive - , rsLiveEntry_virt <- takeVirtualRegs rsLiveEntry + , rsLiveEntry_virt <- takeVirtualRegs $ getRegs rsLiveEntry = countLIs (ceiling $ blockFreq freqMap blockId) rsLiveEntry_virt instrs | otherwise @@ -132,9 +132,9 @@ slurpSpillCostInfo platform cfg cmm mapM_ (incDefs scale) $ nub $ mapMaybe (takeVirtualReg . regWithFormat_reg) written -- Compute liveness for entry to next instruction. - let liveDieRead_virt = takeVirtualRegs (liveDieRead live) - let liveDieWrite_virt = takeVirtualRegs (liveDieWrite live) - let liveBorn_virt = takeVirtualRegs (liveBorn live) + let liveDieRead_virt = takeVirtualRegs $ getRegs (liveDieRead live) + let liveDieWrite_virt = takeVirtualRegs $ getRegs (liveDieWrite live) + let liveBorn_virt = takeVirtualRegs $ getRegs (liveBorn live) let rsLiveAcross = rsLiveEntry `minusUniqSet` liveDieRead_virt ===================================== compiler/GHC/CmmToAsm/Reg/Linear.hs ===================================== @@ -207,7 +207,7 @@ linearRegAlloc :: forall instr. (Instruction instr) => NCGConfig -> [BlockId] -- ^ entry points - -> BlockMap (UniqSet RegWithFormat) + -> BlockMap Regs -- ^ live regs on entry to each basic block -> [SCC (LiveBasicBlock instr)] -- ^ instructions annotated with "deaths" @@ -246,7 +246,7 @@ linearRegAlloc' => NCGConfig -> freeRegs -> [BlockId] -- ^ entry points - -> BlockMap (UniqSet RegWithFormat) -- ^ live regs on entry to each basic block + -> BlockMap Regs -- ^ live regs on entry to each basic block -> [SCC (LiveBasicBlock instr)] -- ^ instructions annotated with "deaths" -> UniqDSM ([NatBasicBlock instr], RegAllocStats, Int) @@ -260,7 +260,7 @@ linearRegAlloc' config initFreeRegs entry_ids block_live sccs linearRA_SCCs :: OutputableRegConstraint freeRegs instr => [BlockId] - -> BlockMap (UniqSet RegWithFormat) + -> BlockMap Regs -> [NatBasicBlock instr] -> [SCC (LiveBasicBlock instr)] -> RegM freeRegs [NatBasicBlock instr] @@ -295,7 +295,7 @@ linearRA_SCCs entry_ids block_live blocksAcc (CyclicSCC blocks : sccs) process :: forall freeRegs instr. (OutputableRegConstraint freeRegs instr) => [BlockId] - -> BlockMap (UniqSet RegWithFormat) + -> BlockMap Regs -> [GenBasicBlock (LiveInstr instr)] -> RegM freeRegs [[NatBasicBlock instr]] process entry_ids block_live = @@ -334,7 +334,7 @@ process entry_ids block_live = -- processBlock :: OutputableRegConstraint freeRegs instr - => BlockMap (UniqSet RegWithFormat) -- ^ live regs on entry to each basic block + => BlockMap Regs -- ^ live regs on entry to each basic block -> LiveBasicBlock instr -- ^ block to do register allocation on -> RegM freeRegs [NatBasicBlock instr] -- ^ block with registers allocated @@ -351,7 +351,7 @@ processBlock block_live (BasicBlock id instrs) -- | Load the freeregs and current reg assignment into the RegM state -- for the basic block with this BlockId. 
initBlock :: FR freeRegs - => BlockId -> BlockMap (UniqSet RegWithFormat) -> RegM freeRegs () + => BlockId -> BlockMap Regs -> RegM freeRegs () initBlock id block_live = do platform <- getPlatform block_assig <- getBlockAssigR @@ -368,7 +368,7 @@ initBlock id block_live setFreeRegsR (frInitFreeRegs platform) Just live -> setFreeRegsR $ foldl' (flip $ frAllocateReg platform) (frInitFreeRegs platform) - (nonDetEltsUniqSet $ takeRealRegs live) + (nonDetEltsUniqSet $ takeRealRegs $ getRegs live) -- See Note [Unique Determinism and code generation] setAssigR emptyRegMap @@ -381,7 +381,7 @@ initBlock id block_live -- | Do allocation for a sequence of instructions. linearRA :: forall freeRegs instr. (OutputableRegConstraint freeRegs instr) - => BlockMap (UniqSet RegWithFormat) -- ^ map of what vregs are live on entry to each block. + => BlockMap Regs -- ^ map of what vregs are live on entry to each block. -> BlockId -- ^ id of the current block, for debugging. -> [LiveInstr instr] -- ^ liveness annotated instructions in this block. -> RegM freeRegs @@ -406,7 +406,7 @@ linearRA block_live block_id = go [] [] -- | Do allocation for a single instruction. raInsn :: OutputableRegConstraint freeRegs instr - => BlockMap (UniqSet RegWithFormat) -- ^ map of what vregs are love on entry to each block. + => BlockMap Regs -- ^ map of what vregs are love on entry to each block. -> [instr] -- ^ accumulator for instructions already processed. -> BlockId -- ^ the id of the current block, for debugging -> LiveInstr instr -- ^ the instr to have its regs allocated, with liveness info. @@ -427,7 +427,7 @@ raInsn _ new_instrs _ (LiveInstr ii@(Instr i) Nothing) raInsn block_live new_instrs id (LiveInstr (Instr instr) (Just live)) = do platform <- getPlatform - assig <- getAssigR :: RegM freeRegs (UniqFM Reg Loc) + assig <- getAssigR -- If we have a reg->reg move between virtual registers, where the -- src register is not live after this instruction, and the dst @@ -437,12 +437,12 @@ raInsn block_live new_instrs id (LiveInstr (Instr instr) (Just live)) -- (we can't eliminate it if the source register is on the stack, because -- we do not want to use one spill slot for different virtual registers) case takeRegRegMoveInstr platform instr of - Just (src,dst) | Just (RegWithFormat _ fmt) <- lookupUniqSet_Directly (liveDieRead live) (getUnique src), + Just (src,dst) | Just fmt <- lookupReg src (liveDieRead live), isVirtualReg dst, not (dst `elemUFM` assig), isRealReg src || isInReg src assig -> do case src of - RegReal rr -> setAssigR (addToUFM assig dst (InReg $ RealRegUsage rr fmt)) + RegReal rr -> setAssigR (addToUFM assig dst (Loc (InReg rr) fmt)) -- if src is a fixed reg, then we just map dest to this -- reg in the assignment. src must be an allocatable reg, -- otherwise it wouldn't be in r_dying. 
@@ -461,8 +461,8 @@ raInsn block_live new_instrs id (LiveInstr (Instr instr) (Just live)) return (new_instrs, []) _ -> genRaInsn block_live new_instrs id instr - (map regWithFormat_reg $ nonDetEltsUniqSet $ liveDieRead live) - (map regWithFormat_reg $ nonDetEltsUniqSet $ liveDieWrite live) + (map regWithFormat_reg $ nonDetEltsUniqSet $ getRegs $ liveDieRead live) + (map regWithFormat_reg $ nonDetEltsUniqSet $ getRegs $ liveDieWrite live) -- See Note [Unique Determinism and code generation] raInsn _ _ _ instr @@ -485,13 +485,16 @@ raInsn _ _ _ instr isInReg :: Reg -> RegMap Loc -> Bool -isInReg src assig | Just (InReg _) <- lookupUFM assig src = True - | otherwise = False +isInReg src assig + | Just (Loc (InReg _) _) <- lookupUFM assig src + = True + | otherwise + = False genRaInsn :: forall freeRegs instr. (OutputableRegConstraint freeRegs instr) - => BlockMap (UniqSet RegWithFormat) + => BlockMap Regs -> [instr] -> BlockId -> instr @@ -643,14 +646,16 @@ releaseRegs regs = do loop assig !free (RegReal rr : rs) = loop assig (frReleaseReg platform rr free) rs loop assig !free (r:rs) = case lookupUFM assig r of - Just (InBoth real _) -> loop (delFromUFM assig r) - (frReleaseReg platform (realReg real) free) rs - Just (InReg real) -> loop (delFromUFM assig r) - (frReleaseReg platform (realReg real) free) rs - _ -> loop (delFromUFM assig r) free rs + Just (Loc (InBoth real _) _) -> + loop (delFromUFM assig r) + (frReleaseReg platform real free) rs + Just (Loc (InReg real) _) -> + loop (delFromUFM assig r) + (frReleaseReg platform real free) rs + _ -> + loop (delFromUFM assig r) free rs loop assig free regs - -- ----------------------------------------------------------------------------- -- Clobber real registers @@ -668,17 +673,18 @@ releaseRegs regs = do saveClobberedTemps :: forall instr freeRegs. (Instruction instr, FR freeRegs) - => [RealReg] -- real registers clobbered by this instruction - -> [Reg] -- registers which are no longer live after this insn - -> RegM freeRegs [instr] -- return: instructions to spill any temps that will - -- be clobbered. + => [RealReg] -- ^ real registers clobbered by this instruction + -> [Reg] -- ^ registers which are no longer live after this instruction, + -- because read for the last time + -> RegM freeRegs [instr] -- return: instructions to spill any temps that will + -- be clobbered. saveClobberedTemps [] _ = return [] saveClobberedTemps clobbered dying = do - assig <- getAssigR :: RegM freeRegs (UniqFM Reg Loc) + assig <- getAssigR (assig',instrs) <- nonDetStrictFoldUFM_DirectlyM maybe_spill (assig,[]) assig setAssigR assig' return $ -- mkComment (text "<saveClobberedTemps>") ++ @@ -687,19 +693,21 @@ saveClobberedTemps clobbered dying where -- Unique represents the VirtualReg -- Here we separate the cases which we do want to spill from these we don't. - maybe_spill :: Unique -> (RegMap Loc,[instr]) -> (Loc) -> RegM freeRegs (RegMap Loc,[instr]) + maybe_spill :: Unique + -> (RegMap Loc,[instr]) + -> Loc + -> RegM freeRegs (RegMap Loc,[instr]) maybe_spill !temp !(assig,instrs) !loc = case loc of -- This is non-deterministic but we do not -- currently support deterministic code-generation. 
-- See Note [Unique Determinism and code generation] - InReg reg - | any (realRegsAlias $ realReg reg) clobbered + Loc (InReg reg) fmt + | any (realRegsAlias reg) clobbered , temp `notElem` map getUnique dying - -> clobber temp (assig,instrs) reg + -> clobber temp (assig,instrs) (RealRegUsage reg fmt) _ -> return (assig,instrs) - -- See Note [UniqFM and the register allocator] clobber :: Unique -> (RegMap Loc,[instr]) -> RealRegUsage -> RegM freeRegs (RegMap Loc,[instr]) clobber temp (assig,instrs) (RealRegUsage reg fmt) @@ -718,7 +726,7 @@ saveClobberedTemps clobbered dying (my_reg : _) -> do setFreeRegsR (frAllocateReg platform my_reg freeRegs) - let new_assign = addToUFM_Directly assig temp (InReg (RealRegUsage my_reg fmt)) + let new_assign = addToUFM_Directly assig temp (Loc (InReg my_reg) fmt) let instr = mkRegRegMoveInstr config fmt (RegReal reg) (RegReal my_reg) @@ -726,12 +734,13 @@ saveClobberedTemps clobbered dying -- (2) no free registers: spill the value [] -> do + (spill, slot) <- spillR (RegWithFormat (RegReal reg) fmt) temp -- record why this reg was spilled for profiling recordSpill (SpillClobber temp) - let new_assign = addToUFM_Directly assig temp (InBoth (RealRegUsage reg fmt) slot) + let new_assign = addToUFM_Directly assig temp (Loc (InBoth reg slot) fmt) return (new_assign, (spill ++ instrs)) @@ -779,9 +788,9 @@ clobberRegs clobbered clobber assig [] = assig - clobber assig ((temp, InBoth reg slot) : rest) - | any (realRegsAlias $ realReg reg) clobbered - = clobber (addToUFM_Directly assig temp (InMem slot)) rest + clobber assig ((temp, Loc (InBoth reg slot) regFmt) : rest) + | any (realRegsAlias reg) clobbered + = clobber (addToUFM_Directly assig temp (Loc (InMem slot) regFmt)) rest clobber assig (_:rest) = clobber assig rest @@ -790,9 +799,9 @@ clobberRegs clobbered -- allocateRegsAndSpill -- Why are we performing a spill? -data SpillLoc = ReadMem StackSlot -- reading from register only in memory - | WriteNew -- writing to a new variable - | WriteMem -- writing to register only in memory +data SpillLoc = ReadMem StackSlot Format -- reading from register only in memory + | WriteNew -- writing to a new variable + | WriteMem -- writing to register only in memory -- Note that ReadNew is not valid, since you don't want to be reading -- from an uninitialized register. We also don't need the location of -- the register in memory, since that will be invalidated by the write. @@ -818,28 +827,36 @@ allocateRegsAndSpill allocateRegsAndSpill _ _ spills alloc [] = return (spills, reverse alloc) -allocateRegsAndSpill reading keep spills alloc (r@(VirtualRegWithFormat vr _fmt):rs) +allocateRegsAndSpill reading keep spills alloc (r@(VirtualRegWithFormat vr vrFmt):rs) = do assig <- toVRegMap <$> getAssigR -- pprTraceM "allocateRegsAndSpill:assig" (ppr (r:rs) $$ ppr assig) -- See Note [UniqFM and the register allocator] let doSpill = allocRegsAndSpill_spill reading keep spills alloc r rs assig case lookupUFM assig vr of -- case (1a): already in a register - Just (InReg my_reg) -> - allocateRegsAndSpill reading keep spills (realReg my_reg:alloc) rs + Just (Loc (InReg my_reg) in_reg_fmt) -> do + -- (RF1) from Note [Allocated register formats]: + -- writes redefine the format the register is used at. + when (not reading && vrFmt /= in_reg_fmt) $ + setAssigR $ toRegMap $ + addToUFM assig vr (Loc (InReg my_reg) vrFmt) + allocateRegsAndSpill reading keep spills (my_reg:alloc) rs -- case (1b): already in a register (and memory) - -- NB1. 
if we're writing this register, update its assignment to be - -- InReg, because the memory value is no longer valid. - -- NB2. This is why we must process written registers here, even if they - -- are also read by the same instruction. - Just (InBoth my_reg _) - -> do when (not reading) (setAssigR $ toRegMap (addToUFM assig vr (InReg my_reg))) - allocateRegsAndSpill reading keep spills (realReg my_reg:alloc) rs + Just (Loc (InBoth my_reg _) _) -> do + -- NB1. if we're writing this register, update its assignment to be + -- InReg, because the memory value is no longer valid. + -- NB2. This is why we must process written registers here, even if they + -- are also read by the same instruction. + when (not reading) $ + setAssigR $ toRegMap $ + addToUFM assig vr (Loc (InReg my_reg) vrFmt) + allocateRegsAndSpill reading keep spills (my_reg:alloc) rs -- Not already in a register, so we need to find a free one... - Just (InMem slot) | reading -> doSpill (ReadMem slot) - | otherwise -> doSpill WriteMem + Just (Loc (InMem slot) memFmt) + | reading -> doSpill (ReadMem slot memFmt) + | otherwise -> doSpill WriteMem Nothing | reading -> pprPanic "allocateRegsAndSpill: Cannot read from uninitialized register" (ppr vr) -- NOTE: if the input to the NCG contains some @@ -875,7 +892,7 @@ allocRegsAndSpill_spill :: (FR freeRegs, Instruction instr) -> UniqFM VirtualReg Loc -> SpillLoc -> RegM freeRegs ([instr], [RealReg]) -allocRegsAndSpill_spill reading keep spills alloc r@(VirtualRegWithFormat vr fmt) rs assig spill_loc +allocRegsAndSpill_spill reading keep spills alloc r@(VirtualRegWithFormat vr vrFmt) rs assig spill_loc = do platform <- getPlatform freeRegs <- getFreeRegsR let regclass = classOfVirtualReg (platformArch platform) vr @@ -897,7 +914,7 @@ allocRegsAndSpill_spill reading keep spills alloc r@(VirtualRegWithFormat vr fmt spills' <- loadTemp r spill_loc final_reg spills setAssigR $ toRegMap - $ (addToUFM assig vr $! newLocation spill_loc $ RealRegUsage final_reg fmt) + $ (addToUFM assig vr $! newLocation spill_loc $ RealRegUsage final_reg vrFmt) setFreeRegsR $ frAllocateReg platform final_reg freeRegs allocateRegsAndSpill reading keep spills' (final_reg : alloc) rs @@ -911,7 +928,7 @@ allocRegsAndSpill_spill reading keep spills alloc r@(VirtualRegWithFormat vr fmt let candidates' :: UniqFM VirtualReg Loc candidates' = flip delListFromUFM (fmap virtualRegWithFormat_reg keep) $ - filterUFM inRegOrBoth $ + filterUFM (inRegOrBoth . locWithFormat_loc) $ assig -- This is non-deterministic but we do not -- currently support deterministic code-generation. @@ -924,25 +941,25 @@ allocRegsAndSpill_spill reading keep spills alloc r@(VirtualRegWithFormat vr fmt == regclass candidates_inBoth :: [(Unique, RealRegUsage, StackSlot)] candidates_inBoth - = [ (temp, reg, mem) - | (temp, InBoth reg mem) <- candidates - , compat (realReg reg) ] + = [ (temp, RealRegUsage reg fmt, mem) + | (temp, Loc (InBoth reg mem) fmt) <- candidates + , compat reg ] -- the vregs we could kick out that are only in a reg -- this would require writing the reg to a new slot before using it. let candidates_inReg - = [ (temp, reg) - | (temp, InReg reg) <- candidates - , compat (realReg reg) ] + = [ (temp, RealRegUsage reg fmt) + | (temp, Loc (InReg reg) fmt) <- candidates + , compat reg ] let result -- we have a temporary that is in both register and mem, -- just free up its register for use. 
- | (temp, (RealRegUsage cand_reg _old_fmt), slot) : _ <- candidates_inBoth + | (temp, (RealRegUsage cand_reg old_fmt), slot) : _ <- candidates_inBoth = do spills' <- loadTemp r spill_loc cand_reg spills - let assig1 = addToUFM_Directly assig temp (InMem slot) - let assig2 = addToUFM assig1 vr $! newLocation spill_loc (RealRegUsage cand_reg fmt) + let assig1 = addToUFM_Directly assig temp $ Loc (InMem slot) old_fmt + let assig2 = addToUFM assig1 vr $! newLocation spill_loc (RealRegUsage cand_reg vrFmt) setAssigR $ toRegMap assig2 allocateRegsAndSpill reading keep spills' (cand_reg:alloc) rs @@ -962,8 +979,8 @@ allocRegsAndSpill_spill reading keep spills alloc r@(VirtualRegWithFormat vr fmt -- - the old data is now only in memory, -- - the new data is now allocated to this register; -- make sure to use the new format (#26542) - let assig1 = addToUFM_Directly assig temp_to_push_out (InMem slot) - let assig2 = addToUFM assig1 vr $! newLocation spill_loc (RealRegUsage cand_reg fmt) + let assig1 = addToUFM_Directly assig temp_to_push_out $ Loc (InMem slot) old_reg_fmt + let assig2 = addToUFM assig1 vr $! newLocation spill_loc (RealRegUsage cand_reg vrFmt) setAssigR $ toRegMap assig2 -- if need be, load up a spilled temp into the reg we've just freed up. @@ -980,7 +997,7 @@ allocRegsAndSpill_spill reading keep spills alloc r@(VirtualRegWithFormat vr fmt $ vcat [ text "allocating vreg: " <> text (show vr) , text "assignment: " <> ppr assig - , text "format: " <> ppr fmt + , text "format: " <> ppr vrFmt , text "freeRegs: " <> text (showRegs freeRegs) , text "initFreeRegs: " <> text (showRegs (frInitFreeRegs platform `asTypeOf` freeRegs)) ] @@ -992,9 +1009,12 @@ allocRegsAndSpill_spill reading keep spills alloc r@(VirtualRegWithFormat vr fmt -- | Calculate a new location after a register has been loaded. newLocation :: SpillLoc -> RealRegUsage -> Loc -- if the tmp was read from a slot, then now its in a reg as well -newLocation (ReadMem slot) my_reg = InBoth my_reg slot +newLocation (ReadMem slot memFmt) (RealRegUsage r _regFmt) = + -- See Note [Use spilled format when reloading] + Loc (InBoth r slot) memFmt + -- writes will always result in only the register being available -newLocation _ my_reg = InReg my_reg +newLocation _ (RealRegUsage r regFmt) = Loc (InReg r) regFmt -- | Load up a spilled temporary if we need to (read from memory). loadTemp @@ -1005,11 +1025,91 @@ loadTemp -> [instr] -> RegM freeRegs [instr] -loadTemp (VirtualRegWithFormat vreg fmt) (ReadMem slot) hreg spills +loadTemp (VirtualRegWithFormat vreg _fmt) (ReadMem slot memFmt) hreg spills = do - insn <- loadR (RegWithFormat (RegReal hreg) fmt) slot + -- See Note [Use spilled format when reloading] + insn <- loadR (RegWithFormat (RegReal hreg) memFmt) slot recordSpill (SpillLoad $ getUnique vreg) return $ {- mkComment (text "spill load") : -} insn ++ spills loadTemp _ _ _ spills = return spills + +{- Note [Allocated register formats] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We uphold the following principle for the format at which we keep track of +alllocated registers: + + RF1. Writes redefine the format. + + When we write to a register 'r' at format 'fmt', we consider the register + to hold that format going forwards. + + (In cases where a partial write is desired, the move instruction should + specify that the destination format is the full register, even if, say, + the instruction only writes to the low 64 bits of the register. 
+ See also Wrinkle [Don't allow scalar partial writes] in + Note [Register formats in liveness analysis] in GHC.CmmToAsm.Reg.Liveness.) + + RF2. Reads from a register do not redefine its format. + + Generally speaking, as explained in Note [Register formats in liveness analysis] + in GHC.CmmToAsm.Reg.Liveness, when computing the used format from a collection + of reads, we take a least upper bound. + +It is particularly important to get (RF1) correct, as otherwise we can end up in +the situation of T26411b, where code such as + + movsd .Ln6m(%rip),%v1 + shufpd $0,%v1,%v1 + +we start off with %v1 :: F64, but after shufpd (which broadcasts the low part +to the high part) we must consider that %v1 :: F64x2. If we fail to do that, +then we will silently discard the top bits in spill/reload operations. +-} + +{- Note [Use spilled format when reloading] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We always reload at the full format that a register was spilled at. The rationale +is as follows: + + 1. If later instructions only need the lower 64 bits of an XMM register, + then we should have only spilled the lower 64 bits in the first place. + (Whether this is true currently is another question.) + 2. If later instructions need say 128 bits, then we should immediately load + the entire 128 bits, as this avoids multiple load instructions. + +For (2), consider the situation of #26526, where we need to spill around a C +call (because we are using the System V ABI with no callee saved XMM registers). +Before register allocation, we have: + + vmovupd %v1 %v0 + call ... + movsd %v0 %v3 + movhlps %v0 %v4 + +The contents of %v0 need to be preserved across the call. We must spill %v0 at +format F64x2 (as later instructions need the entire 128 bits), and reload it +later. We thus expect something like: + + vmovupd %xmm1 %xmm0 + vmovupd %xmm0 72(%rsp) -- spill to preserve + call ... + vmovupd 72(%rsp) %xmm0 -- restore + movsd %xmm0 %xmm3 + movhlps %xmm0 %xmm4 + +This is certainly better than doing two loads from the stack, e.g. + + call ... + movsd 72(%rsp) %xmm0 -- restore only lower 64 bits + movsd %xmm0 %xmm3 + vmovupd 72(%rsp) %xmm0 -- restore the full 128 bits + movhlps %xmm0 %xmm4 + +The latter being especially risky because we don't want to believe %v0 is 'InBoth' +with format F64. The risk is that, when allocating registers for the 'VMOVUPD' +instruction, we think our data is already in a register and thus doesn't need to +be reloaded from memory, when in fact we have only loaded the lower 64 bits of +the data. +-} ===================================== compiler/GHC/CmmToAsm/Reg/Linear/Base.hs ===================================== @@ -1,3 +1,4 @@ +{-# LANGUAGE LambdaCase #-} {-# LANGUAGE RecordWildCards #-} -- | Put common type definitions here to break recursive module dependencies. @@ -9,7 +10,7 @@ module GHC.CmmToAsm.Reg.Linear.Base ( emptyBlockAssignment, updateBlockAssignment, - Loc(..), + VLoc(..), Loc(..), IgnoreFormat(..), regsOfLoc, RealRegUsage(..), @@ -39,8 +40,6 @@ import GHC.Cmm.Dataflow.Label import GHC.CmmToAsm.Reg.Utils import GHC.CmmToAsm.Format -import Data.Function ( on ) - data ReadingOrWriting = Reading | Writing deriving (Eq,Ord) -- | Used to store the register assignment on entry to a basic block. 
@@ -70,8 +69,13 @@ updateBlockAssignment :: BlockId -> BlockAssignment freeRegs -> BlockAssignment freeRegs updateBlockAssignment dest (freeRegs, regMap) (BlockAssignment {..}) = - BlockAssignment (mapInsert dest (freeRegs, regMap) blockMap) - (mergeUFM combWithExisting id (mapMaybeUFM fromLoc) (firstUsed) (toVRegMap regMap)) + BlockAssignment + (mapInsert dest (freeRegs, regMap) blockMap) + (mergeUFM combWithExisting id + (mapMaybeUFM (fromVLoc . locWithFormat_loc)) + firstUsed + (toVRegMap regMap) + ) where -- The blocks are processed in dependency order, so if there's already an -- entry in the map then keep that assignment rather than writing the new @@ -79,13 +83,14 @@ updateBlockAssignment dest (freeRegs, regMap) (BlockAssignment {..}) = combWithExisting :: RealReg -> Loc -> Maybe RealReg combWithExisting old_reg _ = Just $ old_reg - fromLoc :: Loc -> Maybe RealReg - fromLoc (InReg rr) = Just $ realReg rr - fromLoc (InBoth rr _) = Just $ realReg rr - fromLoc _ = Nothing - + fromVLoc :: VLoc -> Maybe RealReg + fromVLoc (InReg rr) = Just rr + fromVLoc (InBoth rr _) = Just rr + fromVLoc _ = Nothing --- | Where a vreg is currently stored +-- | Where a vreg is currently stored. +-- +-- -- A temporary can be marked as living in both a register and memory -- (InBoth), for example if it was recently loaded from a spill location. -- This makes it cheap to spill (no save instruction required), but we @@ -96,22 +101,41 @@ updateBlockAssignment dest (freeRegs, regMap) (BlockAssignment {..}) = -- save it in a spill location, but mark it as InBoth because the current -- instruction might still want to read it. -- -data Loc +data VLoc -- | vreg is in a register - = InReg {-# UNPACK #-} !RealRegUsage + = InReg {-# UNPACK #-} !RealReg -- | vreg is held in stack slots - | InMem {-# UNPACK #-} !StackSlot - + | InMem {-# UNPACK #-} !StackSlot -- | vreg is held in both a register and stack slots - | InBoth {-# UNPACK #-} !RealRegUsage - {-# UNPACK #-} !StackSlot + | InBoth {-# UNPACK #-} !RealReg + {-# UNPACK #-} !StackSlot deriving (Eq, Ord, Show) -instance Outputable Loc where +-- | Where a virtual register is stored, together with the format it is stored at. +-- +-- See 'VLoc'. +data Loc + = Loc + { locWithFormat_loc :: {-# UNPACK #-} !VLoc + , locWithFormat_format :: Format + } + +-- | A newtype used to hang off 'Eq' and 'Ord' instances for 'Loc' which +-- ignore the format, as used in 'GHC.CmmToAsm.Reg.Linear.JoinToTargets'. +newtype IgnoreFormat a = IgnoreFormat a +instance Eq (IgnoreFormat Loc) where + IgnoreFormat (Loc l1 _) == IgnoreFormat (Loc l2 _) = l1 == l2 +instance Ord (IgnoreFormat Loc) where + compare (IgnoreFormat (Loc l1 _)) (IgnoreFormat (Loc l2 _)) = compare l1 l2 + +instance Outputable VLoc where ppr l = text (show l) +instance Outputable Loc where + ppr (Loc loc fmt) = parens (ppr loc <+> dcolon <+> ppr fmt) + -- | A 'RealReg', together with the specific 'Format' it is used at. data RealRegUsage = RealRegUsage @@ -122,22 +146,12 @@ data RealRegUsage instance Outputable RealRegUsage where ppr (RealRegUsage r fmt) = ppr r <> dcolon <+> ppr fmt --- NB: these instances only compare the underlying 'RealReg', as that is what --- is important for register allocation. --- --- (It would nonetheless be a good idea to remove these instances.) -instance Eq RealRegUsage where - (==) = (==) `on` realReg -instance Ord RealRegUsage where - compare = compare `on` realReg - -- | Get the reg numbers stored in this Loc. 
-regsOfLoc :: Loc -> [RealRegUsage] +regsOfLoc :: VLoc -> [RealReg] regsOfLoc (InReg r) = [r] regsOfLoc (InBoth r _) = [r] regsOfLoc (InMem _) = [] - -- | Reasons why instructions might be inserted by the spiller. -- Used when generating stats for -ddrop-asm-stats. -- ===================================== compiler/GHC/CmmToAsm/Reg/Linear/JoinToTargets.hs ===================================== @@ -33,12 +33,14 @@ import GHC.Utils.Outputable import GHC.CmmToAsm.Format import GHC.Types.Unique.Set +import Data.Coerce (coerce) + -- | For a jump instruction at the end of a block, generate fixup code so its -- vregs are in the correct regs for its destination. -- joinToTargets :: (FR freeRegs, Instruction instr) - => BlockMap (UniqSet RegWithFormat) -- ^ maps the unique of the blockid to the set of vregs + => BlockMap Regs -- ^ maps the unique of the blockid to the set of vregs -- that are known to be live on the entry to each block. -> BlockId -- ^ id of the current block @@ -62,7 +64,7 @@ joinToTargets block_live id instr ----- joinToTargets' :: (FR freeRegs, Instruction instr) - => BlockMap (UniqSet RegWithFormat) -- ^ maps the unique of the blockid to the set of vregs + => BlockMap Regs -- ^ maps the unique of the blockid to the set of vregs -- that are known to be live on the entry to each block. -> [NatBasicBlock instr] -- ^ acc blocks of fixup code. @@ -90,23 +92,23 @@ joinToTargets' block_live new_blocks block_id instr (dest:dests) -- adjust the current assignment to remove any vregs that are not live -- on entry to the destination block. let live_set = expectJust $ mapLookup dest block_live - let still_live uniq _ = uniq `elemUniqSet_Directly` live_set + let still_live uniq _ = uniq `elemUniqSet_Directly` getRegs live_set let adjusted_assig = filterUFM_Directly still_live assig -- and free up those registers which are now free. let to_free = - [ r | (reg, loc) <- nonDetUFMToList assig + [ r | (reg, Loc loc _locFmt) <- nonDetUFMToList assig -- This is non-deterministic but we do not -- currently support deterministic code-generation. -- See Note [Unique Determinism and code generation] - , not (elemUniqSet_Directly reg live_set) + , not (elemUniqSet_Directly reg $ getRegs live_set) , r <- regsOfLoc loc ] case lookupBlockAssignment dest block_assig of Nothing -> joinToTargets_first block_live new_blocks block_id instr dest dests - block_assig adjusted_assig $ map realReg to_free + block_assig adjusted_assig to_free Just (_, dest_assig) -> joinToTargets_again @@ -116,7 +118,7 @@ joinToTargets' block_live new_blocks block_id instr (dest:dests) -- this is the first time we jumped to this block. joinToTargets_first :: (FR freeRegs, Instruction instr) - => BlockMap (UniqSet RegWithFormat) + => BlockMap Regs -> [NatBasicBlock instr] -> BlockId -> instr @@ -142,10 +144,9 @@ joinToTargets_first block_live new_blocks block_id instr dest dests joinToTargets' block_live new_blocks block_id instr dests - -- we've jumped to this block before joinToTargets_again :: (Instruction instr, FR freeRegs) - => BlockMap (UniqSet RegWithFormat) + => BlockMap Regs -> [NatBasicBlock instr] -> BlockId -> instr @@ -159,7 +160,9 @@ joinToTargets_again src_assig dest_assig -- the assignments already match, no problem. - | nonDetUFMToList dest_assig == nonDetUFMToList src_assig + | equalIgnoringFormats + (nonDetUFMToList dest_assig) + (nonDetUFMToList src_assig) -- This is non-deterministic but we do not -- currently support deterministic code-generation. 
-- See Note [Unique Determinism and code generation] @@ -183,7 +186,7 @@ joinToTargets_again -- -- We need to do the R2 -> R3 move before R1 -> R2. -- - let sccs = stronglyConnCompFromEdgedVerticesOrdR graph + let sccs = movementGraphSCCs graph -- debugging {- @@ -267,30 +270,36 @@ makeRegMovementGraph adjusted_assig dest_assig -- expandNode :: a - -> Loc -- ^ source of move - -> Loc -- ^ destination of move - -> [Node Loc a ] - -expandNode vreg loc@(InReg src) (InBoth dst mem) - | src == dst = [DigraphNode vreg loc [InMem mem]] - | otherwise = [DigraphNode vreg loc [InReg dst, InMem mem]] - -expandNode vreg loc@(InMem src) (InBoth dst mem) - | src == mem = [DigraphNode vreg loc [InReg dst]] - | otherwise = [DigraphNode vreg loc [InReg dst, InMem mem]] - -expandNode _ (InBoth _ src) (InMem dst) - | src == dst = [] -- guaranteed to be true - -expandNode _ (InBoth src _) (InReg dst) - | src == dst = [] - -expandNode vreg (InBoth src _) dst - = expandNode vreg (InReg src) dst - -expandNode vreg src dst - | src == dst = [] - | otherwise = [DigraphNode vreg src [dst]] + -> Loc -- ^ source of move + -> Loc -- ^ destination of move + -> [Node Loc a] +expandNode vreg src@(Loc srcLoc srcFmt) dst@(Loc dstLoc dstFmt) = + case (srcLoc, dstLoc) of + (InReg srcReg, InBoth dstReg dstMem) + | srcReg == dstReg + -> [DigraphNode vreg src [Loc (InMem dstMem) dstFmt]] + | otherwise + -> [DigraphNode vreg src [Loc (InReg dstReg) dstFmt + ,Loc (InMem dstMem) dstFmt]] + (InMem srcMem, InBoth dstReg dstMem) + | srcMem == dstMem + -> [DigraphNode vreg src [Loc (InReg dstReg) dstFmt]] + | otherwise + -> [DigraphNode vreg src [Loc (InReg dstReg) dstFmt + ,Loc (InMem dstMem) dstFmt]] + (InBoth _ srcMem, InMem dstMem) + | srcMem == dstMem + -> [] -- guaranteed to be true + (InBoth srcReg _, InReg dstReg) + | srcReg == dstReg + -> [] + (InBoth srcReg _, _) + -> expandNode vreg (Loc (InReg srcReg) srcFmt) dst + _ + | srcLoc == dstLoc + -> [] + | otherwise + -> [DigraphNode vreg src [dst]] -- | Generate fixup code for a particular component in the move graph @@ -327,7 +336,7 @@ handleComponent delta _ (AcyclicSCC (DigraphNode vreg src dsts)) -- require a fixup. -- handleComponent delta instr - (CyclicSCC ((DigraphNode vreg (InReg (RealRegUsage sreg scls)) ((InReg (RealRegUsage dreg dcls): _))) : rest)) + (CyclicSCC ((DigraphNode vreg (Loc (InReg sreg) scls) ((Loc (InReg dreg) dcls: _))) : rest)) -- dest list may have more than one element, if the reg is also InMem. = do -- spill the source into its slot @@ -338,7 +347,7 @@ handleComponent delta instr instrLoad <- loadR (RegWithFormat (RegReal dreg) dcls) slot remainingFixUps <- mapM (handleComponent delta instr) - (stronglyConnCompFromEdgedVerticesOrdR rest) + (movementGraphSCCs rest) -- make sure to do all the reloads after all the spills, -- so we don't end up clobbering the source values. @@ -347,29 +356,37 @@ handleComponent delta instr handleComponent _ _ (CyclicSCC _) = panic "Register Allocator: handleComponent cyclic" +-- Helper functions that use the @Ord (IgnoreFormat Loc)@ instance. + +equalIgnoringFormats :: [(Unique, Loc)] -> [(Unique, Loc)] -> Bool +equalIgnoringFormats = + coerce $ (==) @[(Unique, IgnoreFormat Loc)] +movementGraphSCCs :: [Node Loc Unique] -> [SCC (Node Loc Unique)] +movementGraphSCCs = + coerce $ stronglyConnCompFromEdgedVerticesOrdR @(IgnoreFormat Loc) @Unique -- | Move a vreg between these two locations. -- makeMove :: Instruction instr - => Int -- ^ current C stack delta. - -> Unique -- ^ unique of the vreg that we're moving. 
- -> Loc -- ^ source location. - -> Loc -- ^ destination location. - -> RegM freeRegs [instr] -- ^ move instruction. + => Int -- ^ current C stack delta + -> Unique -- ^ unique of the vreg that we're moving + -> Loc -- ^ source location + -> Loc -- ^ destination location + -> RegM freeRegs [instr] -- ^ move instruction -makeMove delta vreg src dst +makeMove delta vreg (Loc src _srcFmt) (Loc dst dstFmt) = do config <- getConfig case (src, dst) of - (InReg (RealRegUsage s _), InReg (RealRegUsage d fmt)) -> + (InReg s, InReg d) -> do recordSpill (SpillJoinRR vreg) - return $ [mkRegRegMoveInstr config fmt (RegReal s) (RegReal d)] - (InMem s, InReg (RealRegUsage d cls)) -> + return $ [mkRegRegMoveInstr config dstFmt (RegReal s) (RegReal d)] + (InMem s, InReg d) -> do recordSpill (SpillJoinRM vreg) - return $ mkLoadInstr config (RegWithFormat (RegReal d) cls) delta s - (InReg (RealRegUsage s cls), InMem d) -> + return $ mkLoadInstr config (RegWithFormat (RegReal d) dstFmt) delta s + (InReg s, InMem d) -> do recordSpill (SpillJoinRM vreg) - return $ mkSpillInstr config (RegWithFormat (RegReal s) cls) delta d + return $ mkSpillInstr config (RegWithFormat (RegReal s) dstFmt) delta d _ -> -- we don't handle memory to memory moves. -- they shouldn't happen because we don't share ===================================== compiler/GHC/CmmToAsm/Reg/Liveness.hs ===================================== @@ -30,7 +30,9 @@ module GHC.CmmToAsm.Reg.Liveness ( patchRegsLiveInstr, reverseBlocksInTops, regLiveness, - cmmTopLiveness + cmmTopLiveness, + + module GHC.CmmToAsm.Reg.Regs ) where import GHC.Prelude @@ -41,11 +43,11 @@ import GHC.CmmToAsm.Config import GHC.CmmToAsm.Format import GHC.CmmToAsm.Types import GHC.CmmToAsm.Utils +import GHC.CmmToAsm.Reg.Regs import GHC.Cmm.BlockId import GHC.Cmm.Dataflow.Label import GHC.Cmm -import GHC.CmmToAsm.Reg.Target import GHC.Data.Graph.Directed import GHC.Data.OrdList @@ -189,9 +191,9 @@ data LiveInstr instr data Liveness = Liveness - { liveBorn :: UniqSet RegWithFormat -- ^ registers born in this instruction (written to for first time). - , liveDieRead :: UniqSet RegWithFormat -- ^ registers that died because they were read for the last time. - , liveDieWrite :: UniqSet RegWithFormat} -- ^ registers that died because they were clobbered by something. + { liveBorn :: Regs -- ^ registers born in this instruction (written to for first time). + , liveDieRead :: Regs -- ^ registers that died because they were read for the last time. + , liveDieWrite :: Regs } -- ^ registers that died because they were clobbered by something. -- | Stash regs live on entry to each basic block in the info part of the cmm code. @@ -200,7 +202,7 @@ data LiveInfo (LabelMap RawCmmStatics) -- cmm info table static stuff [BlockId] -- entry points (first one is the -- entry point for the proc). - (BlockMap (UniqSet RegWithFormat)) -- argument locals live on entry to this block + (BlockMap Regs) -- argument locals live on entry to this block (BlockMap IntSet) -- stack slots live on entry to this block @@ -246,8 +248,8 @@ instance Outputable instr , pprRegs (text "# w_dying: ") (liveDieWrite live) ] $+$ space) - where pprRegs :: SDoc -> UniqSet RegWithFormat -> SDoc - pprRegs name regs + where pprRegs :: SDoc -> Regs -> SDoc + pprRegs name ( Regs regs ) | isEmptyUniqSet regs = empty | otherwise = name <> (pprUFM (getUniqSet regs) (hcat . punctuate space . 
map ppr)) @@ -330,7 +332,7 @@ slurpConflicts :: Instruction instr => Platform -> LiveCmmDecl statics instr - -> (Bag (UniqSet RegWithFormat), Bag (Reg, Reg)) + -> (Bag Regs, Bag (Reg, Reg)) slurpConflicts platform live = slurpCmm (emptyBag, emptyBag) live @@ -364,23 +366,22 @@ slurpConflicts platform live = let -- regs that die because they are read for the last time at the start of an instruction -- are not live across it. - rsLiveAcross = rsLiveEntry `minusUniqSet` (liveDieRead live) + rsLiveAcross = rsLiveEntry `minusRegs` (liveDieRead live) -- regs live on entry to the next instruction. -- be careful of orphans, make sure to delete dying regs _after_ unioning -- in the ones that are born here. - rsLiveNext = (rsLiveAcross `unionUniqSets` (liveBorn live)) - `minusUniqSet` (liveDieWrite live) + rsLiveNext = (rsLiveAcross `unionRegsMaxFmt` (liveBorn live)) + `minusCoveredRegs` (liveDieWrite live) -- orphan vregs are the ones that die in the same instruction they are born in. -- these are likely to be results that are never used, but we still -- need to assign a hreg to them.. - rsOrphans = intersectUniqSets + rsOrphans = intersectRegsMaxFmt (liveBorn live) - (unionUniqSets (liveDieWrite live) (liveDieRead live)) + (unionRegsMaxFmt (liveDieWrite live) (liveDieRead live)) - -- - rsConflicts = unionUniqSets rsLiveNext rsOrphans + rsConflicts = unionRegsMaxFmt rsLiveNext rsOrphans in case takeRegRegMoveInstr platform instr of Just rr -> slurpLIs rsLiveNext @@ -619,7 +620,7 @@ patchEraseLive platform patchF cmm | LiveInfo static id blockMap mLiveSlots <- info = let -- See Note [Unique Determinism and code generation] - blockMap' = mapMap (mapRegFormatSet patchF) blockMap + blockMap' = mapMap (mapRegs patchF) blockMap info' = LiveInfo static id blockMap' mLiveSlots in CmmProc info' label live $ map patchSCC sccs @@ -648,8 +649,8 @@ patchEraseLive platform patchF cmm | r1 == r2 = True -- destination reg is never used - | elemUniqSet_Directly (getUnique r2) (liveBorn live) - , elemUniqSet_Directly (getUnique r2) (liveDieRead live) || elemUniqSet_Directly (getUnique r2) (liveDieWrite live) + | r2 `elemRegs` liveBorn live + , r2 `elemRegs` liveDieRead live || r2 `elemRegs` liveDieWrite live = True | otherwise = False @@ -673,9 +674,9 @@ patchRegsLiveInstr platform patchF li (patchRegsOfInstr platform instr patchF) (Just live { -- WARNING: have to go via lists here because patchF changes the uniq in the Reg - liveBorn = mapRegFormatSet patchF $ liveBorn live - , liveDieRead = mapRegFormatSet patchF $ liveDieRead live - , liveDieWrite = mapRegFormatSet patchF $ liveDieWrite live }) + liveBorn = mapRegs patchF $ liveBorn live + , liveDieRead = mapRegs patchF $ liveDieRead live + , liveDieWrite = mapRegs patchF $ liveDieWrite live }) -- See Note [Unique Determinism and code generation] -------------------------------------------------------------------------------- @@ -865,7 +866,7 @@ computeLiveness -> [SCC (LiveBasicBlock instr)] -> ([SCC (LiveBasicBlock instr)], -- instructions annotated with list of registers -- which are "dead after this instruction". - BlockMap (UniqSet RegWithFormat)) -- blocks annotated with set of live registers + BlockMap Regs) -- blocks annotated with set of live registers -- on entry to the block. 
computeLiveness platform sccs @@ -880,11 +881,11 @@ computeLiveness platform sccs livenessSCCs :: Instruction instr => Platform - -> BlockMap (UniqSet RegWithFormat) + -> BlockMap Regs -> [SCC (LiveBasicBlock instr)] -- accum -> [SCC (LiveBasicBlock instr)] -> ( [SCC (LiveBasicBlock instr)] - , BlockMap (UniqSet RegWithFormat)) + , BlockMap Regs) livenessSCCs _ blockmap done [] = (done, blockmap) @@ -913,13 +914,14 @@ livenessSCCs platform blockmap done linearLiveness :: Instruction instr - => BlockMap (UniqSet RegWithFormat) -> [LiveBasicBlock instr] - -> (BlockMap (UniqSet RegWithFormat), [LiveBasicBlock instr]) + => BlockMap Regs -> [LiveBasicBlock instr] + -> (BlockMap Regs, [LiveBasicBlock instr]) linearLiveness = mapAccumL (livenessBlock platform) -- probably the least efficient way to compare two -- BlockMaps for equality. + equalBlockMaps :: BlockMap Regs -> BlockMap Regs -> Bool equalBlockMaps a b = a' == b' where a' = mapToList a @@ -933,14 +935,14 @@ livenessSCCs platform blockmap done livenessBlock :: Instruction instr => Platform - -> BlockMap (UniqSet RegWithFormat) + -> BlockMap Regs -> LiveBasicBlock instr - -> (BlockMap (UniqSet RegWithFormat), LiveBasicBlock instr) + -> (BlockMap Regs, LiveBasicBlock instr) livenessBlock platform blockmap (BasicBlock block_id instrs) = let (regsLiveOnEntry, instrs1) - = livenessBack platform emptyUniqSet blockmap [] (reverse instrs) + = livenessBack platform noRegs blockmap [] (reverse instrs) blockmap' = mapInsert block_id regsLiveOnEntry blockmap instrs2 = livenessForward platform regsLiveOnEntry instrs1 @@ -955,23 +957,26 @@ livenessBlock platform blockmap (BasicBlock block_id instrs) livenessForward :: Instruction instr => Platform - -> UniqSet RegWithFormat -- regs live on this instr + -> Regs -- regs live on this instr -> [LiveInstr instr] -> [LiveInstr instr] livenessForward _ _ [] = [] livenessForward platform rsLiveEntry (li@(LiveInstr instr mLive) : lis) | Just live <- mLive = let - RU _ written = regUsageOfInstr platform instr + RU _ rsWritten = regUsageOfInstr platform instr -- Regs that are written to but weren't live on entry to this instruction -- are recorded as being born here. 
- rsBorn = mkUniqSet - $ filter (\ r -> not $ elemUniqSet_Directly (getUnique r) rsLiveEntry) - $ written + rsBorn = mkRegsMaxFmt + [ reg + | reg@( RegWithFormat r _ ) <- rsWritten + , not $ r `elemRegs` rsLiveEntry + ] - rsLiveNext = (rsLiveEntry `unionUniqSets` rsBorn) - `minusUniqSet` (liveDieRead live) - `minusUniqSet` (liveDieWrite live) + -- See Note [Register formats in liveness analysis] + rsLiveNext = (rsLiveEntry `addRegsMaxFmt` rsWritten) + `minusRegs` (liveDieRead live) -- (FmtFwd1) + `minusRegs` (liveDieWrite live) -- (FmtFwd2) in LiveInstr instr (Just live { liveBorn = rsBorn }) : livenessForward platform rsLiveNext lis @@ -986,11 +991,11 @@ livenessForward platform rsLiveEntry (li@(LiveInstr instr mLive) : lis) livenessBack :: Instruction instr => Platform - -> UniqSet RegWithFormat -- regs live on this instr - -> BlockMap (UniqSet RegWithFormat) -- regs live on entry to other BBs - -> [LiveInstr instr] -- instructions (accum) - -> [LiveInstr instr] -- instructions - -> (UniqSet RegWithFormat, [LiveInstr instr]) + -> Regs -- ^ regs live on this instr + -> BlockMap Regs -- ^ regs live on entry to other BBs + -> [LiveInstr instr] -- ^ instructions (accum) + -> [LiveInstr instr] -- ^ instructions + -> (Regs, [LiveInstr instr]) livenessBack _ liveregs _ done [] = (liveregs, done) @@ -998,15 +1003,14 @@ livenessBack platform liveregs blockmap acc (instr : instrs) = let !(!liveregs', instr') = liveness1 platform liveregs blockmap instr in livenessBack platform liveregs' blockmap (instr' : acc) instrs - -- don't bother tagging comments or deltas with liveness liveness1 :: Instruction instr => Platform - -> UniqSet RegWithFormat - -> BlockMap (UniqSet RegWithFormat) + -> Regs + -> BlockMap Regs -> LiveInstr instr - -> (UniqSet RegWithFormat, LiveInstr instr) + -> (Regs, LiveInstr instr) liveness1 _ liveregs _ (LiveInstr instr _) | isMetaInstr instr @@ -1017,14 +1021,14 @@ liveness1 platform liveregs blockmap (LiveInstr instr _) | not_a_branch = (liveregs1, LiveInstr instr (Just $ Liveness - { liveBorn = emptyUniqSet + { liveBorn = noRegs , liveDieRead = r_dying , liveDieWrite = w_dying })) | otherwise = (liveregs_br, LiveInstr instr (Just $ Liveness - { liveBorn = emptyUniqSet + { liveBorn = noRegs , liveDieRead = r_dying_br , liveDieWrite = w_dying })) @@ -1033,21 +1037,22 @@ liveness1 platform liveregs blockmap (LiveInstr instr _) -- registers that were written here are dead going backwards. -- registers that were read here are live going backwards. - liveregs1 = (liveregs `delListFromUniqSet` written) - `addListToUniqSet` read + -- As for the formats, see Note [Register formats in liveness analysis] + liveregs1 = (liveregs `minusCoveredRegs` mkRegsMaxFmt written) -- (FmtBwd2) + `addRegsMaxFmt` read -- (FmtBwd1) - -- registers that are not live beyond this point, are recorded - -- as dying here. - r_dying = mkUniqSet + -- registers that are not live beyond this point are recorded + -- as dying here. + r_dying = mkRegsMaxFmt [ reg | reg@(RegWithFormat r _) <- read , not $ any (\ w -> getUnique w == getUnique r) written - , not (elementOfUniqSet reg liveregs) ] + , not $ r `elemRegs` liveregs ] - w_dying = mkUniqSet + w_dying = mkRegsMaxFmt [ reg - | reg <- written - , not (elementOfUniqSet reg liveregs) ] + | reg@(RegWithFormat r _) <- written + , not $ r `elemRegs` liveregs ] -- union in the live regs from all the jump destinations of this -- instruction. 
@@ -1057,14 +1062,91 @@ liveness1 platform liveregs blockmap (LiveInstr instr _) targetLiveRegs target = case mapLookup target blockmap of Just ra -> ra - Nothing -> emptyUniqSet - - live_from_branch = unionManyUniqSets (map targetLiveRegs targets) - - liveregs_br = liveregs1 `unionUniqSets` live_from_branch + Nothing -> noRegs -- registers that are live only in the branch targets should -- be listed as dying here. - live_branch_only = live_from_branch `minusUniqSet` liveregs - r_dying_br = (r_dying `unionUniqSets` live_branch_only) - -- See Note [Unique Determinism and code generation] + live_from_branch = unionManyRegsMaxFmt (map targetLiveRegs targets) + liveregs_br = liveregs1 `unionRegsMaxFmt` live_from_branch + live_branch_only = live_from_branch `minusRegs` liveregs + r_dying_br = r_dying `unionRegsMaxFmt` live_branch_only + -- NB: we treat registers live in branches similar to any other + -- registers read by the instruction, so the logic here matches + -- the logic in the definition of 'r_dying' above. + +{- Note [Register formats in liveness analysis] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We keep track of which format each virtual register is live at, and make use +of this information during liveness analysis. + +First, we do backwards liveness analysis: + + (FmtBwd1) Take the larger format when computing registers live going backwards. + + Suppose for example that we have: + + <previous instructions> + movps %v0 %v1 + movupd %v0 %v2 + + Here we read %v0 both at format F64 and F64x2, so we must consider it live + at format F64x2, going backwards, in the previous instructions. + Not doing so caused #26411. + + (FmtBwd2) Only consider fully clobbered registers to be dead going backwards. + + Consider for example the liveness of %v0 going backwards in the following + instruction block: + + movlhps %v5 %v0 -- write the upper F64 of %v0 + movupd %v1 %v2 -- some unrelated instruction + movsd %v3 %v0 -- write the lower F64 of %v0 + movupd %v0 %v4 -- read %v0 at format F64x2 + + We must not consider %v0 to be dead going backwards from 'movsd %v3 %v0'. + If we do, that means we think %v0 is dead during 'movupd %v1 %v2', and thus + that we can assign both %v0 and %v2 to the same real register. However, this + would be catastrophic, as 'movupd %v1 %v2' would then clobber the data + written to '%v0' in 'movlhps %v5 %v0'. + + Wrinkle [Don't allow scalar partial writes] + + We don't allow partial writes within scalar registers, for many reasons: + + - partial writes can cause partial register stalls, which can have + disastrous performance implications (as seen in #20405) + - partial writes makes register allocation more difficult, as they can + require preserving the contents of a register across many instructions, + as in: + + mulw %v0 -- 32-bit write to %rax + <many instructions> + mulb %v1 -- 16-bit partial write to %rax + + The current register allocator is not equipped for spilling real + registers (only virtual registers), which means that e.g. on i386 we + end up with only 2 allocatable real GP registers for <many instructions>, + which is insufficient for instructions that require 3 registers. + + We could allow this to be customised depending on the architecture, but + currently we simply never allow scalar partial writes. + +The forwards analysis is a bit simpler: + + (FmtFwd1) Remove without considering format when dead going forwards. + + If a register is no longer read after an instruction, then it is dead + going forwards. The format doesn't matter. 
+ + (FmtFwd2) Consider all writes as making a register dead going forwards. + + If we write to the lower 64 bits of a 128 bit register, we don't currently + have a way to say "the lower 64 bits are dead but the top 64 bits are still live". + We would need a notion of partial register, similar to 'VirtualRegHi' for + the top 32 bits of a I32x2 virtual register. + + As a result, the current approach is to consider the entire register to + be dead. This might cause us to unnecessarily spill/reload an entire vector + register to avoid its lower bits getting clobbered even though later + instructions might only care about its upper bits. +-} ===================================== compiler/GHC/CmmToAsm/Reg/Regs.hs ===================================== @@ -0,0 +1,119 @@ +{-# LANGUAGE DerivingStrategies #-} + +module GHC.CmmToAsm.Reg.Regs ( + Regs(..), + noRegs, + addRegMaxFmt, addRegsMaxFmt, + mkRegsMaxFmt, + minusCoveredRegs, + minusRegs, + unionRegsMaxFmt, + unionManyRegsMaxFmt, + intersectRegsMaxFmt, + shrinkingRegs, + mapRegs, + elemRegs, lookupReg, + + ) where + +import GHC.Prelude + +import GHC.Platform.Reg ( Reg ) +import GHC.CmmToAsm.Format ( Format, RegWithFormat(..), isVecFormat ) + +import GHC.Utils.Outputable ( Outputable ) +import GHC.Types.Unique ( Uniquable(..) ) +import GHC.Types.Unique.Set + +import Data.Coerce ( coerce ) + +----------------------------------------------------------------------------- + +-- | A set of registers, with their respective formats, mostly for use in +-- register liveness analysis. See Note [Register formats in liveness analysis] +-- in GHC.CmmToAsm.Reg.Liveness. +newtype Regs = Regs { getRegs :: UniqSet RegWithFormat } + deriving newtype (Eq, Outputable) + +maxRegWithFormat :: RegWithFormat -> RegWithFormat -> RegWithFormat +maxRegWithFormat r1@(RegWithFormat _ fmt1) r2@(RegWithFormat _ fmt2) + = if fmt1 >= fmt2 + then r1 + else r2 + -- Re-using one of the arguments avoids allocating a new 'RegWithFormat', + -- compared with returning 'RegWithFormat r1 (max fmt1 fmt2)'. + +noRegs :: Regs +noRegs = Regs emptyUniqSet + +addRegsMaxFmt :: Regs -> [RegWithFormat] -> Regs +addRegsMaxFmt = foldl' addRegMaxFmt + +mkRegsMaxFmt :: [RegWithFormat] -> Regs +mkRegsMaxFmt = addRegsMaxFmt noRegs + +addRegMaxFmt :: Regs -> RegWithFormat -> Regs +addRegMaxFmt = coerce $ strictAddOneToUniqSet_C maxRegWithFormat + -- Don't build up thunks when combining with 'maxRegWithFormat' + +-- | Remove 2nd argument registers from the 1st argument, but only +-- if the format in the second argument is at least as large as the format +-- in the first argument. +minusCoveredRegs :: Regs -> Regs -> Regs +minusCoveredRegs = coerce $ minusUniqSet_C f + where + f :: RegWithFormat -> RegWithFormat -> Maybe RegWithFormat + f r1@(RegWithFormat _ fmt1) (RegWithFormat _ fmt2) = + if fmt2 >= fmt1 + || + not ( isVecFormat fmt1 ) + -- See Wrinkle [Don't allow scalar partial writes] + -- in Note [Register formats in liveness analysis] in GHC.CmmToAsm.Reg.Liveness. + then Nothing + else Just r1 + +-- | Remove 2nd argument registers from the 1st argument, regardless of format. +-- +-- See also 'minusCoveredRegs', which looks at the formats. 
===================================== compiler/GHC/CmmToAsm/Reg/Target.hs =====================================
@@ -15,7 +15,6 @@ module GHC.CmmToAsm.Reg.Target (
         targetMkVirtualReg,
         targetRegDotColor,
         targetClassOfReg,
-        mapRegFormatSet,
 )
 
 where
@@ -27,10 +26,8 @@ import GHC.Platform.Reg.Class
 import GHC.CmmToAsm.Format
 
 import GHC.Utils.Outputable
-import GHC.Utils.Misc
 import GHC.Utils.Panic
 import GHC.Types.Unique
-import GHC.Types.Unique.Set
 
 import GHC.Platform
 import qualified GHC.CmmToAsm.X86.Regs as X86
@@ -142,6 +139,3 @@ targetClassOfReg platform reg
  = case reg of
    RegVirtual vr -> classOfVirtualReg (platformArch platform) vr
    RegReal rr -> targetClassOfRealReg platform rr
-
-mapRegFormatSet :: HasDebugCallStack => (Reg -> Reg) -> UniqSet RegWithFormat -> UniqSet RegWithFormat
-mapRegFormatSet f = mapUniqSet (\ ( RegWithFormat r fmt ) -> RegWithFormat ( f r ) fmt)
===================================== compiler/GHC/CmmToAsm/X86/CodeGen.hs =====================================
@@ -54,7 +54,9 @@ import GHC.CmmToAsm.CFG
 import GHC.CmmToAsm.Format
 import GHC.CmmToAsm.Config
 import GHC.Platform.Reg
+import GHC.CmmToAsm.Reg.Target (targetClassOfReg)
 import GHC.Platform
+import GHC.Platform.Reg.Class.Unified (RegClass(..))
 
 -- Our intermediate code:
 import GHC.Types.Basic
@@ -4697,7 +4699,14 @@ genCCall64 addr conv dest_regs args = do
       -- It's not safe to omit this assignment, even if the number
       -- of SSE2 regs in use is zero. If %al is larger than 8
       -- on entry to a varargs function, seg faults ensue.
-      nb_sse_regs_used = count (isFloatFormat . regWithFormat_format) arg_regs_used
+      is_sse_reg (RegWithFormat r _) =
+        -- NB: use 'targetClassOfReg' to compute whether this is an SSE
+        -- register or not, as we may have decided to e.g. store a 64-bit
+        -- integer in an xmm register.
+        case targetClassOfReg platform r of
+          RcFloatOrVector -> True
+          RcInteger -> False
+      nb_sse_regs_used = count is_sse_reg arg_regs_used
       assign_eax_sse_regs = unitOL (MOV II32 (OpImm (ImmInt nb_sse_regs_used)) (OpReg eax))
 
 -- Note: we do this on Windows as well. It's not entirely clear why
===================================== compiler/GHC/CmmToAsm/X86/Instr.hs =====================================
@@ -114,9 +114,12 @@ data Instr
 
         -- | X86 scalar move instruction.
         --
-        -- When used at a vector format, only moves the lower 64 bits of data;
-        -- the rest of the data in the destination may either be zeroed or
-        -- preserved, depending on the specific format and operands.
+        -- The format is the format the destination is written to. For an XMM
+        -- register, using a scalar format means that we don't care about the
+        -- upper bits, while using a vector format means that we care about the
+        -- upper bits, even though we are only writing to the lower bits.
+        --
+        -- See also Note [Allocated register formats] in GHC.CmmToAsm.Reg.Linear.
        | MOV Format Operand Operand
             -- N.B. Due to AT&T assembler quirks, when used with 'II64'
             -- 'Format' immediate source and memory target operand, the source
@@ -410,18 +413,27 @@ data FMAPermutation = FMA132 | FMA213 | FMA231
 regUsageOfInstr :: Platform -> Instr -> RegUsage
 regUsageOfInstr platform instr
  = case instr of
-    MOV fmt src dst
+
+    -- Recall that MOV is always a scalar move instruction, but when the destination
+    -- is an XMM register, we make the distinction between:
+    --
+    --   - a scalar format, meaning that from now on we no longer care about the top bits
+    --     of the register, and
+    --   - a vector format, meaning that we still care about what's in the high bits.
+    --
+    -- See Note [Allocated register formats] in GHC.CmmToAsm.Reg.Linear.
+    MOV dst_fmt src dst
       -- MOVSS/MOVSD preserve the upper half of vector registers,
       -- but only for reg-2-reg moves
-      | VecFormat _ sFmt <- fmt
+      | VecFormat _ sFmt <- dst_fmt
       , isFloatScalarFormat sFmt
       , OpReg {} <- src
       , OpReg {} <- dst
-      -> usageRM fmt src dst
+      -> usageRM dst_fmt src dst
       -- other MOV instructions zero any remaining upper part of the destination
       -- (largely to avoid partial register stalls)
       | otherwise
-      -> usageRW fmt src dst
+      -> usageRW dst_fmt src dst
     MOVD fmt1 fmt2 src dst ->
       -- NB: MOVD and MOVQ always zero any remaining upper part of destination,
       -- so the destination is "written" not "modified".
@@ -437,7 +449,7 @@ regUsageOfInstr platform instr
     IMUL fmt src dst -> usageRM fmt src dst
 
     -- Result of IMULB will be just in %ax
-    IMUL2 II8 src -> mkRU (mk II8 eax:use_R II8 src []) [mk II8 eax]
+    IMUL2 II8 src -> mkRU (mk II8 eax:use_R II8 src []) [mk II16 eax]
    -- Result of IMUL for wider values, will be split between %dx/%edx/%rdx and
    -- %ax/%eax/%rax.
    IMUL2 fmt src -> mkRU (mk fmt eax:use_R fmt src []) [mk fmt eax,mk fmt edx]
===================================== compiler/GHC/Types/Unique/FM.hs =====================================
@@ -38,6 +38,7 @@ module GHC.Types.Unique.FM (
         listToUFM_C,
         listToIdentityUFM,
         addToUFM,addToUFM_C,addToUFM_Acc,addToUFM_L,
+        strictAddToUFM_C,
         addListToUFM,addListToUFM_C,
         addToUFM_Directly,
         addListToUFM_Directly,
@@ -62,6 +63,7 @@ module GHC.Types.Unique.FM (
         minusUFM_C,
         intersectUFM,
         intersectUFM_C,
+        strictIntersectUFM_C,
         disjointUFM,
         equalKeysUFM,
         diffUFM,
@@ -178,6 +180,16 @@ addToUFM_C
 
 addToUFM_C f (UFM m) k v = UFM (M.insertWith (flip f) (getKey $ getUnique k) v m)
 
+strictAddToUFM_C
+  :: Uniquable key
+  => (elt -> elt -> elt) -- ^ old -> new -> result
+  -> UniqFM key elt      -- ^ old
+  -> key -> elt          -- ^ new
+  -> UniqFM key elt      -- ^ result
+-- Arguments of combining function of MS.insertWith and strictAddToUFM_C are flipped.
+strictAddToUFM_C f (UFM m) k v =
+  UFM (MS.insertWith (flip f) (getKey $ getUnique k) v m)
+
 addToUFM_Acc
   :: Uniquable key
   => (elt -> elts -> elts)  -- Add to existing
@@ -391,6 +403,13 @@ intersectUFM_C
   -> UniqFM key elt3
 intersectUFM_C f (UFM x) (UFM y) = UFM (M.intersectionWith f x y)
 
+strictIntersectUFM_C
+  :: (elt1 -> elt2 -> elt3)
+  -> UniqFM key elt1
+  -> UniqFM key elt2
+  -> UniqFM key elt3
+strictIntersectUFM_C f (UFM x) (UFM y) = UFM (MS.intersectionWith f x y)
+
 disjointUFM :: UniqFM key elt1 -> UniqFM key elt2 -> Bool
 disjointUFM (UFM x) (UFM y) = M.disjoint x y
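The strict '_C' variants above exist so that the result of the combining function is forced as soon as the entry is inserted, rather than being stored as a thunk (compare the "Don't build up thunks" comments in GHC.CmmToAsm.Reg.Regs). The following standalone sketch (not part of the patch) illustrates the difference using plain Data.Map rather than GHC's UniqFM; the `combine` function and the values are made up for this example:

    import qualified Data.Map.Lazy   as ML
    import qualified Data.Map.Strict as MS
    import Control.Exception (SomeException, evaluate, try)

    main :: IO ()
    main = do
      -- The combining function blows up if its result is ever forced.
      let combine _new _old = error "combined value forced" :: String
      -- Lazy insertWith stores the combined value as a thunk: building the
      -- map succeeds, and the error would only fire on a later lookup.
      lazyMap <- evaluate (ML.insertWith combine 1 "b" (ML.fromList [(1 :: Int, "a")]))
      putStrLn ("lazy map built, size = " ++ show (ML.size lazyMap))
      -- Strict insertWith forces the combined value while the map is built,
      -- so no chain of thunks can accumulate across repeated unions.
      r <- try (evaluate (MS.insertWith combine 1 "b" (MS.fromList [(1 :: Int, "a")])))
             :: IO (Either SomeException (MS.Map Int String))
      putStrLn (either (const "strict insertWith forced the combined value")
                       (const "no collision, nothing was combined") r)

Note also the argument order: Data.Map's insertWith passes the combining function the new value first and the old value second, which is why strictAddToUFM_C (documented as old -> new -> result) wraps the supplied function in 'flip'.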
===================================== compiler/GHC/Types/Unique/Set.hs =====================================
@@ -19,12 +19,14 @@ module GHC.Types.Unique.Set (
         emptyUniqSet, unitUniqSet, mkUniqSet,
-        addOneToUniqSet, addListToUniqSet,
+        addOneToUniqSet, addListToUniqSet, strictAddOneToUniqSet_C,
         delOneFromUniqSet, delOneFromUniqSet_Directly,
         delListFromUniqSet, delListFromUniqSet_Directly,
         unionUniqSets, unionManyUniqSets,
-        minusUniqSet, uniqSetMinusUFM, uniqSetMinusUDFM,
-        intersectUniqSets,
+        strictUnionUniqSets_C, strictUnionManyUniqSets_C,
+        minusUniqSet, minusUniqSet_C,
+        uniqSetMinusUFM, uniqSetMinusUDFM,
+        intersectUniqSets, strictIntersectUniqSets_C,
         disjointUniqSets,
         restrictUniqSetToUFM,
         uniqSetAny, uniqSetAll,
@@ -109,6 +111,10 @@ addListToUniqSet :: Uniquable a => UniqSet a -> [a] -> UniqSet a
 addListToUniqSet = foldl' addOneToUniqSet
 {-# INLINEABLE addListToUniqSet #-}
 
+strictAddOneToUniqSet_C :: Uniquable a => (a -> a -> a) -> UniqSet a -> a -> UniqSet a
+strictAddOneToUniqSet_C f (UniqSet set) x =
+  UniqSet (strictAddToUFM_C f set x x)
+
 delOneFromUniqSet :: Uniquable a => UniqSet a -> a -> UniqSet a
 delOneFromUniqSet (UniqSet s) a = UniqSet (delFromUFM s a)
@@ -127,15 +133,29 @@ delListFromUniqSet_Directly (UniqSet s) l =
 unionUniqSets :: UniqSet a -> UniqSet a -> UniqSet a
 unionUniqSets (UniqSet s) (UniqSet t) = UniqSet (plusUFM s t)
 
+strictUnionUniqSets_C :: (a -> a -> a) -> UniqSet a -> UniqSet a -> UniqSet a
+strictUnionUniqSets_C f (UniqSet s) (UniqSet t) =
+  UniqSet (strictPlusUFM_C f s t)
+
 unionManyUniqSets :: [UniqSet a] -> UniqSet a
 unionManyUniqSets = foldl' (flip unionUniqSets) emptyUniqSet
 
+strictUnionManyUniqSets_C :: (a -> a -> a) -> [UniqSet a] -> UniqSet a
+strictUnionManyUniqSets_C f = foldl' (flip (strictUnionUniqSets_C f)) emptyUniqSet
+
 minusUniqSet :: UniqSet a -> UniqSet a -> UniqSet a
 minusUniqSet (UniqSet s) (UniqSet t) = UniqSet (minusUFM s t)
 
+minusUniqSet_C :: (a -> a -> Maybe a) -> UniqSet a -> UniqSet a -> UniqSet a
+minusUniqSet_C f (UniqSet s) (UniqSet t) = UniqSet (minusUFM_C f s t)
+
 intersectUniqSets :: UniqSet a -> UniqSet a -> UniqSet a
 intersectUniqSets (UniqSet s) (UniqSet t) = UniqSet (intersectUFM s t)
 
+strictIntersectUniqSets_C :: (a -> a -> a) -> UniqSet a -> UniqSet a -> UniqSet a
+strictIntersectUniqSets_C f (UniqSet s) (UniqSet t) =
+  UniqSet (strictIntersectUFM_C f s t)
+
 disjointUniqSets :: UniqSet a -> UniqSet a -> Bool
 disjointUniqSets (UniqSet s) (UniqSet t) = disjointUFM s t
===================================== compiler/ghc.cabal.in =====================================
@@ -310,6 +310,7 @@ Library
         GHC.CmmToAsm.Reg.Linear.X86
         GHC.CmmToAsm.Reg.Linear.X86_64
         GHC.CmmToAsm.Reg.Liveness
+        GHC.CmmToAsm.Reg.Regs
         GHC.CmmToAsm.Reg.Target
         GHC.CmmToAsm.Reg.Utils
         GHC.CmmToAsm.RV64
===================================== testsuite/tests/simd/should_run/T26411.hs =====================================
@@ -0,0 +1,57 @@
+{-# LANGUAGE MagicHash #-}
+{-# LANGUAGE UnboxedTuples #-}
+
+module Main where
+
+import GHC.Exts
+
+data DoubleX32 = DoubleX32
+  DoubleX2# DoubleX2# DoubleX2# DoubleX2#
+  DoubleX2# DoubleX2# DoubleX2# DoubleX2#
+  DoubleX2# DoubleX2# DoubleX2# DoubleX2#
+  DoubleX2# DoubleX2# DoubleX2# DoubleX2#
+
+doubleX32ToList :: DoubleX32 -> [Double]
+doubleX32ToList (DoubleX32 v0 v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15)
+  = a v0 . a v1 . a v2 . a v3 . a v4 . a v5 . a v6 . a v7 . a v8 . a v9 . a v10 . a v11 . a v12 . a v13 . a v14 . a v15 $ []
+  where
+    a v xs = case unpackDoubleX2# v of
+      (# x0, x1 #) -> D# x0 : D# x1 : xs
+
+doubleX32FromList :: [Double] -> DoubleX32
+doubleX32FromList [D# x0, D# x1, D# x2, D# x3, D# x4, D# x5, D# x6, D# x7, D# x8, D# x9, D# x10, D# x11, D# x12, D# x13, D# x14, D# x15, D# x16, D# x17, D# x18, D# x19, D# x20, D# x21, D# x22, D# x23, D# x24, D# x25, D# x26, D# x27, D# x28, D# x29, D# x30, D# x31]
+  = DoubleX32
+      (packDoubleX2# (# x0, x1 #)) (packDoubleX2# (# x2, x3 #)) (packDoubleX2# (# x4, x5 #)) (packDoubleX2# (# x6, x7 #))
+      (packDoubleX2# (# x8, x9 #)) (packDoubleX2# (# x10, x11 #)) (packDoubleX2# (# x12, x13 #)) (packDoubleX2# (# x14, x15 #))
+      (packDoubleX2# (# x16, x17 #)) (packDoubleX2# (# x18, x19 #)) (packDoubleX2# (# x20, x21 #)) (packDoubleX2# (# x22, x23 #))
+      (packDoubleX2# (# x24, x25 #)) (packDoubleX2# (# x26, x27 #)) (packDoubleX2# (# x28, x29 #)) (packDoubleX2# (# x30, x31 #))
+
+negateDoubleX32 :: DoubleX32 -> DoubleX32
+negateDoubleX32 (DoubleX32 v0 v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15)
+  = DoubleX32
+      (negateDoubleX2# v0) (negateDoubleX2# v1) (negateDoubleX2# v2) (negateDoubleX2# v3)
+      (negateDoubleX2# v4) (negateDoubleX2# v5) (negateDoubleX2# v6) (negateDoubleX2# v7)
+      (negateDoubleX2# v8) (negateDoubleX2# v9) (negateDoubleX2# v10) (negateDoubleX2# v11)
+      (negateDoubleX2# v12) (negateDoubleX2# v13) (negateDoubleX2# v14) (negateDoubleX2# v15)
+
+recipDoubleX2# :: DoubleX2# -> DoubleX2#
+recipDoubleX2# v = divideDoubleX2# (broadcastDoubleX2# 1.0##) v
+{-# INLINE recipDoubleX2# #-}
+
+recipDoubleX32 :: DoubleX32 -> DoubleX32
+recipDoubleX32 (DoubleX32 v0 v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15)
+  = DoubleX32
+      (recipDoubleX2# v0) (recipDoubleX2# v1) (recipDoubleX2# v2) (recipDoubleX2# v3)
+      (recipDoubleX2# v4) (recipDoubleX2# v5) (recipDoubleX2# v6) (recipDoubleX2# v7)
+      (recipDoubleX2# v8) (recipDoubleX2# v9) (recipDoubleX2# v10) (recipDoubleX2# v11)
+      (recipDoubleX2# v12) (recipDoubleX2# v13) (recipDoubleX2# v14) (recipDoubleX2# v15)
+
+main :: IO ()
+main = do
+  let a = doubleX32FromList [0..31]
+      b = negateDoubleX32 a
+      c = recipDoubleX32 a
+  print $ doubleX32ToList b
+  putStrLn $ if doubleX32ToList b == map negate [0..31] then "OK" else "Wrong"
+  print $ doubleX32ToList c
+  putStrLn $ if doubleX32ToList c == map recip [0..31] then "OK" else "Wrong"
===================================== testsuite/tests/simd/should_run/T26411.stdout =====================================
@@ -0,0 +1,4 @@
+[-0.0,-1.0,-2.0,-3.0,-4.0,-5.0,-6.0,-7.0,-8.0,-9.0,-10.0,-11.0,-12.0,-13.0,-14.0,-15.0,-16.0,-17.0,-18.0,-19.0,-20.0,-21.0,-22.0,-23.0,-24.0,-25.0,-26.0,-27.0,-28.0,-29.0,-30.0,-31.0]
+OK
+[Infinity,1.0,0.5,0.3333333333333333,0.25,0.2,0.16666666666666666,0.14285714285714285,0.125,0.1111111111111111,0.1,9.090909090909091e-2,8.333333333333333e-2,7.692307692307693e-2,7.142857142857142e-2,6.666666666666667e-2,6.25e-2,5.8823529411764705e-2,5.555555555555555e-2,5.263157894736842e-2,5.0e-2,4.7619047619047616e-2,4.5454545454545456e-2,4.3478260869565216e-2,4.1666666666666664e-2,4.0e-2,3.8461538461538464e-2,3.7037037037037035e-2,3.571428571428571e-2,3.4482758620689655e-2,3.333333333333333e-2,3.225806451612903e-2]
+OK
===================================== testsuite/tests/simd/should_run/T26411b.hs =====================================
@@ -0,0 +1,73 @@
+{-# LANGUAGE MagicHash #-}
+{-# LANGUAGE UnboxedTuples #-}
+
+module Main (main) where
+
+import GHC.Exts
+
+data DoubleX32 = DoubleX32
+  DoubleX2# DoubleX2# DoubleX2# DoubleX2#
+  DoubleX2# DoubleX2# DoubleX2# DoubleX2#
+  DoubleX2# DoubleX2# DoubleX2# DoubleX2#
+  DoubleX2# DoubleX2# DoubleX2# DoubleX2#
+
+doubleX32ToList :: DoubleX32 -> [Double]
+doubleX32ToList (DoubleX32 v0 v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15)
+  = a v0 . a v1 . a v2 . a v3 . a v4 . a v5 . a v6 . a v7 . a v8 . a v9 . a v10 . a v11 . a v12 . a v13 . a v14 . a v15 $ []
+  where
+    a v xs = case unpackDoubleX2# v of
+      (# x0, x1 #) -> D# x0 : D# x1 : xs
+{-# INLINE doubleX32ToList #-}
+
+doubleX32FromList :: [Double] -> DoubleX32
+doubleX32FromList [D# x0, D# x1, D# x2, D# x3, D# x4, D# x5, D# x6, D# x7, D# x8, D# x9, D# x10, D# x11, D# x12, D# x13, D# x14, D# x15, D# x16, D# x17, D# x18, D# x19, D# x20, D# x21, D# x22, D# x23, D# x24, D# x25, D# x26, D# x27, D# x28, D# x29, D# x30, D# x31]
+  = DoubleX32
+      (packDoubleX2# (# x0, x1 #)) (packDoubleX2# (# x2, x3 #)) (packDoubleX2# (# x4, x5 #)) (packDoubleX2# (# x6, x7 #))
+      (packDoubleX2# (# x8, x9 #)) (packDoubleX2# (# x10, x11 #)) (packDoubleX2# (# x12, x13 #)) (packDoubleX2# (# x14, x15 #))
+      (packDoubleX2# (# x16, x17 #)) (packDoubleX2# (# x18, x19 #)) (packDoubleX2# (# x20, x21 #)) (packDoubleX2# (# x22, x23 #))
+      (packDoubleX2# (# x24, x25 #)) (packDoubleX2# (# x26, x27 #)) (packDoubleX2# (# x28, x29 #)) (packDoubleX2# (# x30, x31 #))
+{-# NOINLINE doubleX32FromList #-}
+
+broadcastDoubleX32 :: Double -> DoubleX32
+broadcastDoubleX32 (D# x)
+  = let !v = broadcastDoubleX2# x
+    in DoubleX32 v v v v v v v v v v v v v v v v
+{-# INLINE broadcastDoubleX32 #-}
+
+negateDoubleX32 :: DoubleX32 -> DoubleX32
+negateDoubleX32 (DoubleX32 v0 v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15)
+  = DoubleX32
+      (negateDoubleX2# v0) (negateDoubleX2# v1) (negateDoubleX2# v2) (negateDoubleX2# v3)
+      (negateDoubleX2# v4) (negateDoubleX2# v5) (negateDoubleX2# v6) (negateDoubleX2# v7)
+      (negateDoubleX2# v8) (negateDoubleX2# v9) (negateDoubleX2# v10) (negateDoubleX2# v11)
+      (negateDoubleX2# v12) (negateDoubleX2# v13) (negateDoubleX2# v14) (negateDoubleX2# v15)
+{-# NOINLINE negateDoubleX32 #-}
+
+recipDoubleX2# :: DoubleX2# -> DoubleX2#
+recipDoubleX2# v = divideDoubleX2# (broadcastDoubleX2# 1.0##) v
+{-# NOINLINE recipDoubleX2# #-}
+
+recipDoubleX32 :: DoubleX32 -> DoubleX32
+recipDoubleX32 (DoubleX32 v0 v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15)
+  = DoubleX32
+      (recipDoubleX2# v0) (recipDoubleX2# v1) (recipDoubleX2# v2) (recipDoubleX2# v3)
+      (recipDoubleX2# v4) (recipDoubleX2# v5) (recipDoubleX2# v6) (recipDoubleX2# v7)
+      (recipDoubleX2# v8) (recipDoubleX2# v9) (recipDoubleX2# v10) (recipDoubleX2# v11)
+      (recipDoubleX2# v12) (recipDoubleX2# v13) (recipDoubleX2# v14) (recipDoubleX2# v15)
+{-# NOINLINE recipDoubleX32 #-}
+
+divideDoubleX32 :: DoubleX32 -> DoubleX32 -> DoubleX32
+divideDoubleX32 (DoubleX32 u0 u1 u2 u3 u4 u5 u6 u7 u8 u9 u10 u11 u12 u13 u14 u15) (DoubleX32 v0 v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15)
+  = DoubleX32
+      (divideDoubleX2# u0 v0) (divideDoubleX2# u1 v1) (divideDoubleX2# u2 v2) (divideDoubleX2# u3 v3)
+      (divideDoubleX2# u4 v4) (divideDoubleX2# u5 v5) (divideDoubleX2# u6 v6) (divideDoubleX2# u7 v7)
+      (divideDoubleX2# u8 v8) (divideDoubleX2# u9 v9) (divideDoubleX2# u10 v10) (divideDoubleX2# u11 v11)
+      (divideDoubleX2# u12 v12) (divideDoubleX2# u13 v13) (divideDoubleX2# u14 v14) (divideDoubleX2# u15 v15)
+{-# INLINE divideDoubleX32 #-}
+
+main :: IO ()
+main = do
+  let a = doubleX32FromList [0..31]
+      b = divideDoubleX32 (broadcastDoubleX32 1.0) a
+  print $ doubleX32ToList b
+  putStrLn $ if doubleX32ToList b == map recip [0..31] then "OK" else "Wrong"
===================================== testsuite/tests/simd/should_run/T26411b.stdout =====================================
@@ -0,0 +1,2 @@
+[Infinity,1.0,0.5,0.3333333333333333,0.25,0.2,0.16666666666666666,0.14285714285714285,0.125,0.1111111111111111,0.1,9.090909090909091e-2,8.333333333333333e-2,7.692307692307693e-2,7.142857142857142e-2,6.666666666666667e-2,6.25e-2,5.8823529411764705e-2,5.555555555555555e-2,5.263157894736842e-2,5.0e-2,4.7619047619047616e-2,4.5454545454545456e-2,4.3478260869565216e-2,4.1666666666666664e-2,4.0e-2,3.8461538461538464e-2,3.7037037037037035e-2,3.571428571428571e-2,3.4482758620689655e-2,3.333333333333333e-2,3.225806451612903e-2]
+OK
===================================== testsuite/tests/simd/should_run/all.T =====================================
@@ -89,6 +89,7 @@ test('simd012', [], compile_and_run, [''])
 test('simd013',
      [ req_c
      , unless(arch('x86_64'), skip) # because the C file uses Intel intrinsics
+     , extra_ways(["optasm"]) # #26526 demonstrated a bug in the optasm way
      ],
      compile_and_run, ['simd013C.c'])
 test('simd014',
@@ -145,6 +146,8 @@ test('T22187_run', [],compile_and_run,[''])
 test('T25062_V16', [], compile_and_run, [''])
 test('T25561', [], compile_and_run, [''])
 test('T26542', [], compile_and_run, [''])
+test('T26411', [], compile_and_run, [''])
+test('T26411b', [], compile_and_run, ['-O'])
 
 # Even if the CPU we run on doesn't support *executing* those tests we should try to
 # compile them.

View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/6a2b43e3395902e88ec371c98cdb4af...