[Git][ghc/ghc][wip/mp/iface-patches-9.10] 6 commits: Replace `SizedSeq` with `FlatBag` for flattened structure
Matthew Pickering pushed to branch wip/mp/iface-patches-9.10 at Glasgow Haskell Compiler / GHC Commits: 0b16f517 by Fendor at 2025-12-19T10:13:20+00:00 Replace `SizedSeq` with `FlatBag` for flattened structure LinkedLists are notoriously memory ineffiecient when all we do is traversing a structure. As 'UnlinkedBCO' has been identified as a data structure that impacts the overall memory usage of GHCi sessions, we avoid linked lists and prefer flattened structure for storing. We introduce a new memory efficient representation of sequential elements that has special support for the cases: * Empty * Singleton * Tuple Elements This improves sharing in the 'Empty' case and avoids the overhead of 'Array' until its constant overhead is justified. (cherry picked from commit 45ab59dfe39ce32ec1db25644ed84bd44a9506f6) - - - - - add9f1c0 by Fendor at 2025-12-19T10:13:37+00:00 Compact FlatBag array representation `Array` contains three additional `Word`'s we do not need in `FlatBag`. Move `FlatBag` to `SmallArray`. Expand the API of SmallArray by `sizeofSmallArray` and add common traversal functions, such as `mapSmallArray` and `foldMapSmallArray`. Additionally, allow users to force the elements of a `SmallArray` via `rnfSmallArray`. (cherry picked from commit ac6a01bf3ef41460a5f0b4516c9784fa153e40b0) - - - - - 409aa3bc by Fendor at 2025-12-19T10:14:31+00:00 Prefer packed representation for CompiledByteCode As there are many 'CompiledByteCode' objects alive during a GHCi session, representing its element in a more packed manner improves space behaviour at a minimal cost. When running GHCi on the agda codebase, we find around 380 live 'CompiledByteCode' objects. Packing their respective 'UnlinkedByteCode' can save quite some pointers. (cherry picked from commit 0e00847fc15d76a3923029f072cb9e4bd62f6947) - - - - - 0cc7d458 by Fendor at 2025-12-19T10:19:12+00:00 Avoid UArray when indexing is not required `UnlinkedBCO`'s can occur many times in the heap. Each `UnlinkedBCO` references two `UArray`'s but never indexes them. They are only needed to encode the elements into a `ByteArray#`. The three words for the lower bound, upper bound and number of elements are essentially unused, thus we replace `UArray` with a wrapper around `ByteArray#`. This saves us up to three words for each `UnlinkedBCO`. Further, to avoid re-allocating these words for `ResolvedBCO`, we repeat the procedure for `ResolvedBCO` and add custom `Binary` and `Show` instances. For example, agda's repl session has around 360_000 UnlinkedBCO's, so avoiding these three words is already saving us around 8MB residency. - - - - - 1ecec34b by Fendor at 2025-12-19T10:23:59+00:00 Fix sharing of 'IfaceTyConInfo' during core to iface type translation During heap analysis, we noticed that during generation of 'mi_extra_decls' we have lots of duplicates for the instances: * `IfaceTyConInfo NotPromoted IfaceNormalTyCon` * `IfaceTyConInfo IsPromoted IfaceNormalTyCon` which should be shared instead of duplicated. This duplication increased the number of live bytes by around 200MB while loading the agda codebase into GHCi. These instances are created during `CoreToIface` translation, in particular `toIfaceTyCon`. The generated core looks like: toIfaceTyCon = \ tc_sjJw -> case $wtoIfaceTyCon tc_sjJw of { (# ww_sjJz, ww1_sjNL, ww2_sjNM #) -> IfaceTyCon ww_sjJz (IfaceTyConInfo ww1_sjNL ww2_sjNM) } whichs removes causes the sharing to work propery. Adding explicit sharing, with NOINLINE annotations, changes the core to: toIfaceTyCon = \ tc_sjJq -> case $wtoIfaceTyCon tc_sjJq of { (# ww_sjNB, ww1_sjNC #) -> IfaceTyCon ww_sjNB ww1_sjNC } which looks much more like sharing is happening. We confirmed via ghc-debug that all duplications were eliminated and the number of live bytes are noticeably reduced. - - - - - a92cf018 by Fendor at 2025-12-19T10:28:16+00:00 Never UNPACK `FastMutInt` for counting z-encoded `FastString`s In `FastStringTable`, we count the number of z-encoded FastStrings that exist in a GHC session. We used to UNPACK the counters to not waste memory, but live retainer analysis showed that we allocate a lot of `FastMutInt`s, retained by `mkFastZString`. We lazily compute the `FastZString`, only incrementing the counter when the `FastZString` is forced. The function `mkFastStringWith` calls `mkZFastString` and boxes the `FastMutInt`, leading to the following core: mkFastStringWith = \ mk_fs _ -> = case stringTable of { FastStringTable _ n_zencs segments# _ -> ... case ((mk_fs (I# ...) (FastMutInt n_zencs)) `cast` Co:2 :: ...) ... Marking this field as `NOUNPACK` avoids this reboxing, eliminating the allocation of a fresh `FastMutInt` on every `FastString` allocation. - - - - - 15 changed files: - compiler/GHC/ByteCode/Asm.hs - compiler/GHC/ByteCode/Linker.hs - compiler/GHC/ByteCode/Types.hs - compiler/GHC/Data/FastString.hs - + compiler/GHC/Data/FlatBag.hs - compiler/GHC/Data/SmallArray.hs - compiler/GHC/Iface/Type.hs - compiler/GHC/Linker/Loader.hs - compiler/GHC/StgToByteCode.hs - compiler/ghc.cabal.in - libraries/ghci/GHCi/CreateBCO.hs - libraries/ghci/GHCi/ResolvedBCO.hs - testsuite/tests/count-deps/CountDepsAst.stdout - testsuite/tests/count-deps/CountDepsParser.stdout - testsuite/tests/ghci/should_run/BinaryArray.hs Changes: ===================================== compiler/GHC/ByteCode/Asm.hs ===================================== @@ -34,6 +34,7 @@ import GHC.Utils.Panic import GHC.Core.TyCon import GHC.Data.FastString +import GHC.Data.FlatBag import GHC.Data.SizedSeq import GHC.StgToCmm.Layout ( ArgRep(..) ) @@ -71,9 +72,9 @@ bcoFreeNames bco where bco_refs (UnlinkedBCO _ _ _ _ nonptrs ptrs) = unionManyUniqDSets ( - mkUniqDSet [ n | BCOPtrName n <- ssElts ptrs ] : - mkUniqDSet [ n | BCONPtrItbl n <- ssElts nonptrs ] : - map bco_refs [ bco | BCOPtrBCO bco <- ssElts ptrs ] + mkUniqDSet [ n | BCOPtrName n <- elemsFlatBag ptrs ] : + mkUniqDSet [ n | BCONPtrItbl n <- elemsFlatBag nonptrs ] : + map bco_refs [ bco | BCOPtrBCO bco <- elemsFlatBag ptrs ] ) -- ----------------------------------------------------------------------------- @@ -90,7 +91,7 @@ bcoFreeNames bco assembleBCOs :: Interp -> Profile - -> [ProtoBCO Name] + -> FlatBag (ProtoBCO Name) -> [TyCon] -> AddrEnv -> Maybe ModBreaks @@ -129,7 +130,7 @@ assembleBCOs interp profile proto_bcos tycons top_strs modbreaks = do -- top-level string literal bindings] in GHC.StgToByteCode for some discussion -- about why. -- -mallocStrings :: Interp -> [UnlinkedBCO] -> IO [UnlinkedBCO] +mallocStrings :: Interp -> FlatBag UnlinkedBCO -> IO (FlatBag UnlinkedBCO) mallocStrings interp ulbcos = do let bytestrings = reverse (execState (mapM_ collect ulbcos) []) ptrs <- interpCmd interp (MallocStrings bytestrings) @@ -170,7 +171,7 @@ assembleOneBCO interp profile pbco = do -- TODO: the profile should be bundled with the interpreter: the rts ways are -- fixed for an interpreter ubco <- assembleBCO (profilePlatform profile) pbco - [ubco'] <- mallocStrings interp [ubco] + UnitFlatBag ubco' <- mallocStrings interp (UnitFlatBag ubco) return ubco' assembleBCO :: Platform -> ProtoBCO Name -> IO UnlinkedBCO @@ -213,9 +214,9 @@ assembleBCO platform (ProtoBCO { protoBCOName = nm (text "bytecode instruction count mismatch") let asm_insns = ssElts final_insns - insns_arr = Array.listArray (0, fromIntegral n_insns - 1) asm_insns - bitmap_arr = mkBitmapArray bsize bitmap - ul_bco = UnlinkedBCO nm arity insns_arr bitmap_arr final_lits final_ptrs + !insns_arr = mkBCOByteArray $ Array.listArray (0 :: Int, fromIntegral n_insns - 1) asm_insns + !bitmap_arr = mkBCOByteArray $ mkBitmapArray bsize bitmap + ul_bco = UnlinkedBCO nm arity insns_arr bitmap_arr (fromSizedSeq final_lits) (fromSizedSeq final_ptrs) -- 8 Aug 01: Finalisers aren't safe when attached to non-primitive -- objects, since they might get run too early. Disable this until @@ -224,7 +225,7 @@ assembleBCO platform (ProtoBCO { protoBCOName = nm return ul_bco -mkBitmapArray :: Word -> [StgWord] -> UArray Int Word64 +mkBitmapArray :: Word -> [StgWord] -> UArray Int Word -- Here the return type must be an array of Words, not StgWords, -- because the underlying ByteArray# will end up as a component -- of a BCO object. ===================================== compiler/GHC/ByteCode/Linker.hs ===================================== @@ -64,13 +64,13 @@ linkBCO interp pkgs_loaded le bco_ix (UnlinkedBCO _ arity insns bitmap lits0 ptrs0) = do -- fromIntegral Word -> Word64 should be a no op if Word is Word64 -- otherwise it will result in a cast to longlong on 32bit systems. - lits <- mapM (fmap fromIntegral . lookupLiteral interp pkgs_loaded le) (ssElts lits0) - ptrs <- mapM (resolvePtr interp pkgs_loaded le bco_ix) (ssElts ptrs0) - let lits' = listArray (0 :: Int, fromIntegral (sizeSS lits0)-1) lits + (lits :: [Word]) <- mapM (fmap fromIntegral . lookupLiteral interp pkgs_loaded le) (elemsFlatBag lits0) + ptrs <- mapM (resolvePtr interp pkgs_loaded le bco_ix) (elemsFlatBag ptrs0) + let lits' = listArray (0 :: Int, fromIntegral (sizeFlatBag lits0)-1) lits return (ResolvedBCO isLittleEndian arity insns bitmap - lits' + (mkBCOByteArray lits') (addListToSS emptySS ptrs)) lookupLiteral :: Interp -> PkgsLoaded -> LinkerEnv -> BCONPtr -> IO Word ===================================== compiler/GHC/ByteCode/Types.hs ===================================== @@ -1,6 +1,8 @@ {-# LANGUAGE GeneralizedNewtypeDeriving #-} {-# LANGUAGE RecordWildCards #-} {-# LANGUAGE TypeApplications #-} +{-# LANGUAGE MagicHash #-} +{-# LANGUAGE UnliftedNewtypes #-} -- -- (c) The University of Glasgow 2002-2006 -- @@ -8,6 +10,7 @@ -- | Bytecode assembler types module GHC.ByteCode.Types ( CompiledByteCode(..), seqCompiledByteCode + , BCOByteArray(..), mkBCOByteArray , FFIInfo(..) , RegBitmap(..) , NativeCallType(..), NativeCallInfo(..), voidTupleReturnInfo, voidPrimCallInfo @@ -18,12 +21,13 @@ module GHC.ByteCode.Types , CgBreakInfo(..) , ModBreaks (..), BreakIndex, emptyModBreaks , CCostCentre + , FlatBag, sizeFlatBag, fromSizedSeq, elemsFlatBag ) where import GHC.Prelude import GHC.Data.FastString -import GHC.Data.SizedSeq +import GHC.Data.FlatBag import GHC.Types.Name import GHC.Types.Name.Env import GHC.Utils.Outputable @@ -33,10 +37,10 @@ import GHCi.BreakArray import GHCi.RemoteTypes import GHCi.FFI import Control.DeepSeq +import GHCi.ResolvedBCO ( BCOByteArray(..), mkBCOByteArray ) import Foreign import Data.Array -import Data.Array.Base ( UArray(..) ) import Data.ByteString (ByteString) import Data.IntMap (IntMap) import qualified Data.IntMap as IntMap @@ -50,7 +54,7 @@ import Language.Haskell.Syntax.Module.Name (ModuleName) -- Compiled Byte Code data CompiledByteCode = CompiledByteCode - { bc_bcos :: [UnlinkedBCO] -- Bunch of interpretable bindings + { bc_bcos :: FlatBag UnlinkedBCO -- Bunch of interpretable bindings , bc_itbls :: ItblEnv -- A mapping from DataCons to their itbls , bc_ffis :: [FFIInfo] -- ffi blocks we allocated , bc_strs :: AddrEnv -- malloc'd top-level strings @@ -62,7 +66,7 @@ newtype FFIInfo = FFIInfo (RemotePtr C_ffi_cif) deriving (Show, NFData) instance Outputable CompiledByteCode where - ppr CompiledByteCode{..} = ppr bc_bcos + ppr CompiledByteCode{..} = ppr $ elemsFlatBag bc_bcos -- Not a real NFData instance, because ModBreaks contains some things -- we can't rnf @@ -152,10 +156,10 @@ data UnlinkedBCO = UnlinkedBCO { unlinkedBCOName :: !Name, unlinkedBCOArity :: {-# UNPACK #-} !Int, - unlinkedBCOInstrs :: !(UArray Int Word16), -- insns - unlinkedBCOBitmap :: !(UArray Int Word64), -- bitmap - unlinkedBCOLits :: !(SizedSeq BCONPtr), -- non-ptrs - unlinkedBCOPtrs :: !(SizedSeq BCOPtr) -- ptrs + unlinkedBCOInstrs :: !(BCOByteArray Word16), -- insns + unlinkedBCOBitmap :: !(BCOByteArray Word), -- bitmap + unlinkedBCOLits :: !(FlatBag BCONPtr), -- non-ptrs + unlinkedBCOPtrs :: !(FlatBag BCOPtr) -- ptrs } instance NFData UnlinkedBCO where @@ -210,8 +214,8 @@ seqCgBreakInfo CgBreakInfo{..} = instance Outputable UnlinkedBCO where ppr (UnlinkedBCO nm _arity _insns _bitmap lits ptrs) = sep [text "BCO", ppr nm, text "with", - ppr (sizeSS lits), text "lits", - ppr (sizeSS ptrs), text "ptrs" ] + ppr (sizeFlatBag lits), text "lits", + ppr (sizeFlatBag ptrs), text "ptrs" ] instance Outputable CgBreakInfo where ppr info = text "CgBreakInfo" <+> ===================================== compiler/GHC/Data/FastString.hs ===================================== @@ -307,9 +307,18 @@ and updates to multiple buckets with low synchronization overhead. See Note [Updating the FastString table] on how it's updated. -} data FastStringTable = FastStringTable - {-# UNPACK #-} !FastMutInt -- the unique ID counter shared with all buckets - {-# UNPACK #-} !FastMutInt -- number of computed z-encodings for all buckets - (Array# (IORef FastStringTableSegment)) -- concurrent segments + {-# UNPACK #-} !FastMutInt + -- ^ The unique ID counter shared with all buckets + -- + -- We unpack the 'FastMutInt' counter as it is always consumed strictly. + {-# NOUNPACK #-} !FastMutInt + -- ^ Number of computed z-encodings for all buckets. + -- + -- We mark this as 'NOUNPACK' as this 'FastMutInt' is retained by a thunk + -- in 'mkFastStringWith' and needs to be boxed any way. + -- If this is unpacked, then we box this single 'FastMutInt' once for each + -- allocated FastString. + (Array# (IORef FastStringTableSegment)) -- ^ concurrent segments data FastStringTableSegment = FastStringTableSegment {-# UNPACK #-} !(MVar ()) -- the lock for write in each segment ===================================== compiler/GHC/Data/FlatBag.hs ===================================== @@ -0,0 +1,129 @@ +{-# LANGUAGE UnboxedTuples #-} +module GHC.Data.FlatBag + ( FlatBag(EmptyFlatBag, UnitFlatBag, TupleFlatBag) + , emptyFlatBag + , unitFlatBag + , sizeFlatBag + , elemsFlatBag + , mappendFlatBag + -- * Construction + , fromList + , fromSizedSeq + ) where + +import GHC.Prelude + +import GHC.Data.SizedSeq (SizedSeq, ssElts, sizeSS) + +import Control.DeepSeq + +import GHC.Data.SmallArray + +-- | Store elements in a flattened representation. +-- +-- A 'FlatBag' is a data structure that stores an ordered list of elements +-- in a flat structure, avoiding the overhead of a linked list. +-- Use this data structure, if the code requires the following properties: +-- +-- * Elements are stored in a long-lived object, and benefit from a flattened +-- representation. +-- * The 'FlatBag' will be traversed but not extended or filtered. +-- * The number of elements should be known. +-- * Sharing of the empty case improves memory behaviour. +-- +-- A 'FlagBag' aims to have as little overhead as possible to store its elements. +-- To achieve that, it distinguishes between the empty case, singleton, tuple +-- and general case. +-- Thus, we only pay for the additional three words of an 'Array' if we have at least +-- three elements. +data FlatBag a + = EmptyFlatBag + | UnitFlatBag !a + | TupleFlatBag !a !a + | FlatBag {-# UNPACK #-} !(SmallArray a) + +instance Functor FlatBag where + fmap _ EmptyFlatBag = EmptyFlatBag + fmap f (UnitFlatBag a) = UnitFlatBag $ f a + fmap f (TupleFlatBag a b) = TupleFlatBag (f a) (f b) + fmap f (FlatBag e) = FlatBag $ mapSmallArray f e + +instance Foldable FlatBag where + foldMap _ EmptyFlatBag = mempty + foldMap f (UnitFlatBag a) = f a + foldMap f (TupleFlatBag a b) = f a `mappend` f b + foldMap f (FlatBag arr) = foldMapSmallArray f arr + + length = fromIntegral . sizeFlatBag + +instance Traversable FlatBag where + traverse _ EmptyFlatBag = pure EmptyFlatBag + traverse f (UnitFlatBag a) = UnitFlatBag <$> f a + traverse f (TupleFlatBag a b) = TupleFlatBag <$> f a <*> f b + traverse f fl@(FlatBag arr) = fromList (fromIntegral $ sizeofSmallArray arr) <$> traverse f (elemsFlatBag fl) + +instance NFData a => NFData (FlatBag a) where + rnf EmptyFlatBag = () + rnf (UnitFlatBag a) = rnf a + rnf (TupleFlatBag a b) = rnf a `seq` rnf b + rnf (FlatBag arr) = rnfSmallArray arr + +-- | Create an empty 'FlatBag'. +-- +-- The empty 'FlatBag' is shared over all instances. +emptyFlatBag :: FlatBag a +emptyFlatBag = EmptyFlatBag + +-- | Create a singleton 'FlatBag'. +unitFlatBag :: a -> FlatBag a +unitFlatBag = UnitFlatBag + +-- | Calculate the size of +sizeFlatBag :: FlatBag a -> Word +sizeFlatBag EmptyFlatBag = 0 +sizeFlatBag UnitFlatBag{} = 1 +sizeFlatBag TupleFlatBag{} = 2 +sizeFlatBag (FlatBag arr) = fromIntegral $ sizeofSmallArray arr + +-- | Get all elements that are stored in the 'FlatBag'. +elemsFlatBag :: FlatBag a -> [a] +elemsFlatBag EmptyFlatBag = [] +elemsFlatBag (UnitFlatBag a) = [a] +elemsFlatBag (TupleFlatBag a b) = [a, b] +elemsFlatBag (FlatBag arr) = + [indexSmallArray arr i | i <- [0 .. sizeofSmallArray arr - 1]] + +-- | Combine two 'FlatBag's. +-- +-- The new 'FlatBag' contains all elements from both 'FlatBag's. +-- +-- If one of the 'FlatBag's is empty, the old 'FlatBag' is reused. +mappendFlatBag :: FlatBag a -> FlatBag a -> FlatBag a +mappendFlatBag EmptyFlatBag b = b +mappendFlatBag a EmptyFlatBag = a +mappendFlatBag (UnitFlatBag a) (UnitFlatBag b) = TupleFlatBag a b +mappendFlatBag a b = + fromList (sizeFlatBag a + sizeFlatBag b) + (elemsFlatBag a ++ elemsFlatBag b) + +-- | Store the list in a flattened memory representation, avoiding the memory overhead +-- of a linked list. +-- +-- The size 'n' needs to be smaller or equal to the length of the list. +-- If it is smaller than the length of the list, overflowing elements are +-- discarded. It is undefined behaviour to set 'n' to be bigger than the +-- length of the list. +fromList :: Word -> [a] -> FlatBag a +fromList n elts = + case elts of + [] -> EmptyFlatBag + [a] -> UnitFlatBag a + [a, b] -> TupleFlatBag a b + xs -> + FlatBag (listToArray (fromIntegral n) fst snd (zip [0..] xs)) + +-- | Convert a 'SizedSeq' into its flattened representation. +-- A 'FlatBag a' is more memory efficient than '[a]', if no further modification +-- is necessary. +fromSizedSeq :: SizedSeq a -> FlatBag a +fromSizedSeq s = fromList (sizeSS s) (ssElts s) ===================================== compiler/GHC/Data/SmallArray.hs ===================================== @@ -11,13 +11,18 @@ module GHC.Data.SmallArray , freezeSmallArray , unsafeFreezeSmallArray , indexSmallArray + , sizeofSmallArray , listToArray + , mapSmallArray + , foldMapSmallArray + , rnfSmallArray ) where import GHC.Exts import GHC.Prelude import GHC.ST +import Control.DeepSeq data SmallArray a = SmallArray (SmallArray# a) @@ -64,6 +69,14 @@ unsafeFreezeSmallArray (SmallMutableArray ma) s = case unsafeFreezeSmallArray# ma s of (# s', a #) -> (# s', SmallArray a #) +-- | Get the size of a 'SmallArray' +sizeofSmallArray + :: SmallArray a + -> Int +{-# INLINE sizeofSmallArray #-} +sizeofSmallArray (SmallArray sa#) = + case sizeofSmallArray# sa# of + s -> I# s -- | Index a small-array (no bounds checking!) indexSmallArray @@ -71,9 +84,51 @@ indexSmallArray -> Int -- ^ index -> a {-# INLINE indexSmallArray #-} -indexSmallArray (SmallArray sa#) (I# i) = case indexSmallArray# sa# i of - (# v #) -> v +indexSmallArray (SmallArray sa#) (I# i) = + case indexSmallArray# sa# i of + (# v #) -> v +-- | Map a function over the elements of a 'SmallArray' +-- +mapSmallArray :: (a -> b) -> SmallArray a -> SmallArray b +{-# INLINE mapSmallArray #-} +mapSmallArray f sa = runST $ ST $ \s -> + let + n = sizeofSmallArray sa + go !i saMut# state# + | i < n = + let + a = indexSmallArray sa i + newState# = writeSmallArray saMut# i (f a) state# + in + go (i + 1) saMut# newState# + | otherwise = state# + in + case newSmallArray n (error "SmallArray: internal error, uninitialised elements") s of + (# s', mutArr #) -> + case go 0 mutArr s' of + s'' -> unsafeFreezeSmallArray mutArr s'' + +-- | Fold the values of a 'SmallArray' into a 'Monoid m' of choice +foldMapSmallArray :: Monoid m => (a -> m) -> SmallArray a -> m +{-# INLINE foldMapSmallArray #-} +foldMapSmallArray f sa = go 0 + where + n = sizeofSmallArray sa + go i + | i < n = f (indexSmallArray sa i) `mappend` go (i + 1) + | otherwise = mempty + +-- | Force the elements of the given 'SmallArray' +-- +rnfSmallArray :: NFData a => SmallArray a -> () +{-# INLINE rnfSmallArray #-} +rnfSmallArray sa = go 0 + where + n = sizeofSmallArray sa + go !i + | i < n = rnf (indexSmallArray sa i) `seq` go (i + 1) + | otherwise = () -- | Convert a list into an array. listToArray :: Int -> (e -> Int) -> (e -> a) -> [e] -> SmallArray a ===================================== compiler/GHC/Iface/Type.hs ===================================== @@ -419,12 +419,51 @@ data IfaceTyConInfo -- Used only to guide pretty-printing , ifaceTyConSort :: IfaceTyConSort } deriving (Eq, Ord) --- This smart constructor allows sharing of the two most common --- cases. See #19194 +-- | This smart constructor allows sharing of the two most common +-- cases. See Note [Sharing IfaceTyConInfo] mkIfaceTyConInfo :: PromotionFlag -> IfaceTyConSort -> IfaceTyConInfo -mkIfaceTyConInfo IsPromoted IfaceNormalTyCon = IfaceTyConInfo IsPromoted IfaceNormalTyCon -mkIfaceTyConInfo NotPromoted IfaceNormalTyCon = IfaceTyConInfo NotPromoted IfaceNormalTyCon -mkIfaceTyConInfo prom sort = IfaceTyConInfo prom sort +mkIfaceTyConInfo IsPromoted IfaceNormalTyCon = promotedNormalTyConInfo +mkIfaceTyConInfo NotPromoted IfaceNormalTyCon = notPromotedNormalTyConInfo +mkIfaceTyConInfo prom sort = IfaceTyConInfo prom sort + +{-# NOINLINE promotedNormalTyConInfo #-} +-- | See Note [Sharing IfaceTyConInfo] +promotedNormalTyConInfo :: IfaceTyConInfo +promotedNormalTyConInfo = IfaceTyConInfo IsPromoted IfaceNormalTyCon + +{-# NOINLINE notPromotedNormalTyConInfo #-} +-- | See Note [Sharing IfaceTyConInfo] +notPromotedNormalTyConInfo :: IfaceTyConInfo +notPromotedNormalTyConInfo = IfaceTyConInfo NotPromoted IfaceNormalTyCon + +{- +Note [Sharing IfaceTyConInfo] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +'IfaceTyConInfo' occurs an awful lot in 'ModIface', see #19194 for an example. +But almost all of them are + + IfaceTyConInfo IsPromoted IfaceNormalTyCon + IfaceTyConInfo NotPromoted IfaceNormalTyCon. + +The smart constructor `mkIfaceTyConInfo` arranges to share these instances, +thus: + + promotedNormalTyConInfo = IfaceTyConInfo IsPromoted IfaceNormalTyCon + notPromotedNormalTyConInfo = IfaceTyConInfo NotPromoted IfaceNormalTyCon + + mkIfaceTyConInfo IsPromoted IfaceNormalTyCon = promotedNormalTyConInfo + mkIfaceTyConInfo NotPromoted IfaceNormalTyCon = notPromotedNormalTyConInfo + mkIfaceTyConInfo prom sort = IfaceTyConInfo prom sort + +But ALAS, the (nested) CPR transform can lose this sharing, completely +negating the effect of `mkIfaceTyConInfo`: see #24530 and #19326. + +Sticking-plaster solution: add a NOINLINE pragma to those top-level constants. +When we fix the CPR bug we can remove the NOINLINE pragmas. + +This one change leads to an 15% reduction in residency for GHC when embedding +'mi_extra_decls': see !12222. +-} data IfaceMCoercion = IfaceMRefl ===================================== compiler/GHC/Linker/Loader.hs ===================================== @@ -94,6 +94,7 @@ import Control.Monad import qualified Data.Set as Set import Data.Char (isSpace) import Data.Foldable (for_) +import qualified Data.Foldable as Foldable import Data.IORef import Data.List (intercalate, isPrefixOf, nub, partition, sortOn) import Data.Maybe @@ -947,7 +948,8 @@ linkSomeBCOs :: Interp linkSomeBCOs interp pkgs_loaded le mods = foldr fun do_link mods [] where - fun CompiledByteCode{..} inner accum = inner (bc_bcos : accum) + fun CompiledByteCode{..} inner accum = + inner (Foldable.toList bc_bcos : accum) do_link [] = return [] do_link mods = do ===================================== compiler/GHC/StgToByteCode.hs ===================================== @@ -63,6 +63,7 @@ import GHC.StgToCmm.Closure ( NonVoid(..), fromNonVoid, idPrimRepU, import GHC.StgToCmm.Layout import GHC.Runtime.Heap.Layout hiding (WordOff, ByteOff, wordsToBytes) import GHC.Data.Bitmap +import GHC.Data.FlatBag as FlatBag import GHC.Data.OrdList import GHC.Data.Maybe import GHC.Types.Name.Env (mkNameEnv) @@ -119,14 +120,14 @@ byteCodeGen hsc_env this_mod binds tycs mb_modBreaks (BcM_State{..}, proto_bcos) <- runBc hsc_env this_mod mb_modBreaks $ do let flattened_binds = concatMap flattenBind (reverse lifted_binds) - mapM schemeTopBind flattened_binds + FlatBag.fromList (fromIntegral $ length flattened_binds) <$> mapM schemeTopBind flattened_binds when (notNull ffis) (panic "GHC.StgToByteCode.byteCodeGen: missing final emitBc?") putDumpFileMaybe logger Opt_D_dump_BCOs "Proto-BCOs" FormatByteCode - (vcat (intersperse (char ' ') (map ppr proto_bcos))) + (vcat (intersperse (char ' ') (map ppr $ elemsFlatBag proto_bcos))) cbc <- assembleBCOs interp profile proto_bcos tycs stringPtrs (case modBreaks of ===================================== compiler/ghc.cabal.in ===================================== @@ -415,6 +415,7 @@ Library GHC.Data.FastString GHC.Data.FastString.Env GHC.Data.FiniteMap + GHC.Data.FlatBag GHC.Data.Graph.Base GHC.Data.Graph.Color GHC.Data.Graph.Collapse ===================================== libraries/ghci/GHCi/CreateBCO.hs ===================================== @@ -68,9 +68,6 @@ createBCO arr bco return (HValue final_bco) } -toWordArray :: UArray Int Word64 -> UArray Int Word -toWordArray = amap fromIntegral - linkBCO' :: Array Int HValue -> ResolvedBCO -> IO BCO linkBCO' arr ResolvedBCO{..} = do let @@ -80,11 +77,10 @@ linkBCO' arr ResolvedBCO{..} = do !(I# arity#) = resolvedBCOArity !(EmptyArr empty#) = emptyArr -- See Note [BCO empty array] - - barr a = case a of UArray _lo _hi n b -> if n == 0 then empty# else b - insns_barr = barr resolvedBCOInstrs - bitmap_barr = barr (toWordArray resolvedBCOBitmap) - literals_barr = barr (toWordArray resolvedBCOLits) + barr arr# = if I# (sizeofByteArray# arr#) == 0 then empty# else arr# + insns_barr = barr (getBCOByteArray resolvedBCOInstrs) + bitmap_barr = barr (getBCOByteArray resolvedBCOBitmap) + literals_barr = barr (getBCOByteArray resolvedBCOLits) PtrsArr marr <- mkPtrsArray arr n_ptrs ptrs IO $ \s -> ===================================== libraries/ghci/GHCi/ResolvedBCO.hs ===================================== @@ -1,9 +1,12 @@ {-# LANGUAGE RecordWildCards, DeriveGeneric, GeneralizedNewtypeDeriving, - BangPatterns, CPP #-} + BangPatterns, CPP, MagicHash, FlexibleInstances, FlexibleContexts, + TypeApplications, ScopedTypeVariables, UnboxedTuples #-} module GHCi.ResolvedBCO ( ResolvedBCO(..) , ResolvedBCOPtr(..) , isLittleEndian + , BCOByteArray(..) + , mkBCOByteArray ) where import Prelude -- See note [Why do we import Prelude here?] @@ -11,11 +14,19 @@ import GHC.Data.SizedSeq import GHCi.RemoteTypes import GHCi.BreakArray -import Data.Array.Unboxed import Data.Binary +import Data.Binary.Put (putBuilder) import GHC.Generics -import GHCi.BinaryArray +import Foreign.Ptr +import Data.Array.Byte +import qualified Data.Binary.Get.Internal as Binary +import qualified Data.ByteString.Builder as BB +import qualified Data.ByteString.Builder.Internal as BB +import GHC.Exts +import Data.Array.Base (UArray(..)) + +import GHC.IO #include "MachDeps.h" @@ -32,19 +43,35 @@ isLittleEndian = True -- | A 'ResolvedBCO' is one in which all the 'Name' references have been -- resolved to actual addresses or 'RemoteHValues'. -- --- Note, all arrays are zero-indexed (we assume this when --- serializing/deserializing) data ResolvedBCO = ResolvedBCO { resolvedBCOIsLE :: Bool, resolvedBCOArity :: {-# UNPACK #-} !Int, - resolvedBCOInstrs :: UArray Int Word16, -- insns - resolvedBCOBitmap :: UArray Int Word64, -- bitmap - resolvedBCOLits :: UArray Int Word64, -- non-ptrs + resolvedBCOInstrs :: BCOByteArray Word16, -- insns + resolvedBCOBitmap :: BCOByteArray Word, -- bitmap + resolvedBCOLits :: BCOByteArray Word, -- non-ptrs resolvedBCOPtrs :: (SizedSeq ResolvedBCOPtr) -- ptrs } deriving (Generic, Show) +-- | Wrapper for a 'ByteArray#'. +-- The phantom type tells what elements are stored in the 'ByteArray#'. +-- Creating a 'ByteArray#' can be achieved using 'UArray''s API, +-- where the underlying 'ByteArray#' can be unpacked. +data BCOByteArray a + = BCOByteArray { + getBCOByteArray :: !ByteArray# + } + +mkBCOByteArray :: UArray Int a -> BCOByteArray a +mkBCOByteArray (UArray _ _ _ arr) = BCOByteArray arr + +instance Show (BCOByteArray Word16) where + showsPrec _ _ = showString "BCOByteArray Word16" + +instance Show (BCOByteArray Word) where + showsPrec _ _ = showString "BCOByteArray Word" + -- | The Binary instance for ResolvedBCOs. -- -- Note, that we do encode the endianness, however there is no support for mixed @@ -54,12 +81,16 @@ instance Binary ResolvedBCO where put ResolvedBCO{..} = do put resolvedBCOIsLE put resolvedBCOArity - putArray resolvedBCOInstrs - putArray resolvedBCOBitmap - putArray resolvedBCOLits + put resolvedBCOInstrs + put resolvedBCOBitmap + put resolvedBCOLits put resolvedBCOPtrs - get = ResolvedBCO - <$> get <*> get <*> getArray <*> getArray <*> getArray <*> get + get = ResolvedBCO <$> get <*> get <*> get <*> get <*> get <*> get + +instance Binary (BCOByteArray a) where + put = putBCOByteArray + get = decodeBCOByteArray + data ResolvedBCOPtr = ResolvedBCORef {-# UNPACK #-} !Int @@ -75,3 +106,65 @@ data ResolvedBCOPtr deriving (Generic, Show) instance Binary ResolvedBCOPtr + +-- -------------------------------------------------------- +-- Serialisers for 'BCOByteArray' +-- -------------------------------------------------------- + +putBCOByteArray :: BCOByteArray a -> Put +putBCOByteArray (BCOByteArray bar) = do + put (I# (sizeofByteArray# bar)) + putBuilder $ byteArrayBuilder bar + +decodeBCOByteArray :: Get (BCOByteArray a) +decodeBCOByteArray = do + n <- get + getByteArray n + +byteArrayBuilder :: ByteArray# -> BB.Builder +byteArrayBuilder arr# = BB.builder $ go 0 (I# (sizeofByteArray# arr#)) + where + go :: Int -> Int -> BB.BuildStep a -> BB.BuildStep a + go !inStart !inEnd k (BB.BufferRange outStart outEnd) + -- There is enough room in this output buffer to write all remaining array + -- contents + | inRemaining <= outRemaining = do + copyByteArrayToAddr arr# inStart outStart inRemaining + k (BB.BufferRange (outStart `plusPtr` inRemaining) outEnd) + -- There is only enough space for a fraction of the remaining contents + | otherwise = do + copyByteArrayToAddr arr# inStart outStart outRemaining + let !inStart' = inStart + outRemaining + return $! BB.bufferFull 1 outEnd (go inStart' inEnd k) + where + inRemaining = inEnd - inStart + outRemaining = outEnd `minusPtr` outStart + + copyByteArrayToAddr :: ByteArray# -> Int -> Ptr a -> Int -> IO () + copyByteArrayToAddr src# (I# src_off#) (Ptr dst#) (I# len#) = + IO $ \s -> case copyByteArrayToAddr# src# src_off# dst# len# s of + s' -> (# s', () #) + +getByteArray :: Int -> Get (BCOByteArray a) +getByteArray nbytes@(I# nbytes#) = do + let !(MutableByteArray arr#) = unsafeDupablePerformIO $ + IO $ \s -> case newByteArray# nbytes# s of + (# s', mbar #) -> (# s', MutableByteArray mbar #) + let go 0 _ = return () + go !remaining !off = do + Binary.readNWith n $ \ptr -> + copyAddrToByteArray ptr arr# off n + go (remaining - n) (off + n) + where n = min chunkSize remaining + go nbytes 0 + return $! unsafeDupablePerformIO $ + IO $ \s -> case unsafeFreezeByteArray# arr# s of + (# s', bar #) -> (# s', BCOByteArray bar #) + where + chunkSize = 10*1024 + + copyAddrToByteArray :: Ptr a -> MutableByteArray# RealWorld + -> Int -> Int -> IO () + copyAddrToByteArray (Ptr src#) dst# (I# dst_off#) (I# len#) = + IO $ \s -> case copyAddrToByteArray# src# dst# dst_off# len# s of + s' -> (# s', () #) ===================================== testsuite/tests/count-deps/CountDepsAst.stdout ===================================== @@ -63,6 +63,7 @@ GHC.Data.FastMutInt GHC.Data.FastString GHC.Data.FastString.Env GHC.Data.FiniteMap +GHC.Data.FlatBag GHC.Data.Graph.Directed GHC.Data.Graph.UnVar GHC.Data.List.Infinite @@ -70,6 +71,7 @@ GHC.Data.List.SetOps GHC.Data.Maybe GHC.Data.OrdList GHC.Data.Pair +GHC.Data.SmallArray GHC.Data.Strict GHC.Data.StringBuffer GHC.Data.TrieMap ===================================== testsuite/tests/count-deps/CountDepsParser.stdout ===================================== @@ -64,6 +64,7 @@ GHC.Data.FastMutInt GHC.Data.FastString GHC.Data.FastString.Env GHC.Data.FiniteMap +GHC.Data.FlatBag GHC.Data.Graph.Directed GHC.Data.Graph.UnVar GHC.Data.List.Infinite @@ -71,6 +72,7 @@ GHC.Data.List.SetOps GHC.Data.Maybe GHC.Data.OrdList GHC.Data.Pair +GHC.Data.SmallArray GHC.Data.Strict GHC.Data.StringBuffer GHC.Data.TrieMap ===================================== testsuite/tests/ghci/should_run/BinaryArray.hs ===================================== @@ -1,11 +1,15 @@ -{-# LANGUAGE FlexibleContexts #-} +{-# LANGUAGE FlexibleContexts, MagicHash, ScopedTypeVariables #-} import Data.Binary.Get import Data.Binary.Put +import Data.Binary (get, put) +import Data.Array.Byte import Data.Array.Unboxed as AU import Data.Array.IO (IOUArray) import Data.Array.MArray (MArray) import Data.Array as A +import Data.Array.Base as A import GHCi.BinaryArray +import GHCi.ResolvedBCO import GHC.Word roundtripTest :: (IArray UArray a, MArray IOUArray a IO, Eq a) @@ -18,6 +22,17 @@ roundtripTest arr = | otherwise -> putStrLn "failed to round-trip" Left _ -> putStrLn "deserialization failed" +roundtripTestByteArray :: forall a . (IArray UArray a, MArray IOUArray a IO, Eq a) + => UArray Int a -> IO () +roundtripTestByteArray (UArray _ _ _ arr#) = + let val = BCOByteArray arr# :: BCOByteArray a + ser = Data.Binary.Put.runPut $ put val + in case Data.Binary.Get.runGetOrFail (get :: Get (BCOByteArray a)) ser of + Right (_, _, BCOByteArray arr'# ) + | ByteArray arr# == ByteArray arr'# -> return () + | otherwise -> putStrLn "failed to round-trip" + Left _ -> putStrLn "deserialization failed" + main :: IO () main = do roundtripTest (AU.listArray (1,500) [1..] :: UArray Int Int) @@ -27,3 +42,10 @@ main = do roundtripTest (AU.listArray (1,500) [1..] :: UArray Int Word32) roundtripTest (AU.listArray (1,500) [1..] :: UArray Int Word64) roundtripTest (AU.listArray (1,500) ['a'..] :: UArray Int Char) + roundtripTestByteArray (AU.listArray (1,500) [1..] :: UArray Int Int) + roundtripTestByteArray (AU.listArray (1,500) [1..] :: UArray Int Word) + roundtripTestByteArray (AU.listArray (1,500) [1..] :: UArray Int Word8) + roundtripTestByteArray (AU.listArray (1,500) [1..] :: UArray Int Word16) + roundtripTestByteArray (AU.listArray (1,500) [1..] :: UArray Int Word32) + roundtripTestByteArray (AU.listArray (1,500) [1..] :: UArray Int Word64) + roundtripTestByteArray (AU.listArray (1,500) ['a'..] :: UArray Int Char) View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/b15caeb035b418021ab45b259a33a33... -- View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/b15caeb035b418021ab45b259a33a33... You're receiving this email because of your account on gitlab.haskell.org.
participants (1)
-
Matthew Pickering (@mpickering)