[Git][ghc/ghc][wip/mp/iface-patches-9.10] 6 commits: Replace `SizedSeq` with `FlatBag` for flattened structure

20 Dec 2025

Matthew Pickering pushed to branch wip/mp/iface-patches-9.10 at Glasgow Haskell Compiler / GHC


Commits:
0b16f517 by Fendor at 2025-12-19T10:13:20+00:00
Replace `SizedSeq` with `FlatBag` for flattened structure

LinkedLists are notoriously memory ineffiecient when all we do is
traversing a structure.
As 'UnlinkedBCO' has been identified as a data structure that impacts
the overall memory usage of GHCi sessions, we avoid linked lists and
prefer flattened structure for storing.

We introduce a new memory efficient representation of sequential
elements that has special support for the cases:

* Empty
* Singleton
* Tuple Elements

This improves sharing in the 'Empty' case and avoids the overhead of
'Array' until its constant overhead is justified.

(cherry picked from commit 45ab59dfe39ce32ec1db25644ed84bd44a9506f6)

- - - - -
add9f1c0 by Fendor at 2025-12-19T10:13:37+00:00
Compact FlatBag array representation

`Array` contains three additional `Word`'s we do not need in `FlatBag`. Move
`FlatBag` to `SmallArray`.

Expand the API of SmallArray by `sizeofSmallArray` and add common
traversal functions, such as `mapSmallArray` and `foldMapSmallArray`.
Additionally, allow users to force the elements of a `SmallArray`
via `rnfSmallArray`.

(cherry picked from commit ac6a01bf3ef41460a5f0b4516c9784fa153e40b0)

- - - - -
409aa3bc by Fendor at 2025-12-19T10:14:31+00:00
Prefer packed representation for CompiledByteCode

As there are many 'CompiledByteCode' objects alive during a GHCi
session, representing its element in a more packed manner improves space
behaviour at a minimal cost.

When running GHCi on the agda codebase, we find around 380 live
'CompiledByteCode' objects. Packing their respective 'UnlinkedByteCode'
can save quite some pointers.

(cherry picked from commit 0e00847fc15d76a3923029f072cb9e4bd62f6947)

- - - - -
0cc7d458 by Fendor at 2025-12-19T10:19:12+00:00
Avoid UArray when indexing is not required

`UnlinkedBCO`'s can occur many times in the heap. Each `UnlinkedBCO`
references two `UArray`'s but never indexes them. They are only needed
to encode the elements into a `ByteArray#`. The three words for
the lower bound, upper bound and number of elements are essentially
unused, thus we replace `UArray` with a wrapper around `ByteArray#`.
This saves us up to three words for each `UnlinkedBCO`.

Further, to avoid re-allocating these words for `ResolvedBCO`, we repeat
the procedure for `ResolvedBCO` and add custom `Binary` and `Show` instances.

For example, agda's repl session has around 360_000 UnlinkedBCO's,
so avoiding these three words is already saving us around 8MB residency.

- - - - -
1ecec34b by Fendor at 2025-12-19T10:23:59+00:00
Fix sharing of 'IfaceTyConInfo' during core to iface type translation

During heap analysis, we noticed that during generation of
'mi_extra_decls' we have lots of duplicates for the instances:

* `IfaceTyConInfo NotPromoted IfaceNormalTyCon`
* `IfaceTyConInfo IsPromoted IfaceNormalTyCon`

which should be shared instead of duplicated. This duplication increased
the number of live bytes by around 200MB while loading the agda codebase
into GHCi.

These instances are created during `CoreToIface` translation, in
particular `toIfaceTyCon`.

The generated core looks like:

    toIfaceTyCon
      = \ tc_sjJw ->
          case $wtoIfaceTyCon tc_sjJw of
          { (# ww_sjJz, ww1_sjNL, ww2_sjNM #) ->
          IfaceTyCon ww_sjJz (IfaceTyConInfo ww1_sjNL ww2_sjNM)
          }

whichs removes causes the sharing to work propery.

Adding explicit sharing, with NOINLINE annotations, changes the core to:

    toIfaceTyCon
      = \ tc_sjJq ->
          case $wtoIfaceTyCon tc_sjJq of { (# ww_sjNB, ww1_sjNC #) ->
          IfaceTyCon ww_sjNB ww1_sjNC
          }

which looks much more like sharing is happening.
We confirmed via ghc-debug that all duplications were eliminated and the
number of live bytes are noticeably reduced.

- - - - -
a92cf018 by Fendor at 2025-12-19T10:28:16+00:00
Never UNPACK `FastMutInt` for counting z-encoded `FastString`s

In `FastStringTable`, we count the number of z-encoded FastStrings
that exist in a GHC session.
We used to UNPACK the counters to not waste memory, but live retainer
analysis showed that we allocate a lot of `FastMutInt`s, retained by
`mkFastZString`.

We lazily compute the `FastZString`, only incrementing the counter when the `FastZString` is
forced.
The function `mkFastStringWith` calls `mkZFastString` and boxes the
`FastMutInt`, leading to the following core:

    mkFastStringWith
      = \ mk_fs _  ->
             = case stringTable of
                { FastStringTable _ n_zencs segments# _ ->
                    ...
                         case ((mk_fs (I# ...) (FastMutInt n_zencs))
                            `cast` Co:2 :: ...)
                            ...

Marking this field as `NOUNPACK` avoids this reboxing, eliminating the
allocation of a fresh `FastMutInt` on every `FastString` allocation.

- - - - -


15 changed files:

- compiler/GHC/ByteCode/Asm.hs
- compiler/GHC/ByteCode/Linker.hs
- compiler/GHC/ByteCode/Types.hs
- compiler/GHC/Data/FastString.hs
- + compiler/GHC/Data/FlatBag.hs
- compiler/GHC/Data/SmallArray.hs
- compiler/GHC/Iface/Type.hs
- compiler/GHC/Linker/Loader.hs
- compiler/GHC/StgToByteCode.hs
- compiler/ghc.cabal.in
- libraries/ghci/GHCi/CreateBCO.hs
- libraries/ghci/GHCi/ResolvedBCO.hs
- testsuite/tests/count-deps/CountDepsAst.stdout
- testsuite/tests/count-deps/CountDepsParser.stdout
- testsuite/tests/ghci/should_run/BinaryArray.hs


Changes:

=====================================
compiler/GHC/ByteCode/Asm.hs
=====================================
@@ -34,6 +34,7 @@ import GHC.Utils.Panic
 
 import GHC.Core.TyCon
 import GHC.Data.FastString
+import GHC.Data.FlatBag
 import GHC.Data.SizedSeq
 
 import GHC.StgToCmm.Layout     ( ArgRep(..) )
@@ -71,9 +72,9 @@ bcoFreeNames bco
   where
     bco_refs (UnlinkedBCO _ _ _ _ nonptrs ptrs)
         = unionManyUniqDSets (
-             mkUniqDSet [ n | BCOPtrName n <- ssElts ptrs ] :
-             mkUniqDSet [ n | BCONPtrItbl n <- ssElts nonptrs ] :
-             map bco_refs [ bco | BCOPtrBCO bco <- ssElts ptrs ]
+             mkUniqDSet [ n | BCOPtrName n <- elemsFlatBag ptrs ] :
+             mkUniqDSet [ n | BCONPtrItbl n <- elemsFlatBag nonptrs ] :
+             map bco_refs [ bco | BCOPtrBCO bco <- elemsFlatBag ptrs ]
           )
 
 -- -----------------------------------------------------------------------------
@@ -90,7 +91,7 @@ bcoFreeNames bco
 assembleBCOs
   :: Interp
   -> Profile
-  -> [ProtoBCO Name]
+  -> FlatBag (ProtoBCO Name)
   -> [TyCon]
   -> AddrEnv
   -> Maybe ModBreaks
@@ -129,7 +130,7 @@ assembleBCOs interp profile proto_bcos tycons top_strs modbreaks = do
 -- top-level string literal bindings] in GHC.StgToByteCode for some discussion
 -- about why.
 --
-mallocStrings :: Interp -> [UnlinkedBCO] -> IO [UnlinkedBCO]
+mallocStrings ::  Interp -> FlatBag UnlinkedBCO -> IO (FlatBag UnlinkedBCO)
 mallocStrings interp ulbcos = do
   let bytestrings = reverse (execState (mapM_ collect ulbcos) [])
   ptrs <- interpCmd interp (MallocStrings bytestrings)
@@ -170,7 +171,7 @@ assembleOneBCO interp profile pbco = do
   -- TODO: the profile should be bundled with the interpreter: the rts ways are
   -- fixed for an interpreter
   ubco <- assembleBCO (profilePlatform profile) pbco
-  [ubco'] <- mallocStrings interp [ubco]
+  UnitFlatBag ubco' <- mallocStrings interp (UnitFlatBag ubco)
   return ubco'
 
 assembleBCO :: Platform -> ProtoBCO Name -> IO UnlinkedBCO
@@ -213,9 +214,9 @@ assembleBCO platform (ProtoBCO { protoBCOName       = nm
              (text "bytecode instruction count mismatch")
 
   let asm_insns = ssElts final_insns
-      insns_arr = Array.listArray (0, fromIntegral n_insns - 1) asm_insns
-      bitmap_arr = mkBitmapArray bsize bitmap
-      ul_bco = UnlinkedBCO nm arity insns_arr bitmap_arr final_lits final_ptrs
+      !insns_arr =  mkBCOByteArray $ Array.listArray (0 :: Int, fromIntegral n_insns - 1) asm_insns
+      !bitmap_arr = mkBCOByteArray $ mkBitmapArray bsize bitmap
+      ul_bco = UnlinkedBCO nm arity insns_arr bitmap_arr (fromSizedSeq final_lits) (fromSizedSeq final_ptrs)
 
   -- 8 Aug 01: Finalisers aren't safe when attached to non-primitive
   -- objects, since they might get run too early.  Disable this until
@@ -224,7 +225,7 @@ assembleBCO platform (ProtoBCO { protoBCOName       = nm
 
   return ul_bco
 
-mkBitmapArray :: Word -> [StgWord] -> UArray Int Word64
+mkBitmapArray :: Word -> [StgWord] -> UArray Int Word
 -- Here the return type must be an array of Words, not StgWords,
 -- because the underlying ByteArray# will end up as a component
 -- of a BCO object.


=====================================
compiler/GHC/ByteCode/Linker.hs
=====================================
@@ -64,13 +64,13 @@ linkBCO interp pkgs_loaded le bco_ix
            (UnlinkedBCO _ arity insns bitmap lits0 ptrs0) = do
   -- fromIntegral Word -> Word64 should be a no op if Word is Word64
   -- otherwise it will result in a cast to longlong on 32bit systems.
-  lits <- mapM (fmap fromIntegral . lookupLiteral interp pkgs_loaded le) (ssElts lits0)
-  ptrs <- mapM (resolvePtr interp pkgs_loaded le bco_ix) (ssElts ptrs0)
-  let lits' = listArray (0 :: Int, fromIntegral (sizeSS lits0)-1) lits
+  (lits :: [Word]) <- mapM (fmap fromIntegral . lookupLiteral interp pkgs_loaded le) (elemsFlatBag lits0)
+  ptrs <- mapM (resolvePtr interp pkgs_loaded le bco_ix) (elemsFlatBag ptrs0)
+  let lits' = listArray (0 :: Int, fromIntegral (sizeFlatBag lits0)-1) lits
   return (ResolvedBCO isLittleEndian arity
               insns
               bitmap
-              lits'
+              (mkBCOByteArray lits')
               (addListToSS emptySS ptrs))
 
 lookupLiteral :: Interp -> PkgsLoaded -> LinkerEnv -> BCONPtr -> IO Word


=====================================
compiler/GHC/ByteCode/Types.hs
=====================================
@@ -1,6 +1,8 @@
 {-# LANGUAGE GeneralizedNewtypeDeriving #-}
 {-# LANGUAGE RecordWildCards            #-}
 {-# LANGUAGE TypeApplications           #-}
+{-# LANGUAGE MagicHash                  #-}
+{-# LANGUAGE UnliftedNewtypes           #-}
 --
 --  (c) The University of Glasgow 2002-2006
 --
@@ -8,6 +10,7 @@
 -- | Bytecode assembler types
 module GHC.ByteCode.Types
   ( CompiledByteCode(..), seqCompiledByteCode
+  , BCOByteArray(..), mkBCOByteArray
   , FFIInfo(..)
   , RegBitmap(..)
   , NativeCallType(..), NativeCallInfo(..), voidTupleReturnInfo, voidPrimCallInfo
@@ -18,12 +21,13 @@ module GHC.ByteCode.Types
   , CgBreakInfo(..)
   , ModBreaks (..), BreakIndex, emptyModBreaks
   , CCostCentre
+  , FlatBag, sizeFlatBag, fromSizedSeq, elemsFlatBag
   ) where
 
 import GHC.Prelude
 
 import GHC.Data.FastString
-import GHC.Data.SizedSeq
+import GHC.Data.FlatBag
 import GHC.Types.Name
 import GHC.Types.Name.Env
 import GHC.Utils.Outputable
@@ -33,10 +37,10 @@ import GHCi.BreakArray
 import GHCi.RemoteTypes
 import GHCi.FFI
 import Control.DeepSeq
+import GHCi.ResolvedBCO ( BCOByteArray(..), mkBCOByteArray )
 
 import Foreign
 import Data.Array
-import Data.Array.Base  ( UArray(..) )
 import Data.ByteString (ByteString)
 import Data.IntMap (IntMap)
 import qualified Data.IntMap as IntMap
@@ -50,7 +54,7 @@ import Language.Haskell.Syntax.Module.Name (ModuleName)
 -- Compiled Byte Code
 
 data CompiledByteCode = CompiledByteCode
-  { bc_bcos   :: [UnlinkedBCO]  -- Bunch of interpretable bindings
+  { bc_bcos   :: FlatBag UnlinkedBCO -- Bunch of interpretable bindings
   , bc_itbls  :: ItblEnv        -- A mapping from DataCons to their itbls
   , bc_ffis   :: [FFIInfo]      -- ffi blocks we allocated
   , bc_strs   :: AddrEnv        -- malloc'd top-level strings
@@ -62,7 +66,7 @@ newtype FFIInfo = FFIInfo (RemotePtr C_ffi_cif)
   deriving (Show, NFData)
 
 instance Outputable CompiledByteCode where
-  ppr CompiledByteCode{..} = ppr bc_bcos
+  ppr CompiledByteCode{..} = ppr $ elemsFlatBag bc_bcos
 
 -- Not a real NFData instance, because ModBreaks contains some things
 -- we can't rnf
@@ -152,10 +156,10 @@ data UnlinkedBCO
    = UnlinkedBCO {
         unlinkedBCOName   :: !Name,
         unlinkedBCOArity  :: {-# UNPACK #-} !Int,
-        unlinkedBCOInstrs :: !(UArray Int Word16),      -- insns
-        unlinkedBCOBitmap :: !(UArray Int Word64),      -- bitmap
-        unlinkedBCOLits   :: !(SizedSeq BCONPtr),       -- non-ptrs
-        unlinkedBCOPtrs   :: !(SizedSeq BCOPtr)         -- ptrs
+        unlinkedBCOInstrs :: !(BCOByteArray Word16),      -- insns
+        unlinkedBCOBitmap :: !(BCOByteArray Word),      -- bitmap
+        unlinkedBCOLits   :: !(FlatBag BCONPtr),       -- non-ptrs
+        unlinkedBCOPtrs   :: !(FlatBag BCOPtr)         -- ptrs
    }
 
 instance NFData UnlinkedBCO where
@@ -210,8 +214,8 @@ seqCgBreakInfo CgBreakInfo{..} =
 instance Outputable UnlinkedBCO where
    ppr (UnlinkedBCO nm _arity _insns _bitmap lits ptrs)
       = sep [text "BCO", ppr nm, text "with",
-             ppr (sizeSS lits), text "lits",
-             ppr (sizeSS ptrs), text "ptrs" ]
+             ppr (sizeFlatBag lits), text "lits",
+             ppr (sizeFlatBag ptrs), text "ptrs" ]
 
 instance Outputable CgBreakInfo where
    ppr info = text "CgBreakInfo" <+>


=====================================
compiler/GHC/Data/FastString.hs
=====================================
@@ -307,9 +307,18 @@ and updates to multiple buckets with low synchronization overhead.
 See Note [Updating the FastString table] on how it's updated.
 -}
 data FastStringTable = FastStringTable
-  {-# UNPACK #-} !FastMutInt -- the unique ID counter shared with all buckets
-  {-# UNPACK #-} !FastMutInt -- number of computed z-encodings for all buckets
-  (Array# (IORef FastStringTableSegment)) -- concurrent segments
+  {-# UNPACK #-} !FastMutInt
+  -- ^ The unique ID counter shared with all buckets
+  --
+  -- We unpack the 'FastMutInt' counter as it is always consumed strictly.
+  {-# NOUNPACK #-} !FastMutInt
+  -- ^ Number of computed z-encodings for all buckets.
+  --
+  -- We mark this as 'NOUNPACK' as this 'FastMutInt' is retained by a thunk
+  -- in 'mkFastStringWith' and needs to be boxed any way.
+  -- If this is unpacked, then we box this single 'FastMutInt' once for each
+  -- allocated FastString.
+  (Array# (IORef FastStringTableSegment)) -- ^  concurrent segments
 
 data FastStringTableSegment = FastStringTableSegment
   {-# UNPACK #-} !(MVar ())  -- the lock for write in each segment


=====================================
compiler/GHC/Data/FlatBag.hs
=====================================
@@ -0,0 +1,129 @@
+{-# LANGUAGE UnboxedTuples #-}
+module GHC.Data.FlatBag
+  ( FlatBag(EmptyFlatBag, UnitFlatBag, TupleFlatBag)
+  , emptyFlatBag
+  , unitFlatBag
+  , sizeFlatBag
+  , elemsFlatBag
+  , mappendFlatBag
+  -- * Construction
+  , fromList
+  , fromSizedSeq
+  ) where
+
+import GHC.Prelude
+
+import GHC.Data.SizedSeq (SizedSeq, ssElts, sizeSS)
+
+import Control.DeepSeq
+
+import GHC.Data.SmallArray
+
+-- | Store elements in a flattened representation.
+--
+-- A 'FlatBag' is a data structure that stores an ordered list of elements
+-- in a flat structure, avoiding the overhead of a linked list.
+-- Use this data structure, if the code requires the following properties:
+--
+-- * Elements are stored in a long-lived object, and benefit from a flattened
+--   representation.
+-- * The 'FlatBag' will be traversed but not extended or filtered.
+-- * The number of elements should be known.
+-- * Sharing of the empty case improves memory behaviour.
+--
+-- A 'FlagBag' aims to have as little overhead as possible to store its elements.
+-- To achieve that, it distinguishes between the empty case, singleton, tuple
+-- and general case.
+-- Thus, we only pay for the additional three words of an 'Array' if we have at least
+-- three elements.
+data FlatBag a
+  = EmptyFlatBag
+  | UnitFlatBag !a
+  | TupleFlatBag !a !a
+  | FlatBag {-# UNPACK #-} !(SmallArray a)
+
+instance Functor FlatBag where
+  fmap _ EmptyFlatBag = EmptyFlatBag
+  fmap f (UnitFlatBag a) = UnitFlatBag $ f a
+  fmap f (TupleFlatBag a b) = TupleFlatBag (f a) (f b)
+  fmap f (FlatBag e) = FlatBag $ mapSmallArray f e
+
+instance Foldable FlatBag where
+  foldMap _ EmptyFlatBag = mempty
+  foldMap f (UnitFlatBag a) = f a
+  foldMap f (TupleFlatBag a b) = f a `mappend` f b
+  foldMap f (FlatBag arr) = foldMapSmallArray f arr
+
+  length = fromIntegral . sizeFlatBag
+
+instance Traversable FlatBag where
+  traverse _ EmptyFlatBag = pure EmptyFlatBag
+  traverse f (UnitFlatBag a) = UnitFlatBag <$> f a
+  traverse f (TupleFlatBag a b) = TupleFlatBag <$> f a <*> f b
+  traverse f fl@(FlatBag arr) = fromList (fromIntegral $ sizeofSmallArray arr) <$> traverse f (elemsFlatBag fl)
+
+instance NFData a => NFData (FlatBag a) where
+  rnf EmptyFlatBag = ()
+  rnf (UnitFlatBag a) = rnf a
+  rnf (TupleFlatBag a b) = rnf a `seq` rnf b
+  rnf (FlatBag arr) = rnfSmallArray arr
+
+-- | Create an empty 'FlatBag'.
+--
+-- The empty 'FlatBag' is shared over all instances.
+emptyFlatBag :: FlatBag a
+emptyFlatBag = EmptyFlatBag
+
+-- | Create a singleton 'FlatBag'.
+unitFlatBag :: a -> FlatBag a
+unitFlatBag = UnitFlatBag
+
+-- | Calculate the size of
+sizeFlatBag :: FlatBag a -> Word
+sizeFlatBag EmptyFlatBag = 0
+sizeFlatBag UnitFlatBag{} = 1
+sizeFlatBag TupleFlatBag{} = 2
+sizeFlatBag (FlatBag arr) = fromIntegral $ sizeofSmallArray arr
+
+-- | Get all elements that are stored in the 'FlatBag'.
+elemsFlatBag :: FlatBag a -> [a]
+elemsFlatBag EmptyFlatBag = []
+elemsFlatBag (UnitFlatBag a) = [a]
+elemsFlatBag (TupleFlatBag a b) = [a, b]
+elemsFlatBag (FlatBag arr) =
+  [indexSmallArray arr i | i <- [0 .. sizeofSmallArray arr - 1]]
+
+-- | Combine two 'FlatBag's.
+--
+-- The new 'FlatBag' contains all elements from both 'FlatBag's.
+--
+-- If one of the 'FlatBag's is empty, the old 'FlatBag' is reused.
+mappendFlatBag :: FlatBag a -> FlatBag a -> FlatBag a
+mappendFlatBag EmptyFlatBag b = b
+mappendFlatBag a EmptyFlatBag = a
+mappendFlatBag (UnitFlatBag a) (UnitFlatBag b) = TupleFlatBag a b
+mappendFlatBag a b =
+  fromList (sizeFlatBag a + sizeFlatBag b)
+           (elemsFlatBag a ++ elemsFlatBag b)
+
+-- | Store the list in a flattened memory representation, avoiding the memory overhead
+-- of a linked list.
+--
+-- The size 'n' needs to be smaller or equal to the length of the list.
+-- If it is smaller than the length of the list, overflowing elements are
+-- discarded. It is undefined behaviour to set 'n' to be bigger than the
+-- length of the list.
+fromList :: Word -> [a] -> FlatBag a
+fromList n elts =
+  case elts of
+    [] -> EmptyFlatBag
+    [a] -> UnitFlatBag a
+    [a, b] -> TupleFlatBag a b
+    xs ->
+      FlatBag (listToArray (fromIntegral n) fst snd (zip [0..] xs))
+
+-- | Convert a 'SizedSeq' into its flattened representation.
+-- A 'FlatBag a' is more memory efficient than '[a]', if no further modification
+-- is necessary.
+fromSizedSeq :: SizedSeq a -> FlatBag a
+fromSizedSeq s = fromList (sizeSS s) (ssElts s)


=====================================
compiler/GHC/Data/SmallArray.hs
=====================================
@@ -11,13 +11,18 @@ module GHC.Data.SmallArray
   , freezeSmallArray
   , unsafeFreezeSmallArray
   , indexSmallArray
+  , sizeofSmallArray
   , listToArray
+  , mapSmallArray
+  , foldMapSmallArray
+  , rnfSmallArray
   )
 where
 
 import GHC.Exts
 import GHC.Prelude
 import GHC.ST
+import Control.DeepSeq
 
 data SmallArray a = SmallArray (SmallArray# a)
 
@@ -64,6 +69,14 @@ unsafeFreezeSmallArray (SmallMutableArray ma) s =
   case unsafeFreezeSmallArray# ma s of
     (# s', a #) -> (# s', SmallArray a #)
 
+-- | Get the size of a 'SmallArray'
+sizeofSmallArray
+  :: SmallArray a
+  -> Int
+{-# INLINE sizeofSmallArray #-}
+sizeofSmallArray (SmallArray sa#) =
+  case sizeofSmallArray# sa# of
+    s -> I# s
 
 -- | Index a small-array (no bounds checking!)
 indexSmallArray
@@ -71,9 +84,51 @@ indexSmallArray
   -> Int          -- ^ index
   -> a
 {-# INLINE indexSmallArray #-}
-indexSmallArray (SmallArray sa#) (I# i) = case indexSmallArray# sa# i of
-  (# v #) -> v
+indexSmallArray (SmallArray sa#) (I# i) =
+  case indexSmallArray# sa# i of
+    (# v #) -> v
 
+-- | Map a function over the elements of a 'SmallArray'
+--
+mapSmallArray :: (a -> b) -> SmallArray a -> SmallArray b
+{-# INLINE mapSmallArray #-}
+mapSmallArray f sa = runST $ ST $ \s ->
+  let
+    n = sizeofSmallArray sa
+    go !i saMut# state#
+      | i < n =
+        let
+          a = indexSmallArray sa i
+          newState# = writeSmallArray saMut# i (f a) state#
+        in
+          go (i + 1) saMut# newState#
+      | otherwise = state#
+  in
+  case newSmallArray n (error "SmallArray: internal error, uninitialised elements") s of
+    (# s', mutArr #) ->
+      case go 0 mutArr s' of
+        s'' -> unsafeFreezeSmallArray mutArr s''
+
+-- | Fold the values of a 'SmallArray' into a 'Monoid m' of choice
+foldMapSmallArray :: Monoid m => (a -> m) -> SmallArray a -> m
+{-# INLINE foldMapSmallArray #-}
+foldMapSmallArray f sa = go 0
+  where
+    n = sizeofSmallArray sa
+    go i
+      | i < n = f (indexSmallArray sa i) `mappend` go (i + 1)
+      | otherwise = mempty
+
+-- | Force the elements of the given 'SmallArray'
+--
+rnfSmallArray :: NFData a => SmallArray a -> ()
+{-# INLINE rnfSmallArray #-}
+rnfSmallArray sa = go 0
+  where
+    n = sizeofSmallArray sa
+    go !i
+      | i < n = rnf (indexSmallArray sa i) `seq` go (i + 1)
+      | otherwise = ()
 
 -- | Convert a list into an array.
 listToArray :: Int -> (e -> Int) -> (e -> a) -> [e] -> SmallArray a


=====================================
compiler/GHC/Iface/Type.hs
=====================================
@@ -419,12 +419,51 @@ data IfaceTyConInfo   -- Used only to guide pretty-printing
                    , ifaceTyConSort       :: IfaceTyConSort }
     deriving (Eq, Ord)

--- This smart constructor allows sharing of the two most common
--- cases. See #19194
+-- | This smart constructor allows sharing of the two most common
+-- cases. See Note [Sharing IfaceTyConInfo]
 mkIfaceTyConInfo :: PromotionFlag -> IfaceTyConSort -> IfaceTyConInfo
-mkIfaceTyConInfo IsPromoted  IfaceNormalTyCon = IfaceTyConInfo IsPromoted  IfaceNormalTyCon
-mkIfaceTyConInfo NotPromoted IfaceNormalTyCon = IfaceTyConInfo NotPromoted IfaceNormalTyCon
-mkIfaceTyConInfo prom        sort             = IfaceTyConInfo prom        sort
+mkIfaceTyConInfo IsPromoted  IfaceNormalTyCon = promotedNormalTyConInfo
+mkIfaceTyConInfo NotPromoted IfaceNormalTyCon = notPromotedNormalTyConInfo
+mkIfaceTyConInfo prom        sort             = IfaceTyConInfo prom sort
+
+{-# NOINLINE promotedNormalTyConInfo #-}
+-- | See Note [Sharing IfaceTyConInfo]
+promotedNormalTyConInfo :: IfaceTyConInfo
+promotedNormalTyConInfo = IfaceTyConInfo IsPromoted IfaceNormalTyCon
+
+{-# NOINLINE notPromotedNormalTyConInfo #-}
+-- | See Note [Sharing IfaceTyConInfo]
+notPromotedNormalTyConInfo :: IfaceTyConInfo
+notPromotedNormalTyConInfo = IfaceTyConInfo NotPromoted IfaceNormalTyCon
+
+{-
+Note [Sharing IfaceTyConInfo]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+'IfaceTyConInfo' occurs an awful lot in 'ModIface', see #19194 for an example.
+But almost all of them are
+
+   IfaceTyConInfo IsPromoted IfaceNormalTyCon
+   IfaceTyConInfo NotPromoted IfaceNormalTyCon.
+
+The smart constructor `mkIfaceTyConInfo` arranges to share these instances,
+thus:
+
+  promotedNormalTyConInfo    = IfaceTyConInfo IsPromoted  IfaceNormalTyCon
+  notPromotedNormalTyConInfo = IfaceTyConInfo NotPromoted IfaceNormalTyCon
+
+  mkIfaceTyConInfo IsPromoted  IfaceNormalTyCon = promotedNormalTyConInfo
+  mkIfaceTyConInfo NotPromoted IfaceNormalTyCon = notPromotedNormalTyConInfo
+  mkIfaceTyConInfo prom        sort             = IfaceTyConInfo prom sort
+
+But ALAS, the (nested) CPR transform can lose this sharing, completely
+negating the effect of `mkIfaceTyConInfo`: see #24530 and #19326.
+
+Sticking-plaster solution: add a NOINLINE pragma to those top-level constants.
+When we fix the CPR bug we can remove the NOINLINE pragmas.
+
+This one change leads to an 15% reduction in residency for GHC when embedding
+'mi_extra_decls': see !12222.
+-}
 
 data IfaceMCoercion
   = IfaceMRefl


=====================================
compiler/GHC/Linker/Loader.hs
=====================================
@@ -94,6 +94,7 @@ import Control.Monad
 import qualified Data.Set as Set
 import Data.Char (isSpace)
 import Data.Foldable (for_)
+import qualified Data.Foldable as Foldable
 import Data.IORef
 import Data.List (intercalate, isPrefixOf, nub, partition, sortOn)
 import Data.Maybe
@@ -947,7 +948,8 @@ linkSomeBCOs :: Interp
 
 linkSomeBCOs interp pkgs_loaded le mods = foldr fun do_link mods []
  where
-  fun CompiledByteCode{..} inner accum = inner (bc_bcos : accum)
+  fun CompiledByteCode{..} inner accum =
+    inner (Foldable.toList bc_bcos : accum)
 
   do_link [] = return []
   do_link mods = do


=====================================
compiler/GHC/StgToByteCode.hs
=====================================
@@ -63,6 +63,7 @@ import GHC.StgToCmm.Closure ( NonVoid(..), fromNonVoid, idPrimRepU,
 import GHC.StgToCmm.Layout
 import GHC.Runtime.Heap.Layout hiding (WordOff, ByteOff, wordsToBytes)
 import GHC.Data.Bitmap
+import GHC.Data.FlatBag as FlatBag
 import GHC.Data.OrdList
 import GHC.Data.Maybe
 import GHC.Types.Name.Env (mkNameEnv)
@@ -119,14 +120,14 @@ byteCodeGen hsc_env this_mod binds tycs mb_modBreaks
         (BcM_State{..}, proto_bcos) <-
            runBc hsc_env this_mod mb_modBreaks $ do
              let flattened_binds = concatMap flattenBind (reverse lifted_binds)
-             mapM schemeTopBind flattened_binds
+             FlatBag.fromList (fromIntegral $ length flattened_binds) <$> mapM schemeTopBind flattened_binds
 
         when (notNull ffis)
              (panic "GHC.StgToByteCode.byteCodeGen: missing final emitBc?")
 
         putDumpFileMaybe logger Opt_D_dump_BCOs
            "Proto-BCOs" FormatByteCode
-           (vcat (intersperse (char ' ') (map ppr proto_bcos)))
+           (vcat (intersperse (char ' ') (map ppr $ elemsFlatBag proto_bcos)))
 
         cbc <- assembleBCOs interp profile proto_bcos tycs stringPtrs
           (case modBreaks of


=====================================
compiler/ghc.cabal.in
=====================================
@@ -415,6 +415,7 @@ Library
         GHC.Data.FastString
         GHC.Data.FastString.Env
         GHC.Data.FiniteMap
+        GHC.Data.FlatBag
         GHC.Data.Graph.Base
         GHC.Data.Graph.Color
         GHC.Data.Graph.Collapse


=====================================
libraries/ghci/GHCi/CreateBCO.hs
=====================================
@@ -68,9 +68,6 @@ createBCO arr bco
                   return (HValue final_bco) }
 
 
-toWordArray :: UArray Int Word64 -> UArray Int Word
-toWordArray = amap fromIntegral
-
 linkBCO' :: Array Int HValue -> ResolvedBCO -> IO BCO
 linkBCO' arr ResolvedBCO{..} = do
   let
@@ -80,11 +77,10 @@ linkBCO' arr ResolvedBCO{..} = do
       !(I# arity#)  = resolvedBCOArity
 
       !(EmptyArr empty#) = emptyArr -- See Note [BCO empty array]
-
-      barr a = case a of UArray _lo _hi n b -> if n == 0 then empty# else b
-      insns_barr = barr resolvedBCOInstrs
-      bitmap_barr = barr (toWordArray resolvedBCOBitmap)
-      literals_barr = barr (toWordArray resolvedBCOLits)
+      barr arr# = if I# (sizeofByteArray# arr#) == 0 then empty# else arr#
+      insns_barr = barr (getBCOByteArray resolvedBCOInstrs)
+      bitmap_barr = barr (getBCOByteArray resolvedBCOBitmap)
+      literals_barr = barr (getBCOByteArray resolvedBCOLits)
 
   PtrsArr marr <- mkPtrsArray arr n_ptrs ptrs
   IO $ \s ->


=====================================
libraries/ghci/GHCi/ResolvedBCO.hs
=====================================
@@ -1,9 +1,12 @@
 {-# LANGUAGE RecordWildCards, DeriveGeneric, GeneralizedNewtypeDeriving,
-    BangPatterns, CPP #-}
+    BangPatterns, CPP, MagicHash, FlexibleInstances, FlexibleContexts,
+    TypeApplications, ScopedTypeVariables, UnboxedTuples #-}
 module GHCi.ResolvedBCO
   ( ResolvedBCO(..)
   , ResolvedBCOPtr(..)
   , isLittleEndian
+  , BCOByteArray(..)
+  , mkBCOByteArray
   ) where
 
 import Prelude -- See note [Why do we import Prelude here?]
@@ -11,11 +14,19 @@ import GHC.Data.SizedSeq
 import GHCi.RemoteTypes
 import GHCi.BreakArray
 
-import Data.Array.Unboxed
 import Data.Binary
+import Data.Binary.Put (putBuilder)
 import GHC.Generics
-import GHCi.BinaryArray
 
+import Foreign.Ptr
+import Data.Array.Byte
+import qualified Data.Binary.Get.Internal as Binary
+import qualified Data.ByteString.Builder as BB
+import qualified Data.ByteString.Builder.Internal as BB
+import GHC.Exts
+import Data.Array.Base (UArray(..))
+
+import GHC.IO
 
 #include "MachDeps.h"
 
@@ -32,19 +43,35 @@ isLittleEndian = True
 -- | A 'ResolvedBCO' is one in which all the 'Name' references have been
 -- resolved to actual addresses or 'RemoteHValues'.
 --
--- Note, all arrays are zero-indexed (we assume this when
--- serializing/deserializing)
 data ResolvedBCO
    = ResolvedBCO {
         resolvedBCOIsLE   :: Bool,
         resolvedBCOArity  :: {-# UNPACK #-} !Int,
-        resolvedBCOInstrs :: UArray Int Word16,         -- insns
-        resolvedBCOBitmap :: UArray Int Word64,         -- bitmap
-        resolvedBCOLits   :: UArray Int Word64,         -- non-ptrs
+        resolvedBCOInstrs :: BCOByteArray Word16,       -- insns
+        resolvedBCOBitmap :: BCOByteArray Word,         -- bitmap
+        resolvedBCOLits   :: BCOByteArray Word,         -- non-ptrs
         resolvedBCOPtrs   :: (SizedSeq ResolvedBCOPtr)  -- ptrs
    }
    deriving (Generic, Show)
 
+-- | Wrapper for a 'ByteArray#'.
+-- The phantom type tells what elements are stored in the 'ByteArray#'.
+-- Creating a 'ByteArray#' can be achieved using 'UArray''s API,
+-- where the underlying 'ByteArray#' can be unpacked.
+data BCOByteArray a
+  = BCOByteArray {
+        getBCOByteArray :: !ByteArray#
+  }
+
+mkBCOByteArray :: UArray Int a -> BCOByteArray a
+mkBCOByteArray (UArray _ _ _ arr) = BCOByteArray arr
+
+instance Show (BCOByteArray Word16) where
+  showsPrec _ _ = showString "BCOByteArray Word16"
+
+instance Show (BCOByteArray Word) where
+  showsPrec _ _ = showString "BCOByteArray Word"
+
 -- | The Binary instance for ResolvedBCOs.
 --
 -- Note, that we do encode the endianness, however there is no support for mixed
@@ -54,12 +81,16 @@ instance Binary ResolvedBCO where
   put ResolvedBCO{..} = do
     put resolvedBCOIsLE
     put resolvedBCOArity
-    putArray resolvedBCOInstrs
-    putArray resolvedBCOBitmap
-    putArray resolvedBCOLits
+    put resolvedBCOInstrs
+    put resolvedBCOBitmap
+    put resolvedBCOLits
     put resolvedBCOPtrs
-  get = ResolvedBCO
-        <$> get <*> get <*> getArray <*> getArray <*> getArray <*> get
+  get = ResolvedBCO <$> get <*> get <*> get <*> get <*> get <*> get
+
+instance Binary (BCOByteArray a) where
+  put = putBCOByteArray
+  get = decodeBCOByteArray
+
 
 data ResolvedBCOPtr
   = ResolvedBCORef {-# UNPACK #-} !Int
@@ -75,3 +106,65 @@ data ResolvedBCOPtr
   deriving (Generic, Show)
 
 instance Binary ResolvedBCOPtr
+
+-- --------------------------------------------------------
+-- Serialisers for 'BCOByteArray'
+-- --------------------------------------------------------
+
+putBCOByteArray :: BCOByteArray a -> Put
+putBCOByteArray (BCOByteArray bar) = do
+  put (I# (sizeofByteArray# bar))
+  putBuilder $ byteArrayBuilder bar
+
+decodeBCOByteArray :: Get (BCOByteArray a)
+decodeBCOByteArray = do
+  n <- get
+  getByteArray n
+
+byteArrayBuilder :: ByteArray# -> BB.Builder
+byteArrayBuilder arr# = BB.builder $ go 0 (I# (sizeofByteArray# arr#))
+  where
+    go :: Int -> Int -> BB.BuildStep a -> BB.BuildStep a
+    go !inStart !inEnd k (BB.BufferRange outStart outEnd)
+      -- There is enough room in this output buffer to write all remaining array
+      -- contents
+      | inRemaining <= outRemaining = do
+          copyByteArrayToAddr arr# inStart outStart inRemaining
+          k (BB.BufferRange (outStart `plusPtr` inRemaining) outEnd)
+      -- There is only enough space for a fraction of the remaining contents
+      | otherwise = do
+          copyByteArrayToAddr arr# inStart outStart outRemaining
+          let !inStart' = inStart + outRemaining
+          return $! BB.bufferFull 1 outEnd (go inStart' inEnd k)
+      where
+        inRemaining  = inEnd - inStart
+        outRemaining = outEnd `minusPtr` outStart
+
+    copyByteArrayToAddr :: ByteArray# -> Int -> Ptr a -> Int -> IO ()
+    copyByteArrayToAddr src# (I# src_off#) (Ptr dst#) (I# len#) =
+        IO $ \s -> case copyByteArrayToAddr# src# src_off# dst# len# s of
+                     s' -> (# s', () #)
+
+getByteArray :: Int -> Get (BCOByteArray a)
+getByteArray nbytes@(I# nbytes#) = do
+    let !(MutableByteArray arr#) = unsafeDupablePerformIO $
+          IO $ \s -> case newByteArray# nbytes# s of
+                (# s', mbar #) -> (# s', MutableByteArray mbar #)
+    let go 0 _ = return ()
+        go !remaining !off = do
+            Binary.readNWith n $ \ptr ->
+              copyAddrToByteArray ptr arr# off n
+            go (remaining - n) (off + n)
+          where n = min chunkSize remaining
+    go nbytes 0
+    return $! unsafeDupablePerformIO $
+      IO $ \s -> case unsafeFreezeByteArray# arr# s of
+          (# s', bar #) -> (# s', BCOByteArray bar #)
+  where
+    chunkSize = 10*1024
+
+    copyAddrToByteArray :: Ptr a -> MutableByteArray# RealWorld
+                        -> Int -> Int -> IO ()
+    copyAddrToByteArray (Ptr src#) dst# (I# dst_off#) (I# len#) =
+        IO $ \s -> case copyAddrToByteArray# src# dst# dst_off# len# s of
+                     s' -> (# s', () #)


=====================================
testsuite/tests/count-deps/CountDepsAst.stdout
=====================================
@@ -63,6 +63,7 @@ GHC.Data.FastMutInt
 GHC.Data.FastString
 GHC.Data.FastString.Env
 GHC.Data.FiniteMap
+GHC.Data.FlatBag
 GHC.Data.Graph.Directed
 GHC.Data.Graph.UnVar
 GHC.Data.List.Infinite
@@ -70,6 +71,7 @@ GHC.Data.List.SetOps
 GHC.Data.Maybe
 GHC.Data.OrdList
 GHC.Data.Pair
+GHC.Data.SmallArray
 GHC.Data.Strict
 GHC.Data.StringBuffer
 GHC.Data.TrieMap


=====================================
testsuite/tests/count-deps/CountDepsParser.stdout
=====================================
@@ -64,6 +64,7 @@ GHC.Data.FastMutInt
 GHC.Data.FastString
 GHC.Data.FastString.Env
 GHC.Data.FiniteMap
+GHC.Data.FlatBag
 GHC.Data.Graph.Directed
 GHC.Data.Graph.UnVar
 GHC.Data.List.Infinite
@@ -71,6 +72,7 @@ GHC.Data.List.SetOps
 GHC.Data.Maybe
 GHC.Data.OrdList
 GHC.Data.Pair
+GHC.Data.SmallArray
 GHC.Data.Strict
 GHC.Data.StringBuffer
 GHC.Data.TrieMap


=====================================
testsuite/tests/ghci/should_run/BinaryArray.hs
=====================================
@@ -1,11 +1,15 @@
-{-# LANGUAGE FlexibleContexts #-}
+{-# LANGUAGE FlexibleContexts, MagicHash, ScopedTypeVariables #-}
 import Data.Binary.Get
 import Data.Binary.Put
+import Data.Binary (get, put)
+import Data.Array.Byte
 import Data.Array.Unboxed as AU
 import Data.Array.IO (IOUArray)
 import Data.Array.MArray (MArray)
 import Data.Array as A
+import Data.Array.Base as A
 import GHCi.BinaryArray
+import GHCi.ResolvedBCO
 import GHC.Word
 
 roundtripTest :: (IArray UArray a, MArray IOUArray a IO, Eq a)
@@ -18,6 +22,17 @@ roundtripTest arr =
            | otherwise    -> putStrLn "failed to round-trip"
          Left _           -> putStrLn "deserialization failed"
 
+roundtripTestByteArray :: forall a . (IArray UArray a, MArray IOUArray a IO, Eq a)
+              => UArray Int a -> IO ()
+roundtripTestByteArray (UArray _ _ _ arr#) =
+    let val  = BCOByteArray arr# :: BCOByteArray a
+        ser  = Data.Binary.Put.runPut $ put val
+    in case Data.Binary.Get.runGetOrFail (get :: Get (BCOByteArray a)) ser of
+         Right (_, _, BCOByteArray arr'# )
+           | ByteArray arr# == ByteArray arr'#  -> return ()
+           | otherwise                          -> putStrLn "failed to round-trip"
+         Left _                                 -> putStrLn "deserialization failed"
+
 main :: IO ()
 main = do
     roundtripTest (AU.listArray (1,500) [1..] :: UArray Int Int)
@@ -27,3 +42,10 @@ main = do
     roundtripTest (AU.listArray (1,500) [1..] :: UArray Int Word32)
     roundtripTest (AU.listArray (1,500) [1..] :: UArray Int Word64)
     roundtripTest (AU.listArray (1,500) ['a'..] :: UArray Int Char)
+    roundtripTestByteArray (AU.listArray (1,500) [1..] :: UArray Int Int)
+    roundtripTestByteArray (AU.listArray (1,500) [1..] :: UArray Int Word)
+    roundtripTestByteArray (AU.listArray (1,500) [1..] :: UArray Int Word8)
+    roundtripTestByteArray (AU.listArray (1,500) [1..] :: UArray Int Word16)
+    roundtripTestByteArray (AU.listArray (1,500) [1..] :: UArray Int Word32)
+    roundtripTestByteArray (AU.listArray (1,500) [1..] :: UArray Int Word64)
+    roundtripTestByteArray (AU.listArray (1,500) ['a'..] :: UArray Int Char)



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/b15caeb035b418021ab45b259a33a33...

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/b15caeb035b418021ab45b259a33a33...
You're receiving this email because of your account on gitlab.haskell.org.

    

Matthew Pickering (＠mpickering)

tags

participants (1)