Matthew Pickering pushed to branch wip/mp/iface-patches-9.10 at Glasgow Haskell Compiler / GHC Commits: b15caeb0 by Fendor at 2025-12-19T09:43:37+00:00 Refactor the Binary serialisation interface The goal is simplifiy adding deduplication tables to `ModIface` interface serialisation. We identify two main points of interest that make this difficult: 1. UserData hardcodes what `Binary` instances can have deduplication tables. Moreover, it heavily uses partial functions. 2. GHC.Iface.Binary hardcodes the deduplication tables for 'Name' and 'FastString', making it difficult to add more deduplication. Instead of having a single `UserData` record with fields for all the types that can have deduplication tables, we allow to provide custom serialisers for any `Typeable`. These are wrapped in existentials and stored in a `Map` indexed by their respective `TypeRep`. The `Binary` instance of the type to deduplicate still needs to explicitly look up the decoder via `findUserDataReader` and `findUserDataWriter`, which is no worse than the status-quo. `Map` was chosen as microbenchmarks indicate it is the fastest for a small number of keys (< 10). To generalise the deduplication table serialisation mechanism, we introduce the types `ReaderTable` and `WriterTable` which provide a simple interface that is sufficient to implement a general purpose deduplication mechanism for `writeBinIface` and `readBinIface`. This allows us to provide a list of deduplication tables for serialisation that can be extended more easily, for example for `IfaceTyCon`, see the issue https://gitlab.haskell.org/ghc/ghc/-/issues/24540 for more motivation. In addition to this refactoring, we split `UserData` into `ReaderUserData` and `WriterUserData`, to avoid partial functions and reduce overall memory usage, as we need fewer mutable variables. Bump haddock submodule to accomodate for `UserData` split. ------------------------- Metric Increase: MultiLayerModulesTH_Make MultiLayerModulesRecomp T21839c ------------------------- Split `BinHandle` into `ReadBinHandle` and `WriteBinHandle` A `BinHandle` contains too much information for reading data. For example, it needs to keep a `FastMutInt` and a `IORef BinData`, when the non-mutable variants would suffice. Additionally, this change has the benefit that anyone can immediately tell whether the `BinHandle` is used for reading or writing. Bump haddock submodule BinHandle split. Add Eq and Ord instance to `IfaceType` We add an `Ord` instance so that we can store `IfaceType` in a `Data.Map` container. This is required to deduplicate `IfaceType` while writing `.hi` files to disk. Deduplication has many beneficial consequences to both file size and memory usage, as the deduplication enables implicit sharing of values. See issue #24540 for more motivation. The `Ord` instance would be unnecessary if we used a `TrieMap` instead of `Data.Map` for the deduplication process. While in theory this is clerarly the better option, experiments on the agda code base showed that a `TrieMap` implementation has worse run-time performance characteristics. To the change itself, we mostly derive `Eq` and `Ord`. This requires us to change occurrences of `FastString` with `LexicalFastString`, since `FastString` has no `Ord` instance. We change the definition of `IfLclName` to a newtype of `LexicalFastString`, to make such changes in the future easier. Bump haddock submodule for IfLclName changes Move out LiteralMap to avoid cyclic module dependencies Add deduplication table for `IfaceType` The type `IfaceType` is a highly redundant, tree-like data structure. While benchmarking, we realised that the high redundancy of `IfaceType` causes high memory consumption in GHCi sessions when byte code is embedded into the `.hi` file via `-fwrite-if-simplified-core` or `-fbyte-code-and-object-code`. Loading such `.hi` files from disk introduces many duplicates of memory expensive values in `IfaceType`, such as `IfaceTyCon`, `IfaceTyConApp`, `IA_Arg` and many more. We improve the memory behaviour of GHCi by adding an additional deduplication table for `IfaceType` to the serialisation of `ModIface`, similar to how we deduplicate `Name`s and `FastString`s. When reading the interface file back, the table allows us to automatically share identical values of `IfaceType`. To provide some numbers, we evaluated this patch on the agda code base. We loaded the full library from the `.hi` files, which contained the embedded core expressions (`-fwrite-if-simplified-core`). Before this patch: * Load time: 11.7 s, 2.5 GB maximum residency. After this patch: * Load time: 7.3 s, 1.7 GB maximum residency. This deduplication has the beneficial side effect to additionally reduce the size of the on-disk interface files tremendously. For example, on agda, we reduce the size of `.hi` files (with `-fwrite-if-simplified-core`): * Before: 101 MB on disk * Now: 24 MB on disk This has even a beneficial side effect on the cabal store. We reduce the size of the store on disk: * Before: 341 MB on disk * Now: 310 MB on disk Note, none of the dependencies have been compiled with `-fwrite-if-simplified-core`, but `IfaceType` occurs in multiple locations in a `ModIface`. We also add IfaceType deduplication table to .hie serialisation and refactor .hie file serialisation to use the same infrastrucutre as `putWithTables`. Bump haddock submodule to accomodate for changes to the deduplication table layout and binary interface. Add run-time configurability of `.hi` file compression Introduce the flag `-fwrite-if-compression=<n>` which allows to configure the compression level of writing .hi files. The motivation is that some deduplication operations are too expensive for the average use case. Hence, we introduce multiple compression levels with variable impact on performance, but still reduce the memory residency and `.hi` file size on disk considerably. We introduce three compression levels: * `1`: `Normal` mode. This is the least amount of compression. It deduplicates only `Name` and `FastString`s, and is naturally the fastest compression mode. * `2`: `Safe` mode. It has a noticeable impact on .hi file size and is marginally slower than `Normal` mode. In general, it should be safe to always use `Safe` mode. * `3`: `Full` deduplication mode. Deduplicate as much as we can, resulting in minimal .hi files, but at the cost of additional compilation time. Reading .hi files doesn't need to know the initial compression level, and can always deserialise a `ModIface`, as we write out a byte that indicates the next value has been deduplicated. This allows users to experiment with different compression levels for packages, without recompilation of dependencies. Note, the deduplication also has an additional side effect of reduced memory consumption to implicit sharing of deduplicated elements. See https://gitlab.haskell.org/ghc/ghc/-/issues/24540 for example where that matters. ------------------------- Metric Decrease: MultiLayerModulesDefsGhciWithCore T16875 T21839c T24471 hard_hole_fits libdir ------------------------- Improve sharing of duplicated values in `ModIface`, fixes #24723 As a `ModIface` often contains duplicated values that are not necessarily shared, we improve sharing by serialising the `ModIface` to an in-memory byte array. Serialisation uses deduplication tables, and deserialisation implicitly shares duplicated values. This helps reducing the peak memory usage while compiling in `--make` mode. The peak memory usage is especially smaller when generating interface files with core expressions (`-fwrite-if-simplified-core`). On agda, this reduces the peak memory usage: * `2.2 GB` to `1.9 GB` for a ghci session. On `lib:Cabal`, we report: * `570 MB` to `500 MB` for a ghci session * `790 MB` to `667 MB` for compiling `lib:Cabal` with ghc There is a small impact on execution time, around 2% on the agda code base. Avoid unneccessarily re-serialising the `ModIface` To reduce memory usage of `ModIface`, we serialise `ModIface` to an in-memory byte array, which implicitly shares duplicated values. This serialised byte array can be reused to avoid work when we actually write the `ModIface` to disk. We introduce a new field to `ModIface` which allows us to save the byte array, and write it direclty to disk if the `ModIface` wasn't changed after the initial serialisation. This requires us to change absolute offsets, for example to jump to the deduplication table for `Name` or `FastString` with relative offsets, as the deduplication byte array doesn't contain header information, such as fingerprints. To allow us to dump the binary blob to disk, we need to replace all absolute offsets with relative ones. We introduce additional helpers for `ModIface` binary serialisation, which construct relocatable binary blobs. We say the binary blob is relocatable, if the binary representation can be moved and does not contain any absolute offsets. Further, we introduce new primitives for `Binary` that allow to create relocatable binaries, such as `forwardGetRel` and `forwardPutRel`. ------------------------- Metric Decrease: MultiLayerModulesDefsGhcWithCore Metric Increase: MultiComponentModules MultiLayerModules T10421 T12150 T12234 T12425 T13035 T13253-spj T13701 T13719 T14697 T15703 T16875 T18698b T18140 T18304 T18698a T18730 T18923 T20049 T24582 T5837 T6048 T9198 T9961 mhu-perf ------------------------- These metric increases may look bad, but they are all completely benign, we simply allocate 1 MB per module for `shareIface`. As this allocation is quite quick, it has a negligible impact on run-time performance. In fact, the performance difference wasn't measurable on my local machine. Reducing the size of the pre-allocated 1 MB buffer avoids these test failures, but also requires us to reallocate the buffer if the interface file is too big. These reallocations *did* have an impact on performance, which is why I have opted to accept all these metric increases, as the number of allocated bytes is merely a guidance. This 1MB allocation increase causes a lot of tests to fail that generally have a low allocation number. E.g., increasing from 40MB to 41MB is a 2.5% increase. In particular, the tests T12150, T13253-spj, T18140, T18304, T18698a, T18923, T20049, T24582, T5837, T6048, and T9961 only fail on i386-darwin job, where the number of allocated bytes seems to be lower than in other jobs. The tests T16875 and T18698b fail on i386-linux for the same reason. WIP: Lazy loading of IfaceDecl - - - - - 44 changed files: - compiler/GHC.hs - compiler/GHC/Core/Map/Expr.hs - compiler/GHC/Core/TyCo/Rep.hs - compiler/GHC/CoreToIface.hs - compiler/GHC/Data/FastString.hs - compiler/GHC/Data/TrieMap.hs - compiler/GHC/Driver/DynFlags.hs - compiler/GHC/Driver/Main.hs - compiler/GHC/Driver/Pipeline.hs - compiler/GHC/Driver/Pipeline/Execute.hs - compiler/GHC/Driver/Session.hs - compiler/GHC/Iface/Binary.hs - compiler/GHC/Iface/Decl.hs - compiler/GHC/Iface/Env.hs - compiler/GHC/Iface/Ext/Binary.hs - compiler/GHC/Iface/Ext/Fields.hs - compiler/GHC/Iface/Ext/Utils.hs - compiler/GHC/Iface/Load.hs - compiler/GHC/Iface/Make.hs - compiler/GHC/Iface/Recomp.hs - compiler/GHC/Iface/Recomp/Binary.hs - compiler/GHC/Iface/Recomp/Flags.hs - compiler/GHC/Iface/Rename.hs - compiler/GHC/Iface/Syntax.hs - compiler/GHC/Iface/Type.hs - compiler/GHC/IfaceToCore.hs - compiler/GHC/IfaceToCore.hs-boot - compiler/GHC/Stg/CSE.hs - compiler/GHC/StgToJS/Object.hs - compiler/GHC/Tc/Errors/Hole.hs - compiler/GHC/Tc/Gen/Splice.hs - compiler/GHC/Tc/Utils/Backpack.hs - compiler/GHC/Types/Basic.hs - compiler/GHC/Types/FieldLabel.hs - compiler/GHC/Types/Name.hs - compiler/GHC/Types/Var.hs - compiler/GHC/Unit/Module/ModIface.hs - compiler/GHC/Utils/Binary.hs - compiler/GHC/Utils/Binary/Typeable.hs - compiler/Language/Haskell/Syntax/Type.hs - compiler/Language/Haskell/Syntax/Type.hs-boot - docs/users_guide/using-optimisation.rst - testsuite/tests/plugins/simple-plugin/Simple/RemovePlugin.hs - utils/haddock The diff was not included because it is too large. View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/b15caeb035b418021ab45b259a33a334... -- View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/b15caeb035b418021ab45b259a33a334... You're receiving this email because of your account on gitlab.haskell.org.