Matthew Pickering pushed to branch wip/mp/iface-patches-9.10 at Glasgow Haskell Compiler / GHC Commits: 012af357 by Cheng Shao at 2024-12-09T09:46:45-08:00 compiler: implement --show-iface-abi-hash major mode - - - - - 0f665588 by Cheng Shao at 2024-12-09T09:46:45-08:00 Oneshot bytecode linking - - - - - a54376cf by Torsten Schmits at 2024-12-09T09:46:45-08:00 Package deps bytecode linking - - - - - 44c8c978 by Ian-Woo Kim at 2024-12-09T09:46:45-08:00 set extra_decls = Nothing in interpreter after interface generation - - - - - d695ed07 by Ian-Woo Kim at 2024-12-09T09:46:45-08:00 No in-memory resident mi_extra_decls in compilation. They are transiently loaded and removed after byte-code generation. - - - - - 27a4a8bc by Ben Gamari at 2024-12-09T10:20:35-08:00 rts: Tighten up invariants of PACK - - - - - 0756f0b5 by Ben Gamari at 2024-12-09T10:20:40-08:00 StgToByteCode: Don't assume that data con workers are nullary Previously StgToByteCode assumed that all data-con workers were of a nullary representation. This is not a valid assumption, as seen in #23210, where an unsaturated application of a unary data constructor's worker resulted in invalid bytecode. Sadly, I have not yet been able to reduce a minimal testcase for this. Fixes #23210. - - - - - b335f856 by Cheng Shao at 2024-12-10T10:52:14-08:00 driver: fix hpc undefined symbol issue in TH with -fprefer-byte-code This commit fixes an undefined symbol error in RTS linker when attempting to compile home modules with -fhpc and -fbyte-code-and-object-code/-fprefer-byte-code, see #25510 for detailed description and analysis of the bug. Also adds T25510/T25510c regression tests to test make mode/oneshot mode of the bug. backported to GHC 9.10. - - - - - 2261e59e by Rebecca Turner at 2024-12-10T13:55:31-08:00 ghc-internal: No trailing whitespace in exceptions This is a backport of the *behavior* in https://gitlab.haskell.org/ghc/ghc/-/commit/bfe600f5bb3ecd2c8fa71c536c63d3c4... The commit upstream depends on a bunch of other changes to the exception reporting infrastructure, so I've chosen to recreate its behavior here rather than pulling in all the dependent patches. This fixes a regression where GHC 9.10.1 adds a trailing newline to the `displayException` implementation for `SomeException`. This has been reverted in `master` but 9.10.2 isn't out yet and there's various changes the upstream commit depends on, so this is a simple one-line fix. See: https://gitlab.haskell.org/ghc/ghc/-/issues/25052 - - - - - 68f4225c by Ian-Woo Kim at 2024-12-11T14:49:55-08:00 disabled stub dynamic object generation for one-shot byte-code linking. - - - - - 97da9d9e by Torsten Schmits at 2024-12-25T08:52:35-08:00 refactor quadratic search in warnMissingHomeModules - - - - - ba330bbd by Rodrigo Mesquita at 2024-12-25T09:49:02-08:00 Improve reachability queries on ModuleGraph Introduces `ReachabilityIndex`, an index constructed from a `GHC.Data.Graph.Directed` `Graph` that supports fast reachability queries (in $O(1)$). This abstract data structure is exposed from `GHC.Data.Graph.Directed.Reachability`. This index is constructed from the module graph nodes and cached in `ModuleGraph`, enabling efficient reachability queries on the module graph. Previously, we'd construct a Map of Set of ModuleGraph nodes which used a lot of memory (`O(n^2)` in the number of nodes) and cache that in the `ModuleGraph`. By using the reachability index we get rid of this space leak in the module graph -- even though the index is still quadratic in the number of modules, it is much, much more space efficient due to its representation using an IntMap of IntSet as opposed to the transitive closure we previously cached. In a memory profile of MultiLayerModules with 100x100 modules, memory usage improved from 6GB residency to 2.8GB, out of which roughly 1.8GB are caused by a second space leak related to ModuleGraph. On the same program, it brings compile time from 7.5s to 5.5s. Note how we simplify `checkHomeUnitsClosed` in terms of `isReachableMany` and by avoiding constructing a second graph with the full transitive closure -- it suffices to answer the reachability query on the full graph without collapsing the transitive closure completely into nodes. Unfortunately, solving this leak means we have to do a little bit more work since we can no longer cache the result of turning vertex indices into nodes. This results in a slight regression in MultiLayerModulesTH_Make, but results in large performance and memory wins when compiling large amounts of modules. ------------------------- Metric Decrease: mhu-perf Metric Increase: MultiLayerModulesTH_Make ------------------------- - - - - - 53ecc06c by Ian-Woo Kim at 2024-12-25T16:45:39-08:00 dummy flag -fpackage-db-byte-code - - - - - 3573c6fe by Ian-Woo Kim at 2025-01-08T06:58:32-08:00 similarize the parallel downsweep to GHC HEAD version. - - - - - 9da11f8f by Matthew Pickering at 2025-01-08T08:29:47-08:00 Use deterministic names for temporary files When there are multiple threads they can race to create a temporary file, in some situations the thread will create ghc_1.c and in some it will create ghc_2.c. This filename ends up in the debug info for object files after compiling a C file, therefore contributes to object nondeterminism. In order to fix this we store a prefix in `TmpFs` which serves to namespace temporary files. The prefix is populated from the counter in TmpFs when the TmpFs is forked. Therefore the TmpFs must be forked outside the thread which consumes it, in a deterministic order, so each thread always receives a TmpFs with the same prefix. This assumes that after the initial TmpFs is created, all other TmpFs are created from forking the original TmpFs. Which should have been try anyway as otherwise there would be file collisions and non-determinism. Fixes #25224 - - - - - 3a484a7b by Ian-Woo Kim at 2025-01-08T14:47:34-08:00 monotonic FinderCache. missed part from parallel downsweep latest GHC HEAD patch - - - - - 2c4d9f61 by Andreas Klebinger at 2025-02-28T15:04:43-08:00 SpecConstr: Introduce a separate argument limit for forced specs. We used to put no limit at all on specializations forced via the SPEC argument. This isn't always reasonable so we introduce a very high limit that applies to forced specializations, a flag to control it, and we now emit a warning if we fail a specialization because we exceed the warning. Fixes #25197 - - - - - 1c80ba27 by Torsten Schmits at 2025-10-30T17:58:26+01:00 Load TH deps from home unit states of the modules that import them - - - - - 4d6ffa22 by Torsten Schmits at 2025-11-06T16:10:01+01:00 remove redundant core bindings typecheck in initWholeCoreBindings - - - - - 75206a24 by Georgios Karachalias at 2025-11-19T13:58:58+01:00 Use OsPath in PkgDbRef and UnitDatabase, not FilePath - - - - - ad1d03ec by Matthew Pickering at 2025-12-11T19:44:01+01:00 Various downsweep perf tweaks - - - - - f20cef54 by Torsten Schmits at 2025-12-11T19:44:01+01:00 Abstract out parts of mkUnitState into a handler type - - - - - 3ecb31bc by Fendor at 2025-12-18T17:28:42+00:00 Refactor the Binary serialisation interface The goal is simplifiy adding deduplication tables to `ModIface` interface serialisation. We identify two main points of interest that make this difficult: 1. UserData hardcodes what `Binary` instances can have deduplication tables. Moreover, it heavily uses partial functions. 2. GHC.Iface.Binary hardcodes the deduplication tables for 'Name' and 'FastString', making it difficult to add more deduplication. Instead of having a single `UserData` record with fields for all the types that can have deduplication tables, we allow to provide custom serialisers for any `Typeable`. These are wrapped in existentials and stored in a `Map` indexed by their respective `TypeRep`. The `Binary` instance of the type to deduplicate still needs to explicitly look up the decoder via `findUserDataReader` and `findUserDataWriter`, which is no worse than the status-quo. `Map` was chosen as microbenchmarks indicate it is the fastest for a small number of keys (< 10). To generalise the deduplication table serialisation mechanism, we introduce the types `ReaderTable` and `WriterTable` which provide a simple interface that is sufficient to implement a general purpose deduplication mechanism for `writeBinIface` and `readBinIface`. This allows us to provide a list of deduplication tables for serialisation that can be extended more easily, for example for `IfaceTyCon`, see the issue https://gitlab.haskell.org/ghc/ghc/-/issues/24540 for more motivation. In addition to this refactoring, we split `UserData` into `ReaderUserData` and `WriterUserData`, to avoid partial functions and reduce overall memory usage, as we need fewer mutable variables. Bump haddock submodule to accomodate for `UserData` split. ------------------------- Metric Increase: MultiLayerModulesTH_Make MultiLayerModulesRecomp T21839c ------------------------- Split `BinHandle` into `ReadBinHandle` and `WriteBinHandle` A `BinHandle` contains too much information for reading data. For example, it needs to keep a `FastMutInt` and a `IORef BinData`, when the non-mutable variants would suffice. Additionally, this change has the benefit that anyone can immediately tell whether the `BinHandle` is used for reading or writing. Bump haddock submodule BinHandle split. Add Eq and Ord instance to `IfaceType` We add an `Ord` instance so that we can store `IfaceType` in a `Data.Map` container. This is required to deduplicate `IfaceType` while writing `.hi` files to disk. Deduplication has many beneficial consequences to both file size and memory usage, as the deduplication enables implicit sharing of values. See issue #24540 for more motivation. The `Ord` instance would be unnecessary if we used a `TrieMap` instead of `Data.Map` for the deduplication process. While in theory this is clerarly the better option, experiments on the agda code base showed that a `TrieMap` implementation has worse run-time performance characteristics. To the change itself, we mostly derive `Eq` and `Ord`. This requires us to change occurrences of `FastString` with `LexicalFastString`, since `FastString` has no `Ord` instance. We change the definition of `IfLclName` to a newtype of `LexicalFastString`, to make such changes in the future easier. Bump haddock submodule for IfLclName changes Move out LiteralMap to avoid cyclic module dependencies Add deduplication table for `IfaceType` The type `IfaceType` is a highly redundant, tree-like data structure. While benchmarking, we realised that the high redundancy of `IfaceType` causes high memory consumption in GHCi sessions when byte code is embedded into the `.hi` file via `-fwrite-if-simplified-core` or `-fbyte-code-and-object-code`. Loading such `.hi` files from disk introduces many duplicates of memory expensive values in `IfaceType`, such as `IfaceTyCon`, `IfaceTyConApp`, `IA_Arg` and many more. We improve the memory behaviour of GHCi by adding an additional deduplication table for `IfaceType` to the serialisation of `ModIface`, similar to how we deduplicate `Name`s and `FastString`s. When reading the interface file back, the table allows us to automatically share identical values of `IfaceType`. To provide some numbers, we evaluated this patch on the agda code base. We loaded the full library from the `.hi` files, which contained the embedded core expressions (`-fwrite-if-simplified-core`). Before this patch: * Load time: 11.7 s, 2.5 GB maximum residency. After this patch: * Load time: 7.3 s, 1.7 GB maximum residency. This deduplication has the beneficial side effect to additionally reduce the size of the on-disk interface files tremendously. For example, on agda, we reduce the size of `.hi` files (with `-fwrite-if-simplified-core`): * Before: 101 MB on disk * Now: 24 MB on disk This has even a beneficial side effect on the cabal store. We reduce the size of the store on disk: * Before: 341 MB on disk * Now: 310 MB on disk Note, none of the dependencies have been compiled with `-fwrite-if-simplified-core`, but `IfaceType` occurs in multiple locations in a `ModIface`. We also add IfaceType deduplication table to .hie serialisation and refactor .hie file serialisation to use the same infrastrucutre as `putWithTables`. Bump haddock submodule to accomodate for changes to the deduplication table layout and binary interface. Add run-time configurability of `.hi` file compression Introduce the flag `-fwrite-if-compression=<n>` which allows to configure the compression level of writing .hi files. The motivation is that some deduplication operations are too expensive for the average use case. Hence, we introduce multiple compression levels with variable impact on performance, but still reduce the memory residency and `.hi` file size on disk considerably. We introduce three compression levels: * `1`: `Normal` mode. This is the least amount of compression. It deduplicates only `Name` and `FastString`s, and is naturally the fastest compression mode. * `2`: `Safe` mode. It has a noticeable impact on .hi file size and is marginally slower than `Normal` mode. In general, it should be safe to always use `Safe` mode. * `3`: `Full` deduplication mode. Deduplicate as much as we can, resulting in minimal .hi files, but at the cost of additional compilation time. Reading .hi files doesn't need to know the initial compression level, and can always deserialise a `ModIface`, as we write out a byte that indicates the next value has been deduplicated. This allows users to experiment with different compression levels for packages, without recompilation of dependencies. Note, the deduplication also has an additional side effect of reduced memory consumption to implicit sharing of deduplicated elements. See https://gitlab.haskell.org/ghc/ghc/-/issues/24540 for example where that matters. ------------------------- Metric Decrease: MultiLayerModulesDefsGhciWithCore T16875 T21839c T24471 hard_hole_fits libdir ------------------------- Improve sharing of duplicated values in `ModIface`, fixes #24723 As a `ModIface` often contains duplicated values that are not necessarily shared, we improve sharing by serialising the `ModIface` to an in-memory byte array. Serialisation uses deduplication tables, and deserialisation implicitly shares duplicated values. This helps reducing the peak memory usage while compiling in `--make` mode. The peak memory usage is especially smaller when generating interface files with core expressions (`-fwrite-if-simplified-core`). On agda, this reduces the peak memory usage: * `2.2 GB` to `1.9 GB` for a ghci session. On `lib:Cabal`, we report: * `570 MB` to `500 MB` for a ghci session * `790 MB` to `667 MB` for compiling `lib:Cabal` with ghc There is a small impact on execution time, around 2% on the agda code base. Avoid unneccessarily re-serialising the `ModIface` To reduce memory usage of `ModIface`, we serialise `ModIface` to an in-memory byte array, which implicitly shares duplicated values. This serialised byte array can be reused to avoid work when we actually write the `ModIface` to disk. We introduce a new field to `ModIface` which allows us to save the byte array, and write it direclty to disk if the `ModIface` wasn't changed after the initial serialisation. This requires us to change absolute offsets, for example to jump to the deduplication table for `Name` or `FastString` with relative offsets, as the deduplication byte array doesn't contain header information, such as fingerprints. To allow us to dump the binary blob to disk, we need to replace all absolute offsets with relative ones. We introduce additional helpers for `ModIface` binary serialisation, which construct relocatable binary blobs. We say the binary blob is relocatable, if the binary representation can be moved and does not contain any absolute offsets. Further, we introduce new primitives for `Binary` that allow to create relocatable binaries, such as `forwardGetRel` and `forwardPutRel`. ------------------------- Metric Decrease: MultiLayerModulesDefsGhcWithCore Metric Increase: MultiComponentModules MultiLayerModules T10421 T12150 T12234 T12425 T13035 T13253-spj T13701 T13719 T14697 T15703 T16875 T18698b T18140 T18304 T18698a T18730 T18923 T20049 T24582 T5837 T6048 T9198 T9961 mhu-perf ------------------------- These metric increases may look bad, but they are all completely benign, we simply allocate 1 MB per module for `shareIface`. As this allocation is quite quick, it has a negligible impact on run-time performance. In fact, the performance difference wasn't measurable on my local machine. Reducing the size of the pre-allocated 1 MB buffer avoids these test failures, but also requires us to reallocate the buffer if the interface file is too big. These reallocations *did* have an impact on performance, which is why I have opted to accept all these metric increases, as the number of allocated bytes is merely a guidance. This 1MB allocation increase causes a lot of tests to fail that generally have a low allocation number. E.g., increasing from 40MB to 41MB is a 2.5% increase. In particular, the tests T12150, T13253-spj, T18140, T18304, T18698a, T18923, T20049, T24582, T5837, T6048, and T9961 only fail on i386-darwin job, where the number of allocated bytes seems to be lower than in other jobs. The tests T16875 and T18698b fail on i386-linux for the same reason. WIP: Lazy loading of IfaceDecl - - - - - 114 changed files: - compiler/GHC.hs - compiler/GHC/Core/Map/Expr.hs - compiler/GHC/Core/Opt/Pipeline.hs - compiler/GHC/Core/Opt/SpecConstr.hs - compiler/GHC/Core/TyCo/Rep.hs - compiler/GHC/CoreToIface.hs - compiler/GHC/Data/FastString.hs - compiler/GHC/Data/Graph/Directed.hs - + compiler/GHC/Data/Graph/Directed/Internal.hs - + compiler/GHC/Data/Graph/Directed/Reachability.hs - + compiler/GHC/Data/OsPath.hs - compiler/GHC/Data/TrieMap.hs - compiler/GHC/Driver/Backpack.hs - compiler/GHC/Driver/Config/StgToCmm.hs - compiler/GHC/Driver/DynFlags.hs - compiler/GHC/Driver/Env.hs - compiler/GHC/Driver/Flags.hs - compiler/GHC/Driver/Hooks.hs - compiler/GHC/Driver/Main.hs - + compiler/GHC/Driver/Main.hs-boot - compiler/GHC/Driver/Make.hs - compiler/GHC/Driver/Pipeline.hs - compiler/GHC/Driver/Pipeline/Execute.hs - compiler/GHC/Driver/Session.hs - compiler/GHC/HsToCore.hs - compiler/GHC/HsToCore/Coverage.hs - compiler/GHC/HsToCore/Monad.hs - compiler/GHC/Iface/Binary.hs - compiler/GHC/Iface/Decl.hs - compiler/GHC/Iface/Env.hs - compiler/GHC/Iface/Ext/Binary.hs - compiler/GHC/Iface/Ext/Fields.hs - compiler/GHC/Iface/Ext/Utils.hs - compiler/GHC/Iface/Load.hs - compiler/GHC/Iface/Make.hs - compiler/GHC/Iface/Recomp.hs - compiler/GHC/Iface/Recomp/Binary.hs - compiler/GHC/Iface/Recomp/Flags.hs - compiler/GHC/Iface/Rename.hs - compiler/GHC/Iface/Syntax.hs - compiler/GHC/Iface/Tidy.hs - compiler/GHC/Iface/Type.hs - compiler/GHC/IfaceToCore.hs - compiler/GHC/IfaceToCore.hs-boot - compiler/GHC/Linker/Deps.hs - compiler/GHC/Linker/Loader.hs - compiler/GHC/Linker/Types.hs - compiler/GHC/Rename/Names.hs - compiler/GHC/Runtime/Context.hs - compiler/GHC/Runtime/Loader.hs - compiler/GHC/Stg/CSE.hs - compiler/GHC/Stg/Utils.hs - compiler/GHC/StgToByteCode.hs - compiler/GHC/StgToCmm.hs - compiler/GHC/StgToCmm/Config.hs - compiler/GHC/StgToCmm/Hpc.hs - compiler/GHC/StgToJS/Object.hs - compiler/GHC/Tc/Errors/Hole.hs - compiler/GHC/Tc/Gen/Splice.hs - compiler/GHC/Tc/Module.hs - compiler/GHC/Tc/Utils/Backpack.hs - compiler/GHC/Tc/Utils/Monad.hs - compiler/GHC/Types/Basic.hs - compiler/GHC/Types/FieldLabel.hs - compiler/GHC/Types/Name.hs - compiler/GHC/Types/Name/Ppr.hs - compiler/GHC/Types/Var.hs - compiler/GHC/Unit/Env.hs - compiler/GHC/Unit/Finder.hs - compiler/GHC/Unit/Module/Graph.hs - compiler/GHC/Unit/Module/ModGuts.hs - compiler/GHC/Unit/Module/ModIface.hs - compiler/GHC/Unit/State.hs - compiler/GHC/Utils/Binary.hs - compiler/GHC/Utils/Binary/Typeable.hs - compiler/GHC/Utils/TmpFs.hs - compiler/Language/Haskell/Syntax/Type.hs - compiler/Language/Haskell/Syntax/Type.hs-boot - compiler/cbits/genSym.c - compiler/ghc.cabal.in - docs/users_guide/expected-undocumented-flags.txt - docs/users_guide/phases.rst - docs/users_guide/using-optimisation.rst - ghc/GHCi/UI.hs - ghc/GHCi/UI/Monad.hs - ghc/Main.hs - libraries/ghc-internal/src/GHC/Internal/Exception/Type.hs - rts/Interpreter.c - rts/include/rts/storage/InfoTables.h - + testsuite/tests/bytecode/T24634/T24634.stdout - + testsuite/tests/bytecode/T25510/Makefile - + testsuite/tests/bytecode/T25510/T25510A.hs - + testsuite/tests/bytecode/T25510/T25510B.hs - + testsuite/tests/bytecode/T25510/all.T - testsuite/tests/plugins/simple-plugin/Simple/RemovePlugin.hs - testsuite/tests/simplCore/should_compile/all.T - + testsuite/tests/th/cross-package/Cross.hs - + testsuite/tests/th/cross-package/CrossDep.hs - + testsuite/tests/th/cross-package/CrossDepApi.hs - + testsuite/tests/th/cross-package/CrossLocal.hs - + testsuite/tests/th/cross-package/CrossNum.hs - + testsuite/tests/th/cross-package/CrossNum.hs-boot - + testsuite/tests/th/cross-package/CrossObj.hs - + testsuite/tests/th/cross-package/CrossPackage.stdout - + testsuite/tests/th/cross-package/Makefile - + testsuite/tests/th/cross-package/all.T - + testsuite/tests/th/cross-package/dep.conf - + testsuite/tests/th/cross-package/obj.conf - + testsuite/tests/th/cross-package/prep.bash - + testsuite/tests/th/cross-package/run.bash - + testsuite/tests/th/cross-package/unit1 - + testsuite/tests/th/cross-package/unit2 - testsuite/tests/wasm/should_run/control-flow/LoadCmmGroup.hs - utils/haddock The diff was not included because it is too large. View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/7a3a509dd5259f5a5b87d7138225b70... -- View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/7a3a509dd5259f5a5b87d7138225b70... You're receiving this email because of your account on gitlab.haskell.org.