-
012af357
by Cheng Shao at 2024-12-09T09:46:45-08:00
compiler: implement --show-iface-abi-hash major mode
-
0f665588
by Cheng Shao at 2024-12-09T09:46:45-08:00
Oneshot bytecode linking
-
a54376cf
by Torsten Schmits at 2024-12-09T09:46:45-08:00
Package deps bytecode linking
-
44c8c978
by Ian-Woo Kim at 2024-12-09T09:46:45-08:00
set extra_decls = Nothing in interpreter after interface generation
-
d695ed07
by Ian-Woo Kim at 2024-12-09T09:46:45-08:00
No in-memory resident mi_extra_decls in compilation.
They are transiently loaded and removed after byte-code generation.
-
27a4a8bc
by Ben Gamari at 2024-12-09T10:20:35-08:00
rts: Tighten up invariants of PACK
-
0756f0b5
by Ben Gamari at 2024-12-09T10:20:40-08:00
StgToByteCode: Don't assume that data con workers are nullary
Previously StgToByteCode assumed that all data-con workers were of a
nullary representation. This is not a valid assumption, as seen
in #23210, where an unsaturated application of a unary data
constructor's worker resulted in invalid bytecode. Sadly, I have not yet
been able to reduce a minimal testcase for this.
Fixes #23210.
-
b335f856
by Cheng Shao at 2024-12-10T10:52:14-08:00
driver: fix hpc undefined symbol issue in TH with -fprefer-byte-code
This commit fixes an undefined symbol error in RTS linker when
attempting to compile home modules with -fhpc and
-fbyte-code-and-object-code/-fprefer-byte-code, see #25510 for
detailed description and analysis of the bug.
Also adds T25510/T25510c regression tests to test make mode/oneshot
mode of the bug.
backported to GHC 9.10.
-
2261e59e
by Rebecca Turner at 2024-12-10T13:55:31-08:00
ghc-internal: No trailing whitespace in exceptions
This is a backport of the *behavior* in
https://gitlab.haskell.org/ghc/ghc/-/commit/bfe600f5bb3ecd2c8fa71c536c63d3c46984e3f8
The commit upstream depends on a bunch of other changes to the exception
reporting infrastructure, so I've chosen to recreate its behavior here
rather than pulling in all the dependent patches.
This fixes a regression where GHC 9.10.1 adds a trailing newline to the
`displayException` implementation for `SomeException`. This has been
reverted in `master` but 9.10.2 isn't out yet and there's various
changes the upstream commit depends on, so this is a simple one-line
fix.
See: https://gitlab.haskell.org/ghc/ghc/-/issues/25052
-
68f4225c
by Ian-Woo Kim at 2024-12-11T14:49:55-08:00
disabled stub dynamic object generation for one-shot byte-code linking.
-
97da9d9e
by Torsten Schmits at 2024-12-25T08:52:35-08:00
refactor quadratic search in warnMissingHomeModules
-
ba330bbd
by Rodrigo Mesquita at 2024-12-25T09:49:02-08:00
Improve reachability queries on ModuleGraph
Introduces `ReachabilityIndex`, an index constructed from a
`GHC.Data.Graph.Directed` `Graph` that supports fast reachability
queries (in $O(1)$). This abstract data structure is exposed from
`GHC.Data.Graph.Directed.Reachability`.
This index is constructed from the module graph nodes and cached in
`ModuleGraph`, enabling efficient reachability queries on the module
graph. Previously, we'd construct a Map of Set of ModuleGraph nodes
which used a lot of memory (`O(n^2)` in the number of nodes) and cache
that in the `ModuleGraph`. By using the reachability index we get rid of
this space leak in the module graph -- even though the index is still
quadratic in the number of modules, it is much, much more space
efficient due to its representation using an IntMap of IntSet as opposed
to the transitive closure we previously cached.
In a memory profile of MultiLayerModules with 100x100 modules, memory
usage improved from 6GB residency to 2.8GB, out of which roughly 1.8GB
are caused by a second space leak related to ModuleGraph. On the same
program, it brings compile time from 7.5s to 5.5s.
Note how we simplify `checkHomeUnitsClosed` in terms of
`isReachableMany` and by avoiding constructing a second graph with the
full transitive closure -- it suffices to answer the reachability query
on the full graph without collapsing the transitive closure completely
into nodes.
Unfortunately, solving this leak means we have to do a little bit more
work since we can no longer cache the result of turning vertex indices
into nodes. This results in a slight regression in MultiLayerModulesTH_Make,
but results in large performance and memory wins when compiling large
amounts of modules.
-------------------------
Metric Decrease:
mhu-perf
Metric Increase:
MultiLayerModulesTH_Make
-------------------------
-
53ecc06c
by Ian-Woo Kim at 2024-12-25T16:45:39-08:00
dummy flag -fpackage-db-byte-code
-
3573c6fe
by Ian-Woo Kim at 2025-01-08T06:58:32-08:00
similarize the parallel downsweep to GHC HEAD version.
-
9da11f8f
by Matthew Pickering at 2025-01-08T08:29:47-08:00
Use deterministic names for temporary files
When there are multiple threads they can race to create a temporary
file, in some situations the thread will create ghc_1.c and in some it
will create ghc_2.c. This filename ends up in the debug info for object
files after compiling a C file, therefore contributes to object
nondeterminism.
In order to fix this we store a prefix in `TmpFs` which serves to
namespace temporary files. The prefix is populated from the counter in
TmpFs when the TmpFs is forked. Therefore the TmpFs must be forked
outside the thread which consumes it, in a deterministic order, so each
thread always receives a TmpFs with the same prefix.
This assumes that after the initial TmpFs is created, all other TmpFs
are created from forking the original TmpFs. Which should have been try
anyway as otherwise there would be file collisions and non-determinism.
Fixes #25224
-
3a484a7b
by Ian-Woo Kim at 2025-01-08T14:47:34-08:00
monotonic FinderCache. missed part from parallel downsweep latest GHC HEAD patch
-
2c4d9f61
by Andreas Klebinger at 2025-02-28T15:04:43-08:00
SpecConstr: Introduce a separate argument limit for forced specs.
We used to put no limit at all on specializations forced via the SPEC
argument. This isn't always reasonable so we introduce a very high limit
that applies to forced specializations, a flag to control it, and we now
emit a warning if we fail a specialization because we exceed the
warning.
Fixes #25197
-
1c80ba27
by Torsten Schmits at 2025-10-30T17:58:26+01:00
Load TH deps from home unit states of the modules that import them
-
4d6ffa22
by Torsten Schmits at 2025-11-06T16:10:01+01:00
remove redundant core bindings typecheck in initWholeCoreBindings
-
75206a24
by Georgios Karachalias at 2025-11-19T13:58:58+01:00
Use OsPath in PkgDbRef and UnitDatabase, not FilePath
-
ad1d03ec
by Matthew Pickering at 2025-12-11T19:44:01+01:00
Various downsweep perf tweaks
-
f20cef54
by Torsten Schmits at 2025-12-11T19:44:01+01:00
Abstract out parts of mkUnitState into a handler type
-
3ecb31bc
by Fendor at 2025-12-18T17:28:42+00:00
Refactor the Binary serialisation interface
The goal is simplifiy adding deduplication tables to `ModIface`
interface serialisation.
We identify two main points of interest that make this difficult:
1. UserData hardcodes what `Binary` instances can have deduplication
tables. Moreover, it heavily uses partial functions.
2. GHC.Iface.Binary hardcodes the deduplication tables for 'Name' and
'FastString', making it difficult to add more deduplication.
Instead of having a single `UserData` record with fields for all the
types that can have deduplication tables, we allow to provide custom
serialisers for any `Typeable`.
These are wrapped in existentials and stored in a `Map` indexed by their
respective `TypeRep`.
The `Binary` instance of the type to deduplicate still needs to
explicitly look up the decoder via `findUserDataReader` and
`findUserDataWriter`, which is no worse than the status-quo.
`Map` was chosen as microbenchmarks indicate it is the fastest for a
small number of keys (< 10).
To generalise the deduplication table serialisation mechanism, we
introduce the types `ReaderTable` and `WriterTable` which provide a
simple interface that is sufficient to implement a general purpose
deduplication mechanism for `writeBinIface` and `readBinIface`.
This allows us to provide a list of deduplication tables for
serialisation that can be extended more easily, for example for
`IfaceTyCon`, see the issue https://gitlab.haskell.org/ghc/ghc/-/issues/24540
for more motivation.
In addition to this refactoring, we split `UserData` into `ReaderUserData`
and `WriterUserData`, to avoid partial functions and reduce overall
memory usage, as we need fewer mutable variables.
Bump haddock submodule to accomodate for `UserData` split.
-------------------------
Metric Increase:
MultiLayerModulesTH_Make
MultiLayerModulesRecomp
T21839c
-------------------------
Split `BinHandle` into `ReadBinHandle` and `WriteBinHandle`
A `BinHandle` contains too much information for reading data.
For example, it needs to keep a `FastMutInt` and a `IORef BinData`,
when the non-mutable variants would suffice.
Additionally, this change has the benefit that anyone can immediately
tell whether the `BinHandle` is used for reading or writing.
Bump haddock submodule BinHandle split.
Add Eq and Ord instance to `IfaceType`
We add an `Ord` instance so that we can store `IfaceType` in a
`Data.Map` container.
This is required to deduplicate `IfaceType` while writing `.hi` files to
disk. Deduplication has many beneficial consequences to both file size
and memory usage, as the deduplication enables implicit sharing of
values.
See issue #24540 for more motivation.
The `Ord` instance would be unnecessary if we used a `TrieMap` instead
of `Data.Map` for the deduplication process. While in theory this is
clerarly the better option, experiments on the agda code base showed
that a `TrieMap` implementation has worse run-time performance
characteristics.
To the change itself, we mostly derive `Eq` and `Ord`. This requires us
to change occurrences of `FastString` with `LexicalFastString`, since
`FastString` has no `Ord` instance.
We change the definition of `IfLclName` to a newtype of
`LexicalFastString`, to make such changes in the future easier.
Bump haddock submodule for IfLclName changes
Move out LiteralMap to avoid cyclic module dependencies
Add deduplication table for `IfaceType`
The type `IfaceType` is a highly redundant, tree-like data structure.
While benchmarking, we realised that the high redundancy of `IfaceType`
causes high memory consumption in GHCi sessions when byte code is
embedded into the `.hi` file via `-fwrite-if-simplified-core` or
`-fbyte-code-and-object-code`.
Loading such `.hi` files from disk introduces many duplicates of
memory expensive values in `IfaceType`, such as `IfaceTyCon`,
`IfaceTyConApp`, `IA_Arg` and many more.
We improve the memory behaviour of GHCi by adding an additional
deduplication table for `IfaceType` to the serialisation of `ModIface`,
similar to how we deduplicate `Name`s and `FastString`s.
When reading the interface file back, the table allows us to automatically
share identical values of `IfaceType`.
To provide some numbers, we evaluated this patch on the agda code base.
We loaded the full library from the `.hi` files, which contained the
embedded core expressions (`-fwrite-if-simplified-core`).
Before this patch:
* Load time: 11.7 s, 2.5 GB maximum residency.
After this patch:
* Load time: 7.3 s, 1.7 GB maximum residency.
This deduplication has the beneficial side effect to additionally reduce
the size of the on-disk interface files tremendously.
For example, on agda, we reduce the size of `.hi` files (with
`-fwrite-if-simplified-core`):
* Before: 101 MB on disk
* Now: 24 MB on disk
This has even a beneficial side effect on the cabal store. We reduce the
size of the store on disk:
* Before: 341 MB on disk
* Now: 310 MB on disk
Note, none of the dependencies have been compiled with
`-fwrite-if-simplified-core`, but `IfaceType` occurs in multiple
locations in a `ModIface`.
We also add IfaceType deduplication table to .hie serialisation and
refactor .hie file serialisation to use the same infrastrucutre as
`putWithTables`.
Bump haddock submodule to accomodate for changes to the deduplication
table layout and binary interface.
Add run-time configurability of `.hi` file compression
Introduce the flag `-fwrite-if-compression=<n>` which allows to
configure the compression level of writing .hi files.
The motivation is that some deduplication operations are too expensive
for the average use case. Hence, we introduce multiple compression
levels with variable impact on performance, but still reduce the
memory residency and `.hi` file size on disk considerably.
We introduce three compression levels:
* `1`: `Normal` mode. This is the least amount of compression.
It deduplicates only `Name` and `FastString`s, and is naturally the
fastest compression mode.
* `2`: `Safe` mode. It has a noticeable impact on .hi file size and is
marginally slower than `Normal` mode. In general, it should be safe to
always use `Safe` mode.
* `3`: `Full` deduplication mode. Deduplicate as much as we can,
resulting in minimal .hi files, but at the cost of additional
compilation time.
Reading .hi files doesn't need to know the initial compression level,
and can always deserialise a `ModIface`, as we write out a byte that
indicates the next value has been deduplicated.
This allows users to experiment with different compression levels for
packages, without recompilation of dependencies.
Note, the deduplication also has an additional side effect of reduced
memory consumption to implicit sharing of deduplicated elements.
See https://gitlab.haskell.org/ghc/ghc/-/issues/24540 for example where
that matters.
-------------------------
Metric Decrease:
MultiLayerModulesDefsGhciWithCore
T16875
T21839c
T24471
hard_hole_fits
libdir
-------------------------
Improve sharing of duplicated values in `ModIface`, fixes #24723
As a `ModIface` often contains duplicated values that are not
necessarily shared, we improve sharing by serialising the `ModIface`
to an in-memory byte array. Serialisation uses deduplication tables, and
deserialisation implicitly shares duplicated values.
This helps reducing the peak memory usage while compiling in
`--make` mode. The peak memory usage is especially smaller when
generating interface files with core expressions
(`-fwrite-if-simplified-core`).
On agda, this reduces the peak memory usage:
* `2.2 GB` to `1.9 GB` for a ghci session.
On `lib:Cabal`, we report:
* `570 MB` to `500 MB` for a ghci session
* `790 MB` to `667 MB` for compiling `lib:Cabal` with ghc
There is a small impact on execution time, around 2% on the agda code
base.
Avoid unneccessarily re-serialising the `ModIface`
To reduce memory usage of `ModIface`, we serialise `ModIface` to an
in-memory byte array, which implicitly shares duplicated values.
This serialised byte array can be reused to avoid work when we actually
write the `ModIface` to disk.
We introduce a new field to `ModIface` which allows us to save the byte
array, and write it direclty to disk if the `ModIface` wasn't changed
after the initial serialisation.
This requires us to change absolute offsets, for example to jump to the
deduplication table for `Name` or `FastString` with relative offsets, as
the deduplication byte array doesn't contain header information, such as
fingerprints.
To allow us to dump the binary blob to disk, we need to replace all
absolute offsets with relative ones.
We introduce additional helpers for `ModIface` binary serialisation, which
construct relocatable binary blobs. We say the binary blob is relocatable,
if the binary representation can be moved and does not contain any
absolute offsets.
Further, we introduce new primitives for `Binary` that allow to create
relocatable binaries, such as `forwardGetRel` and `forwardPutRel`.
-------------------------
Metric Decrease:
MultiLayerModulesDefsGhcWithCore
Metric Increase:
MultiComponentModules
MultiLayerModules
T10421
T12150
T12234
T12425
T13035
T13253-spj
T13701
T13719
T14697
T15703
T16875
T18698b
T18140
T18304
T18698a
T18730
T18923
T20049
T24582
T5837
T6048
T9198
T9961
mhu-perf
-------------------------
These metric increases may look bad, but they are all completely benign,
we simply allocate 1 MB per module for `shareIface`. As this allocation
is quite quick, it has a negligible impact on run-time performance.
In fact, the performance difference wasn't measurable on my local
machine. Reducing the size of the pre-allocated 1 MB buffer avoids these
test failures, but also requires us to reallocate the buffer if the
interface file is too big. These reallocations *did* have an impact on
performance, which is why I have opted to accept all these metric
increases, as the number of allocated bytes is merely a guidance.
This 1MB allocation increase causes a lot of tests to fail that
generally have a low allocation number. E.g., increasing from 40MB to
41MB is a 2.5% increase.
In particular, the tests T12150, T13253-spj, T18140, T18304, T18698a,
T18923, T20049, T24582, T5837, T6048, and T9961 only fail on i386-darwin
job, where the number of allocated bytes seems to be lower than in other
jobs.
The tests T16875 and T18698b fail on i386-linux for the same reason.
WIP: Lazy loading of IfaceDecl