Yes, this is exactly one of the issues that marge might run into as well: the aggregate ends up performing differently from the individual patches. We now have marge to ensure that at least the aggregate builds, which is the whole point of these merge trains: to avoid ending up in a situation where two patches that are fine on their own produce a merged state that no longer builds.
Now we have marge to ensure every commit is buildable. Next we should run regression tests on all commits on master (and that includes each and every one that marge brings into master). Then we have visualisation that tells us how performance metrics go up or down over time, and we can drill down into individual commits if they yield interesting results in either direction.
Now let's say you had a commit that should have made GHC 50% faster across the board, but somehow, after being aggregated with other patches, that speedup no longer materialised. We'd still expect this to show up somehow in the individual commits on master, right?
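
To make the "drill down into commits" idea concrete, here is a minimal sketch of what per-commit tracking on master could look like. It is not GHC's actual perf tooling; the `Measurement` type, the `flag_changes` helper, the commit names, the timings, and the 5% threshold are all made up for illustration. It walks a linear history and reports every commit whose metric moved noticeably relative to its parent.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    """One metric sample for one commit on master (hypothetical shape)."""
    commit: str
    seconds: float


def flag_changes(history, threshold=0.05):
    """Return (commit, relative_change) for every commit whose metric
    differs from its parent's by more than `threshold` (as a fraction)."""
    flagged = []
    # Pair each commit with its parent in the linear history.
    for parent, child in zip(history, history[1:]):
        change = (child.seconds - parent.seconds) / parent.seconds
        if abs(change) > threshold:
            flagged.append((child.commit, change))
    return flagged


# Hypothetical numbers: "c3" is the big speedup, "d4" loses it again.
history = [
    Measurement("a1", 10.0),
    Measurement("b2", 10.1),
    Measurement("c3", 5.0),
    Measurement("d4", 9.9),
    Measurement("e5", 9.9),
]

for commit, change in flag_changes(history):
    print(f"{commit}: {change:+.1%}")
```

With data like this, both the speedup and the commit that cancels it out are visible as separate entries, which is the point of measuring every commit on master rather than only the aggregate.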