
#9476: Implement late lambda-lifting
-------------------------------------+-------------------------------------
        Reporter:  simonpj           |                Owner:  sgraf
            Type:  feature request   |               Status:  closed
        Priority:  normal            |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  7.8.2
      Resolution:  fixed             |             Keywords:  LateLamLift
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #8763 #13286      |  Differential Rev(s):  Phab:D5224
       Wiki Page:  LateLamLift       |
-------------------------------------+-------------------------------------

Comment (by sgraf):

Replying to [comment:70 simonpj]:
> * To do this, either we need to add a RTS flag or... doesn't it happen anyway when we have a very small nursery? Each time the nursery runs out, we GC, and with -G1 that's a major GC. What am I missing?
Yes, running with `-G1` and varying the initial heap size with `-A$iM` is exactly what I have been doing to find a close approximation of the maximum residency (the one related to program semantics, not to GC samples). (Note that when a GC happens with `-G1`, the heap is resized to 2*bytes copied according to `-S`, so the 'nursery' grows in `-G1`. Also, I don't think that we can vary GC frequency through an RTS flag, because to my knowledge the decision when to GC is made entirely by the mutator.)

Example: Let's say I start with `-A40M` (always `-G1` from now on), and let's assume that the picture shows the actual heap profile (e.g., with maximum sampling frequency), where the points in time at which GC runs are indicated in red:

[[Image(https://i.imgur.com/RpeuCXc.jpg, 400px)]]

The maximum residency observed by GC is smaller than the actual maximum residency, which I call the maximum working set for the remainder of this post for disambiguation. How can we make the sampled maximum residency approximate the maximum working set? We can vary the GC frequency by setting the initial heap size to a different value. It turns out that among the integer values of `$i` from 1 to 200, `-A56M` seems to be the candidate with the closest approximation (closest meaning it sampled the highest maximum residency):

[[Image(https://i.imgur.com/BfvbKSl.jpg, 400px)]]

In particular, we don't really care how often major GC happens, only that one major GC happens ''immediately before'' the working set collapses. The `-A56M` is highly specific to the program (in this case, `default` above) and doesn't really matter in itself; it's just the parameterisation where the sampled maximum residency most closely approximates the actual maximum working set.

It's similar to modular arithmetic, if that's of any help: Imagine you want to find a number below 20 whose multiple comes closest to 73 (prime) without exceeding it. There's 8*9=72 or 6*12 or 4*18, etc. It doesn't really matter which factor you choose; the point is that you get close to 73.

The specific parameterisation for `allow-cg` to approximate its maximum working set was `-A76M -G1`, but it doesn't matter; what matters is that the sampled maximum residency for `default` was lower than for `allow-cg` (192 vs. 196MB), so everything is as expected.
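
To make the sampling argument concrete, here is a rough toy model in Python. It is only a sketch and not a description of what the RTS actually does: the `working_set` profile, all the numbers, and the rule "the next GC happens after `max(area, live)` more MB have been allocated" (a crude stand-in for the heap resizing mentioned above) are assumptions made up for illustration. It sweeps the allocation area over 1 to 200 and reports which value samples the highest maximum residency.

```python
from typing import Iterable, Iterator

# All numbers are made up for illustration; they are not measurements.
PEAK_AT_MB = 730.0   # allocation offset at which the working set peaks
PEAK_MB = 100.0      # true maximum working set of the toy program
FLOOR_MB = 10.0      # live data at the start and after the collapse
TOTAL_MB = 1000.0    # total allocation of the toy program


def working_set(allocated_mb: float) -> float:
    """Hypothetical live data (MB) after `allocated_mb` MB were allocated."""
    if allocated_mb <= PEAK_AT_MB:
        # Linear ramp from FLOOR_MB up to PEAK_MB.
        return FLOOR_MB + (PEAK_MB - FLOOR_MB) * allocated_mb / PEAK_AT_MB
    # Right after the peak the working set collapses.
    return FLOOR_MB


def gc_points(area_mb: float) -> Iterator[float]:
    """Allocation offsets at which the toy collector runs.

    Assumption: the next GC is triggered after max(area_mb, live) more MB
    have been allocated. This is a simplification, not GHC's actual policy.
    """
    offset = 0.0
    while True:
        offset += max(area_mb, working_set(offset))
        if offset > TOTAL_MB:
            return
        yield offset


def sampled_max_residency(area_mb: float) -> float:
    """Highest live-data figure the collector observes at its GC points."""
    return max((working_set(t) for t in gc_points(area_mb)), default=0.0)


def best_area(candidates: Iterable[int]) -> int:
    """Candidate area whose sampled maximum residency is the highest."""
    return max(candidates, key=sampled_max_residency)


if __name__ == "__main__":
    best = best_area(range(1, 201))
    print(f"true maximum working set: {PEAK_MB} MB")
    print(f"best area: {best} MB, sampled: {sampled_max_residency(best):.1f} MB")
```

In this model no parameterisation can sample more than the true peak, and the winning value has no meaning of its own: it is simply the one whose GC points happen to land closest before the collapse.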
> Then I believe you are saying "Ignoring closure growth has no effect on residency". Is that right?
Well, it's hard to tell how close those samples are to the actual maximum working set, but under the assumption that they are close enough, I'd say that ignoring closure growth indeed leads to a bigger working set in this case (as is to be expected), although it doesn't really manifest itself in the majority of GC parameterisations I sampled (cf. the second-to-last paragraph in comment:65).

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9476#comment:71
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler