
#9476: Implement late lambda-lifting
-------------------------------------+-------------------------------------
        Reporter:  simonpj           |                Owner:  sgraf
            Type:  feature request   |               Status:  closed
        Priority:  normal            |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  7.8.2
      Resolution:  fixed             |             Keywords:  LateLamLift
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #8763 #13286      |  Differential Rev(s):  Phab:D5224
       Wiki Page:  LateLamLift       |
-------------------------------------+-------------------------------------

Comment (by sgraf):

 It seems this whole fuss was about nothing. Obvious in hindsight: I wasn't
 aware that maximum residency and bytes copied are based only on samples
 taken when the GC runs, which in turn depends on heap size. Although I had
 played around with `-A` before, I have now run a script that finds the
 maximum residency over multiple different heap sizes, and got these
 numbers:

 {{{
 $ ./default 19 +RTS -s -G1 -A56M
      359,289,696 bytes allocated in the heap
      313,229,544 bytes copied during GC
      192,670,160 bytes maximum residency (4 sample(s))
        2,757,856 bytes maximum slop
              183 MB total memory in use (0 MB lost due to fragmentation)

                                      Tot time (elapsed)  Avg pause  Max pause
   Gen  0         4 colls,     0 par    0.405s   0.405s     0.1013s    0.2541s

   INIT    time    0.001s  (  0.001s elapsed)
   MUT     time    0.119s  (  0.119s elapsed)
   GC      time    0.405s  (  0.405s elapsed)
   EXIT    time    0.000s  (  0.000s elapsed)
   Total   time    0.524s  (  0.525s elapsed)

   %GC     time       0.0%  (0.0% elapsed)

   Alloc rate    3,031,050,062 bytes per MUT second

   Productivity  22.6% of total user, 22.6% of total elapsed

 $ ./allow-cg 19 +RTS -s -G1 -A76M
      401,485,600 bytes allocated in the heap
      331,564,512 bytes copied during GC
      196,161,944 bytes maximum residency (4 sample(s))
        1,389,968 bytes maximum slop
              187 MB total memory in use (0 MB lost due to fragmentation)

                                      Tot time (elapsed)  Avg pause  Max pause
   Gen  0         4 colls,     0 par    0.436s   0.441s     0.1102s    0.2637s

   INIT    time    0.001s  (  0.001s elapsed)
   MUT     time    0.132s  (  0.132s elapsed)
   GC      time    0.436s  (  0.441s elapsed)
   EXIT    time    0.000s  (  0.000s elapsed)
   Total   time    0.569s  (  0.574s elapsed)

   %GC     time       0.0%  (0.0% elapsed)

   Alloc rate    3,049,743,734 bytes per MUT second

   Productivity  23.1% of total user, 23.0% of total elapsed
 }}}

 Still, the impact of GC parameters is very annoying. Also, when I vary
 `-A$iM`, where `$i` is an integer between 1 and 200, the baseline has
 higher maximum residency than the variant that also lifts `go` in 123 of
 200 cases. It seems that the GC parameterisation for the baseline is just
 in a bad (bitter?) spot.

 The question is, how do I sell this in the paper? I guess I could increase
 the nursery size even further (I'm currently benchmarking with `-A128M
 -H1G`), but that's not very realistic, either...

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9476#comment:65>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
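
The nursery-size sweep described in the comment could be sketched roughly as follows. This is not the script that was actually used, which is not shown in the ticket; it is a minimal illustration. The helper names (`parse_max_residency`, `run_once`, `sweep`, `count_baseline_higher`) are hypothetical, and it assumes the binaries take a single argument as in the commands above and that the `+RTS -s` summary is written to stderr, which is the default when no output file is given.

```python
import re
import subprocess

# Matches the "+RTS -s" summary line, e.g.
#   192,670,160 bytes maximum residency (4 sample(s))
RESIDENCY_RE = re.compile(
    r"([\d,]+) bytes maximum residency \((\d+) sample\(s\)\)"
)


def parse_max_residency(rts_summary):
    """Return (max_residency_bytes, sample_count) from `+RTS -s` output."""
    match = RESIDENCY_RE.search(rts_summary)
    if match is None:
        raise ValueError("no 'maximum residency' line found")
    return int(match.group(1).replace(",", "")), int(match.group(2))


def run_once(binary, arg, nursery_mb):
    """Run one benchmark with -G1 and the given -A size; parse its summary."""
    cmd = [binary, arg, "+RTS", "-s", "-G1", f"-A{nursery_mb}M"]
    # Assumption: the RTS prints the -s summary to stderr.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_max_residency(result.stderr)


def sweep(binary, arg, sizes_mb):
    """Map each nursery size (in MB) to the residency sampled at that size."""
    return {mb: run_once(binary, arg, mb)[0] for mb in sizes_mb}


def count_baseline_higher(baseline, variant):
    """Count nursery sizes at which the baseline's residency is higher."""
    return sum(1 for mb in baseline if baseline[mb] > variant[mb])
```

With the two binaries from the comment, `sweep("./default", "19", range(1, 201))` and `sweep("./allow-cg", "19", range(1, 201))` would produce the two tables to compare, and `max(table.values())` gives the largest residency observed across all nursery sizes. Because residency is only sampled when a major GC happens, the sample count returned by `parse_max_residency` is worth keeping alongside each figure.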