[GHC] #8590: Reduce code size of CAFs

#8590: Reduce code size of CAFs ------------------------------------+------------------------------------- Reporter: parcs | Owner: parcs Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Keywords: | Operating System: Unknown/Multiple Architecture: Unknown/Multiple | Type of failure: None/Unknown Difficulty: Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | ------------------------------------+------------------------------------- The size of each CAF could be reduced by moving the code that allocates the blackhole indirection closure out from the CAF's closure body and into `newCAF`. This saves around 50 bytes of object code per CAF. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: patch Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Changes (by parcs): * status: new => patch Comment: Proposed patch attached. To illustrate the change in code generation, consider the module {{{ #!haskell module CAF where a = "test" }}} Previously, the output Cmm looked like {{{ #!c CAF.a_entry() // [R1] { info_tbl: [(cCO, label: CAF.a_info rep:HeapRep static { Thunk })] stack_info: arg_space: 8 updfr_space: Just 8 } {offset cCO: _rnT::P64 = R1; if ((Sp + -16) < SpLim) goto cCP; else goto cCQ; cCQ: Hp = Hp + 16; if (Hp > HpLim) goto cCS; else goto cCR; cCS: HpAlloc = 16; goto cCP; cCP: R1 = _rnT::P64; call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8; cCR: I64[Hp - 8] = stg_CAF_BLACKHOLE_info; I64[Hp] = CurrentTSO; (_cCK::I64) = call "ccall" arg hints: [PtrHint, PtrHint, PtrHint] result hints: [] newCAF(BaseReg, _rnT::P64, Hp - 8); if (_cCK::I64 == 0) goto cCM; else goto cCL; cCM: call (I64[_rnT::P64])() args: 8, res: 0, upd: 8; cCL: I64[Sp - 16] = stg_bh_upd_frame_info; P64[Sp - 8] = Hp - 8; R2 = cCN_str; Sp = Sp - 16; call GHC.CString.unpackCString#_info(R2) args: 24, res: 0, upd: 24; } }}} Now, the output Cmm looks like {{{ #!c CAF.a_entry() // [R1] { info_tbl: [(cCN, label: CAF.a_info rep:HeapRep static { Thunk })] stack_info: arg_space: 8 updfr_space: Just 8 } {offset cCN: if ((Sp + -16) < SpLim) goto cCO; else goto cCP; cCO: R1 = R1; call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8; cCP: (_cCJ::I64) = call "ccall" arg hints: [PtrHint, PtrHint] result hints: [PtrHint] newCAF(BaseReg, R1); if (_cCJ::I64 == 0) goto cCL; else goto cCK; cCL: call (I64[R1])() args: 8, res: 0, upd: 8; cCK: I64[Sp - 16] = stg_bh_upd_frame_info; I64[Sp - 8] = _cCJ::I64; R2 = cCM_str; Sp = Sp - 16; call GHC.CString.unpackCString#_info(R2) args: 24, res: 0, upd: 24; } }}} Comparing object sizes: Before: {{{ text data bss dec hex filename 146 32 0 178 b2 caf.o }}} After: {{{ text data bss dec hex filename 92 32 0 124 7c caf.o }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: patch Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by nomeata): Nice. Have you also measured the effect on the runtime of programs? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: patch Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by parcs): Replying to [comment:2 nomeata]:
Nice. Have you also measured the effect on the runtime of programs?
No, I haven't. Doing so now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: patch Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by parcs): Runtime results are mixed (likely noise I think) Allocation doesn't change except with cacheprof which sees an increase of 0.6%.. Binary size changes between -3.5% to -1.3% Compile-time allocation changes between -10.3% to 0% -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: patch Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by simonmar): Yes, this is a good plan. I think I considered it before but I was worried about the cost of `allocate` vs. inline allocation in the CAF code, but I think a 2% reduction in code size is worth far more, and CAF entry is a one-time cost. So go ahead! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: patch Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by simonpj): Looks great to me. Could you bear to write a `Note [How CAFs are compiled]` in `StgCmmBind` (and perhaps cross-refer from `Storage.c`) so that there is a little writeup of the logic involved. Or maybe the Note belongs somewhere else. It would be good to cover (briefly) * What code we generate for CAFs * How and why they are reverted * What the CAF list is and why it is needed. Alternatively, elaborate this page [wiki:Commentary/Rts/Storage/GC/CAFs], and cross-refer to it from the code. While it's all fresh in your mind. Doubtless you had to infer lots of stuff from the code! Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: patch Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by simonmar): Simon, this is pretty well documented already in `Storage.c`, all the points you mention are already covered. In fact some of it is *also* documented in `StgCmmBind`. I suggest: * remove the comments in `StgCmmBind`, replace with a pointer to `Storage.c`. Multiple comments describing the same thing are bad, because when making changes we'll tend to only update one of them. * edit the comments in `Storage.c` to bring them up to date with the changes. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: patch Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by simonpj): Great! Sorry I was just looking at the patch itself. Indeed a pointer from (the appropriate point in) `StgCmmBind` to the detailed description in `Storage.c` would be good. Also from the wiki page I referenced. Thanks parcs Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: patch Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by parcs): Thanks for the comments. I've deduplicated and updated the CAF documentation in a followup patch. In the meantime, I noticed an oversight in the original patch. `lockCAF()` was not setting the profiling header of the newly-allocated blackhole. To fix this, I changed the following line {{{ #!c bh->header.info = &stg_CAF_BLACKHOLE_info; }}} to {{{ #!c SET_HDR(bh, &stg_CAF_BLACKHOLE_info, caf->header.prof.ccs); }}} which has the blackhole inherit the CCS from the respective CAF, equivalent to what the old code did. I'm going to commit this patch now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs
-------------------------------------+------------------------------------
Reporter: parcs | Owner: parcs
Type: feature request | Status: patch
Priority: normal | Milestone:
Component: Compiler (NCG) | Version: 7.7
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: None/Unknown | Difficulty: Unknown
Test Case: | Blocked By:
Blocking: | Related Tickets:
-------------------------------------+------------------------------------
Comment (by Patrick Palka

#8590: Reduce code size of CAFs
-------------------------------------+------------------------------------
Reporter: parcs | Owner: parcs
Type: feature request | Status: patch
Priority: normal | Milestone:
Component: Compiler (NCG) | Version: 7.7
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: None/Unknown | Difficulty: Unknown
Test Case: | Blocked By:
Blocking: | Related Tickets:
-------------------------------------+------------------------------------
Comment (by Patrick Palka

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: closed Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Changes (by thoughtpolice): * status: patch => closed * resolution: => fixed Comment: Thanks Patrick! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: closed Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by parcs): This patch removed the last user of the constructor `LambdaFormInfo.LFBlackHole`. A comment in the code alludes to removing it, and now it is possible to do so. See https://ghc.haskell.org/trac/ghc/browser/ghc/compiler/codeGen/StgCmmClosure.... So, should it be removed now? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: closed Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by simonmar): Yes, go ahead and remove it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: closed Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by parcs): I am attempting to further reduce the code size of CAFs by moving the CAF- updating code out from each CAF and into a shared "CAF" info-table. For example, consider the module {{{ #!haskell module CAF where a = "test" }}} Currently, the Cmm outputted for this module looks like {{{ #!c [section "data" { CAF.a_closure: const CAF.a_info; const 0; const 0; const 0; }, section "readonly" { cCM_str: I8[] [116,101,115,116] }, CAF.a_entry() // [R1] { info_tbl: [(cCN, label: CAF.a_info rep:HeapRep static { Thunk })] stack_info: arg_space: 8 updfr_space: Just 8 } {offset cCN: if ((Sp + -16) < SpLim) goto cCO; else goto cCP; cCO: R1 = R1; call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8; cCP: (_cCJ::I64) = call "ccall" arg hints: [PtrHint, PtrHint] result hints: [PtrHint] newCAF(BaseReg, R1); if (_cCJ::I64 == 0) goto cCL; else goto cCK; cCL: call (I64[R1])() args: 8, res: 0, upd: 8; cCK: I64[Sp - 16] = stg_bh_upd_frame_info; I64[Sp - 8] = _cCJ::I64; R2 = cCM_str; Sp = Sp - 16; call GHC.CString.unpackCString#_info(R2) args: 24, res: 0, upd: 24; } }] }}} Each CAF is augmented with code that handles the updating of the CAF itself. This is the `newCAF()` stuff shown above. It should be possible to refactor the CAF-updating out of the entry code of a CAF and into a special "CAF" info table. For example, the above code could look like {{{ #!c [section "data" { CAF.a_closure: const stg_CAF_info; // XXX new info table const 0; const 0; const CAF.a_info; // pointer to the "real" info table // this field corresponds to the saved_info field of an StgIndstatic }, section "readonly" { cCK_str: I8[] [116,101,115,116] }, CAF.a_entry() // [R1] { info_tbl: [(cCL, label: CAF.a_info rep:HeapRep static { Thunk })] stack_info: arg_space: 8 updfr_space: Just 8 } {offset cCL: // no CAF boilerplate here! R2 = cCK_str; call GHC.CString.unpackCString#_info(R2) args: 8, res: 0, upd: 8; } }] }}} where `stg_CAF_info` is the info table that encapsulates the CAF-specific code: {{{ #!c INFO_TABLE(stg_CAF, 0, 0, THUNK_STATIC, "CAF", "CAF") (P_ node) { P_ bh; W_ info; STK_CHK_GEN(); info = StgIndStatic_saved_info(node); ("ptr" bh) = ccall newCAF(BaseReg "ptr", node "ptr"); if (bh == 0) { jump %GET_ENTRY(node) (); } else { push (stg_bh_upd_frame_info, bh) { jump (%ENTRY_CODE(info)) (); } } } }}} Firstly, I wonder whether this approach is feasible. Is there a reason why CAF updates are not implemented this way in the first place? I have a tentative patch that implements this approach, and it does produce correct results given contrived input -- but when I try bootstrap GHC with the patch, the dll-split program (which is built against ghc- stage1) segfaults. So there is strong reason to believe that there is something flawed about the idea and/or the implementation. My STG-fu is still weak, and I am failing to figure out the source of the problem. dll-split does not have any CAFs Does anybody have any insights on Does anyone have any insights on this approach of updating CAFs and on what may be causing the segfault? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: closed Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Comment (by simonmar): I think it's a good idea, and it's great that you're finding ways to squeeze the code size. Keep at it! :)
jump (%ENTRY_CODE(info)) ()
I think you'll want that to be {{{ jump (%ENTRY_CODE(info)) (node) }}} One other thing you'll need to look at is reverting CAFs, which will need to put the `stg_CAF_info` back, rather than the saved info pointer. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8590: Reduce code size of CAFs -------------------------------------+------------------------------------ Reporter: parcs | Owner: parcs Type: feature request | Status: closed Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Unknown/Multiple Type of failure: None/Unknown | Difficulty: Unknown Test Case: | Blocked By: Blocking: | Related Tickets: -------------------------------------+------------------------------------ Changes (by Tarrasch): * cc: miffoljud@… (added) Comment: Hi Patrick, did you make any new progress on your `tmp.diff` patch? Did you create a new ticket for that? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8590#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC