[GHC] #15113: Do not make CAFs from literal strings

#15113: Do not make CAFs from literal strings -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Currently (as I discovered in #15038), we get the following code for `GHC.Exception.Base.patError`: {{{ lvl2_r3y3 :: [Char] [GblId] lvl2_r3y3 = unpackCString# lvl1_r3y2 -- RHS size: {terms: 7, types: 6, coercions: 2, joins: 0/0} patError :: forall a. Addr# -> a [GblId, Arity=1, Str=x, Unf=OtherCon []] patError = \ (@ a_a2kh) (s_a1Pi :: Addr#) -> raise# @ SomeException @ 'LiftedRep @ a_a2kh (Control.Exception.Base.$fExceptionPatternMatchFail_$ctoException ((untangle s_a1Pi lvl2_r3y3) `cast` (Sym (Control.Exception.Base.N:PatternMatchFail[0]) :: (String :: *) ~R# (PatternMatchFail :: *)))) }}} That stupid `lvl2_r3y3 :: String` is a CAF, and hence `patError` has CAF- refs, and hence so does any function that calls `patError`, and any function that calls them. That's bad! Lots more CAF entries in SRTs, lots more work traversing those SRTs in the garbage collector. And for what? To share the work of unpacking a C string! This is nuts. What to do? * Somehow refrain from floating `unpackCSTring# lit` to top level, even if you could otherwise do so. But that seems very ad-hoc, and it make the function bigger and less inlinable. * Treat a top level definition {{{ x :: [Char] x = unpackCString# y }}} as NOT a CAF, and make it single-entry so that the thunk is not updated. Then every use of `x` will unpack the string afresh, which is probably a good idea anyhow. I like this more. It would be implemented somewhere in the code generator. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15113 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15113: Do not make CAFs from literal strings -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by osa1): * cc: osa1 (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15113#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15113: Do not make CAFs from literal strings -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): See comment:33 on #15038 for why this ticket (useful though it may be) will not actually benefit `patError` and friends. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15113#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15113: Do not make CAFs from literal strings -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: CAFs Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => CAFs -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15113#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15113: Do not make CAFs from literal strings -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: CAFs Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D4717 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: new => patch * differential: => Phab:D4717 * milestone: 8.6.1 => 8.8.1 Comment: Phab:D4717 marks top-level strings as single-entry. I haven't yet had a chance to characterise the effect of this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15113#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15113: Do not make CAFs from literal strings -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: CAFs Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D4717 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I like the idea of trying this out. It might actually make programs allocate more (because we start using call-by-name), but it's usually very short-lived allocation. To judge the effect, you might want to add a way to show the total SRT table sizes (statically); and the number of SRT links traversed by the garbage collector (dynamically). Reducing these two is the main point, so we should measure them. For the latter, perhaps add to the `-ticky` stats? Variants to explore * Broaden scope: do this for non-top-level bindings as well as top-level ones. (The Phab patch has bug on this point; it does it only for the non-top-level ones!) * Broaden scope: do this for any function marked CONLIKE. Currently those functions (in `libraries/`) are {{{ ./ghc-prim/GHC/CString.hs:74:{-# NOINLINE CONLIKE unpackCString# #-} ./ghc-prim/GHC/CString.hs:124:{-# NOINLINE CONLIKE unpackCStringUtf8# #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:74:#define CONLIKE ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:157:{-# INLINE CONLIKE size #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:161:{-# INLINE CONLIKE runF #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:166:{-# INLINE CONLIKE emptyF #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:171:{-# INLINE CONLIKE pairF #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:185:{-# INLINE CONLIKE contramapF #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:190:{-# INLINE CONLIKE toB #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:195:{-# INLINE CONLIKE liftFixedToBounded #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:199:{-# INLINE CONLIKE storableToF #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:204:{-# INLINE CONLIKE liftIOF #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:218:{-# INLINE CONLIKE sizeBound #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:225:{-# INLINE CONLIKE runB #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:238:{-# INLINE CONLIKE contramapB #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:243:{-# INLINE CONLIKE emptyB #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:248:{-# INLINE CONLIKE pairB #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:264:{-# INLINE CONLIKE eitherB #-} ./bytestring/Data/ByteString/Builder/Prim/Internal.hs:277:{-# INLINE CONLIKE condB #-} }}} I have no idea if this idea would be a good one for those functions in `bytestring`; maybe not. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15113#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15113: Do not make CAFs from literal strings
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner: (none)
Type: bug | Status: patch
Priority: normal | Milestone: 8.8.1
Component: Compiler | Version: 8.2.2
Resolution: | Keywords: CAFs
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D4717
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by osa1):
I started looking into this, but I realized that nofib can't be run with
GHC 8.6 (and probably with HEAD) so I'll need to fix nofib first.
What I've done so far is I implemented two counters:
- A runtime counter for SRT traversals done by the GC
- A compiler counter to count how many SRTs are generated (I'm also
recording sum of sizes of the SRTs although this is probably not too
useful)
The runtime counter is printed with `+RTS -t`:
{{{
<

#15113: Do not make CAFs from literal strings -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: CAFs Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D4717 Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): I managed to get nofib running. Here's an example output for runtime SRT traversals: {{{ NoFib Results -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem SRTScavs -------------------------------------------------------------------------------- ... circsim 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% clausify 0.0% 0.0% 0.013 0.013 0.0% 0.0% comp_lab_zift 0.0% 0.0% 0.068 0.068 0.0% 0.0% compress 0.0% 0.0% 0.054 0.054 0.0% 0.0% compress2 0.0% 0.0% 0.047 0.047 0.0% 0.0% constraints 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% ... Scavenged SRTs ------------------------------------------------------------------------------- Program nofib-log nofib-log ------------------------------------------------------------------------------- CS 627 0.0% CSD 622 0.0% FS 629 0.0% S 16363319 0.0% VS 2529 0.0% VSD 310 0.0% VSM 628 0.0% anna 7951 0.0% ansi 310 0.0% atom 2102 0.0% awards 310 0.0% ... }}} (all 0.0% becuase I used same file twice to run nofib-analyse) I have another program to parse compile-time numbers and generate this: {{{ PGM | SRTs | SRT size fluid | 5 | 25 wave4main | 25 | 112 integer | 17 | 74 ... }}} The code is here: https://github.com/osa1/nofib/tree/print_srt_stuff I also have a GHC branch that implements a new flag `-favoid- cafs=[0|1|2|3]` where 0 means compile as before, 1 means avoid making CAFs for top-level `unpackCString#`s, 2 means in addition to 1 avoid making CAFs for non-top-level `unpackCString#`s, 3 means in addition to 2 avoid making CAFs for CONLIKEs too. Code is here: https://github.com/osa1/ghc/tree/print_srt_stuff However for some reason the flag doesn't actually change number of SRTs or SRT sizes. I don't know why yet, I'll debug further. I think it may be because it's too late to change CAFFY-ness of values in CoreToStg (remember that CAFFY-enss of values are computed in CorePrep, which is before CoreToStg). I'll debug further. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15113#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15113: Do not make CAFs from literal strings -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: CAFs Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D4717 Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * cc: MikolajKonarski (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15113#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15113: Do not make CAFs from literal strings -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: 8.10.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: CAFs Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D4717 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.8.1 => 8.10.1 Comment: Bumping to 8.10. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15113#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15113: Do not make CAFs from literal strings -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: 8.10.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: CAFs Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #16014 | Differential Rev(s): Phab:D4717 Wiki Page: | -------------------------------------+------------------------------------- Changes (by AndreasK): * related: => #16014 Comment: The string caf code also contributes significantly to code size. See #16014 for details. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15113#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15113: Do not make CAFs from literal strings -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: 8.10.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: CAFs Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #16014 | Differential Rev(s): Phab:D4717 Wiki Page: | -------------------------------------+------------------------------------- Old description:
Currently (as I discovered in #15038), we get the following code for `GHC.Exception.Base.patError`: {{{ lvl2_r3y3 :: [Char] [GblId] lvl2_r3y3 = unpackCString# lvl1_r3y2
-- RHS size: {terms: 7, types: 6, coercions: 2, joins: 0/0} patError :: forall a. Addr# -> a [GblId, Arity=1, Str=x, Unf=OtherCon []] patError = \ (@ a_a2kh) (s_a1Pi :: Addr#) -> raise# @ SomeException @ 'LiftedRep @ a_a2kh (Control.Exception.Base.$fExceptionPatternMatchFail_$ctoException ((untangle s_a1Pi lvl2_r3y3) `cast` (Sym (Control.Exception.Base.N:PatternMatchFail[0]) :: (String :: *) ~R# (PatternMatchFail :: *)))) }}} That stupid `lvl2_r3y3 :: String` is a CAF, and hence `patError` has CAF- refs, and hence so does any function that calls `patError`, and any function that calls them.
That's bad! Lots more CAF entries in SRTs, lots more work traversing those SRTs in the garbage collector. And for what? To share the work of unpacking a C string! This is nuts.
What to do?
* Somehow refrain from floating `unpackCSTring# lit` to top level, even if you could otherwise do so. But that seems very ad-hoc, and it make the function bigger and less inlinable.
* Treat a top level definition {{{ x :: [Char] x = unpackCString# y }}} as NOT a CAF, and make it single-entry so that the thunk is not updated. Then every use of `x` will unpack the string afresh, which is probably a good idea anyhow.
I like this more. It would be implemented somewhere in the code generator.
New description: Currently (as I discovered in #15038), we get the following code for `GHC.Exception.Base.patError`: {{{ lvl2_r3y3 :: [Char] [GblId] lvl2_r3y3 = unpackCString# lvl1_r3y2 -- RHS size: {terms: 7, types: 6, coercions: 2, joins: 0/0} patError :: forall a. Addr# -> a [GblId, Arity=1, Str=x, Unf=OtherCon []] patError = \ (@ a_a2kh) (s_a1Pi :: Addr#) -> raise# @ SomeException @ 'LiftedRep @ a_a2kh (Control.Exception.Base.$fExceptionPatternMatchFail_$ctoException ((untangle s_a1Pi lvl2_r3y3) `cast` (Sym (Control.Exception.Base.N:PatternMatchFail[0]) :: (String :: *) ~R# (PatternMatchFail :: *)))) }}} That stupid `lvl2_r3y3 :: String` is a CAF, and hence `patError` has CAF- refs, and hence so does any function that calls `patError`, and any function that calls them. That's bad! Lots more CAF entries in SRTs, lots more work traversing those SRTs in the garbage collector. And for what? To share the work of unpacking a C string! This is nuts. What to do? 1. Somehow refrain from floating `unpackCSTring# lit` to top level, even if you could otherwise do so. But that seems very ad-hoc, and it make the function bigger and less inlinable. 2. Treat a top level definition {{{ x :: [Char] x = unpackCString# y }}} as NOT a CAF, and make it single-entry so that the thunk is not updated. Then every use of `x` will unpack the string afresh, which is probably a good idea anyhow. I like this more. It would be implemented somewhere in the code generator. -- Comment (by simonpj): Looking at #16014, I like alternative (2) from the Description better and better. If we spot {{{ x = unpackCString# "blah"# }}} in the code generator, we could allocate a top-level closure with * Info-pointer: `rtsUnpackString_info` * One word of payload, a pointer to the literal string `"blah"#`. Now we can hand-write the single blob of code (plus info table) `rtsUnpackString_info` to unpack the string. Easy! And the overhead per string is only two words (for the closure) rather than all the stuff described in #16014. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15113#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15113: Do not make CAFs from literal strings -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: 8.10.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: CAFs, CodeGen Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #16014 | Differential Rev(s): Phab:D4717 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: CAFs => CAFs, CodeGen Comment: Omer: would you like to try this when you are done with `CafInfo`? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15113#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC