[GHC] #12900: Common up identical info tables

#12900: Common up identical info tables -------------------------------------+------------------------------------- Reporter: dobenour | Owner: Type: feature | Status: new request | Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime Unknown/Multiple | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Consider {{{#!hs data Bool' = No | Yes }}} It is clear that this is identical to `Bool` at runtime, and so GHC should be able to reuse `Bool`'s code. The same holds for all other pairs of isomorphic data types. I think that a **massive** space savings could be achieved this way. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12900 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12900: Common up identical info tables -------------------------------------+------------------------------------- Reporter: dobenour | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): I've been thinking about this idea for a while, I think it's a great idea. I'd be willing to work on that if @simonpj or others can comment on how possible this is. I agree that we could save massive amount of space. I think what we can do is, we can give same info tables same names, so if we have {{{#!haskell data X = A | B data Bool = True | False }}} Both of these get the same info table `hs_s_s_info` (`s` is for "singleton"). Similarly, {{{#!haskell data Either a b = Left a | Right b }}} gets info table `hs_p_p_info` ('p' is for "pointer") etc. Then when linking we ignore multiple definitions of same info table symbols (not sure if this is possible easily). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12900#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12900: Common up identical info tables -------------------------------------+------------------------------------- Reporter: dobenour | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): We discussed this on the IRC channel and @ezyang pointed out two things: - Assuming this is only for merging same info tables, we can go even further by reordering fields so that {{{data T1 = T1 Int# Bool}}} and {{{data T2 = T2 Bool Int#}}} would get same info tables. This obviously needs a lot more work but saves more space. - This breaks {{{-h}}} which works even in non-profiled mode. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12900#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12900: Common up identical info tables -------------------------------------+------------------------------------- Reporter: dobenour | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Why do you say "massive" amount of space? Do you have data? My guess would be that, except in pathological situation (e.g. thousands of identical data types) the saving would be vanishingly small. Also I think that info tables contain strings that identify the type for debuggers etc; ghci has such a debugger built in. You'd lose this entirely. So I'm a skeptical that the gain justifies the pain (implementation complexity, loss of functionality). But data might convince! If you are looking for code space saving, then I think a more profitable place might be commoning up bits of Cmm code. E.g. consider {{{ f x y = case x of True -> False; False -> y }}} We push a return address, evaluate `x`; when we come back we take a conditional jump based on the tag and, in the `True` case we just return a fixed constructor `False`. Lots of code just returns `False`! If all that code was commoned up, we might get a big space saving. Maybe there are some sequences that are SO common we can put them in a big library, always available in the RTS, so all libraries could share. We'd need automation to find these common sequences and see which ones happened a lot. I remember a PhD student trying this years ago (20 yrs?) with some success. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12900#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12900: Common up identical info tables -------------------------------------+------------------------------------- Reporter: dobenour | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dobenour): simonpj: I remember reading that one of the reasons GHC-compiled binaries tend to be large is that each data type has its own info table and entry code. I suspect that most data types have one of a few common forms. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12900#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12900: Common up identical info tables -------------------------------------+------------------------------------- Reporter: dobenour | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): Maybe start by measuring the total code size used by constructors (say in the `ghc` library itself). Slightly tricky because the starts of info tables aren't marked by symbols, I think. I'd be surprised if they account for more than about 10% of the total code size, myself. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12900#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC