[GHC] #12891: Automate symbols inclusion in RtsSymbols.c from Rts.h

#12891: Automate symbols inclusion in RtsSymbols.c from Rts.h
-------------------------------------+-------------------------------------
 Reporter:  Phyx-                    |  Owner:
 Type:  task                         |  Status:  new
 Priority:  normal                   |  Milestone:
 Component:  Build System            |  Version:  8.0.1
 Keywords:  newcomer                 |  Operating System:  Unknown/Multiple
 Architecture:  Unknown/Multiple     |  Type of failure:  None/Unknown
 Test Case:                          |  Blocked By:
 Blocking:                           |  Related Tickets:  #12846
 Differential Rev(s):                |  Wiki Page:
-------------------------------------+-------------------------------------

Public symbols that we export in the public headers (rooted from Rts.h) need to be added to the symbols table of the RTS. We currently do this by hand but the list can quickly grow out of sync.

Investigate a way to automate either the inclusion or a check during build time.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

Changes (by RyanGlScott):

 * cc: RyanGlScott (added)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:1

Comment (by dobenour):

Can we use Language-C? It is a C parser written in Haskell, and we could generate the output from the parse tree.

Not sure if it can parse everything in `Rts.h`, though. If there is a problem, it would most probably be in the system headers included by `Rts.h`, not `Rts.h` itself.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:2

Changes (by chris_r_timmons):

 * Attachment "T12891.py" added.

Python 3 script that attempts to determine which symbols in includes/Rts.h do not appear in rts/RtsSymbols.c.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891

Comment (by chris_r_timmons):

I've attached a Python 3 script that attempts to show which symbols in includes/Rts.h are missing from rts/RtsSymbols.c. A comment at the top of the script contains an overview of the algorithm and usage details.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:3

Comment (by bgamari):

Very interesting. This looks great; thanks chris_r_timmons!

I suppose that, given that the script needs to be run on a built tree, it would be easiest to just run this at the end of `validate`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:4

Changes (by bgamari):

 * owner: (none) => bgamari
 * milestone: => 8.4.1

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:5

Comment (by chris_r_timmons):

I'd like to move forward on this with a submission to Phabricator, but I'd appreciate some guidance as to whether my approach is correct. My plan is:

 1. Create a new folder $TOP/utils/checkRTS
 2. Place the Python script (renamed to checkRTS.py) and a README file in that folder
 3. As suggested, modify $TOP/validate to call $TOP/utils/checkRTS/checkRTS.py
 4. checkRTS.py will print its findings to stdout

Does this sound feasible?

Thanks.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:6

Comment (by Phyx-):

@chris_r_timmons that sounds fine to me. Though I think this should run before the tests are run, not after: if Stage2 doesn't work, this will tell you why. I also think it should be a hard error if something is wrong.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:7
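A minimal sketch of the "hard error" behaviour suggested above, assuming the check has already produced a set of missing symbol names. The function name `report_missing` and the message wording are hypothetical and are not taken from the attached script.

```python
import sys


def report_missing(missing, out=sys.stderr):
    """Print each missing symbol and return a process exit status.

    Returns 0 when nothing is missing and 1 otherwise, so that a caller
    such as `validate` can stop before the test suite is run.
    """
    for name in sorted(missing):
        print(f"missing from rtsSyms[]: {name}", file=out)
    return 1 if missing else 0
```

A script entry point could then end with `sys.exit(report_missing(missing))`, which makes any non-empty result fail the calling step.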

Comment (by Phyx-):

So looking at the script, two comments:

There's no guarantee that there's a GCC on the path. On Windows you may not have one. So you have two choices here. You can either hardcode inplace/mingw/bin/gcc for Windows, or you can ask ghc for the C compiler it's using (e.g. by calling `ghc --info`). If you do the second one, you have the issue that some configurations use clang, so you have to guard against that.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:8
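The second option (asking ghc for its C compiler) might look roughly like the sketch below. It rests on two assumptions that should be checked against the GHC in use: that `ghc --info` prints a list of `("key","value")` string pairs that Python's literal parser also accepts, and that the relevant key is `"C compiler command"`. The helper names are hypothetical, and the clang check is only a heuristic on the executable name.

```python
import ast


def c_compiler_from_info(info_text):
    """Return the "C compiler command" entry from `ghc --info` output.

    Assumption: the output is a list of ("key", "value") string pairs
    written in a syntax that ast.literal_eval can read.
    """
    settings = dict(ast.literal_eval(info_text.strip()))
    return settings["C compiler command"]


def looks_like_clang(command):
    """Guess from the executable name alone whether the compiler is clang."""
    # Accept both "/" and "\" as path separators so Windows paths work too.
    executable = command.replace("\\", "/").rsplit("/", 1)[-1]
    return "clang" in executable.lower()
```

In a built tree the text would come from something like `subprocess.run(["ghc", "--info"], capture_output=True, text=True, check=True).stdout`; a caller could then stop with a clear message when `looks_like_clang` returns true.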

#12891: Automate symbols inclusion in RtsSymbols.c from Rts.h
-------------------------------------+-------------------------------------
 Reporter:  Phyx-                    |  Owner:  bgamari
 Type:  task                         |  Status:  new
 Priority:  normal                   |  Milestone:  8.6.1
 Component:  Build System            |  Version:  8.0.1
 Resolution:                         |  Keywords:  newcomer
 Operating System:  Unknown/Multiple |  Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |  Test Case:
 Blocked By:                         |  Blocking:
 Related Tickets:  #12846            |  Differential Rev(s):
 Wiki Page:                          |
-------------------------------------+-------------------------------------

Comment (by chris_r_timmons):

I've developed a possible fix for this ticket, but I have some questions about the nature of GHC's runtime symbols. Depending on the answers, I'm afraid my fix will just push the problem to a different location, and not really be a fix at all.

This ticket's description states that it's error-prone for a developer to keep GHC's runtime symbols in sync between the runtime system's C header files (*.h) and C source files (*.c). The goal is to remove the developer from having to decide if the runtime symbols are in sync, and have an algorithm perform that task instead.

The runtime symbols naturally fall into mathematical sets:

 * Set "A": All runtime function symbols in `includes/Rts.h` (and its included headers).
 * Set "B": All runtime function symbols exported by the `rtsSyms[]` array in `rts/RtsSymbols.c`.

It should be a simple matter of calculating the set difference `(A - B)` to discover what symbols aren't being exported.

Unfortunately, it's not so simple. Set "A" has about 870 symbols, and set "B" about 560. This means roughly 310 symbols aren't being exported. These symbols appear to be legitimately intended to never be exported. We'll call this set "C":

 * Set "C": All runtime function symbols in `includes/Rts.h` intentionally **NOT** exported by the `rtsSyms[]` array in `rts/RtsSymbols.c`.

Now the algorithm becomes `(A - B) - C`, which works.

But there's a problem. The contents of sets "A" and "B" can be determined algorithmically. However, the contents of set "C" must be determined semantically, i.e. a developer must decide what symbols should and should not be in set "C". The goal of removing the developer from the decision-making process has not been met.

Questions:

 * Should set "C" exist? As long as it does, a completely algorithmic solution will never be possible, because a developer will still be required to make a decision about the contents of set "C". In other words, the current problem of a developer forgetting to add a symbol to set "B" (the rtsSyms[] array in rts/RtsSymbols.c) is replaced by a new problem of a developer deciding what symbols belong in set "C". My fix would just shift the problem to a new location (along with the addition of about 1000 lines of new code and documentation in GHC's code base).
 * The only fix I can think of is to eliminate set "C" by requiring any symbol appearing in set "A" to also appear in set "B". Is this feasible?

Thanks.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:10
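The `(A - B) - C` computation described above can be sketched with Python sets. This is only an illustration of the set arithmetic, not the code in the attached `T12891.py`; the function name and the placeholder symbols `sym_exported` and `sym_new` are hypothetical, while `barf` and `debugBelch` are the examples of set "C" members mentioned later in this thread.

```python
def unexpected_unexported(header_syms, exported_syms, intentionally_unexported):
    """Return (A - B) - C as a sorted list.

    A: symbols declared via includes/Rts.h
    B: symbols exported by rtsSyms[] in rts/RtsSymbols.c
    C: hand-maintained set of symbols that are deliberately not exported
    """
    a = set(header_syms)
    b = set(exported_syms)
    c = set(intentionally_unexported)
    return sorted((a - b) - c)


# Hypothetical inputs: one symbol is declared but neither exported nor
# listed as an intentional omission, so it is the only one reported.
print(unexpected_unexported(
    {"sym_exported", "barf", "debugBelch", "sym_new"},
    {"sym_exported"},
    {"barf", "debugBelch"},
))
```

The sketch makes the concern in the comment concrete: the first two arguments can be derived from the tree, but the third has to be supplied by a person.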

Comment (by Phyx-):

Hi,

Thanks for looking into this again. So, to give some context, not all symbols in set `B` are actually important.

`rtsSyms` exists to cover up a deficiency arising from the fact that the runtime linker itself is also a Haskell process: we cannot have two versions of the RTS loaded in one process. This means we cannot link against `libRTS` to provide the symbols that are required for a Haskell program to run when loading Haskell libraries. Instead, we provide a set of symbols that will be loaded from the running Haskell program, effectively giving the code just loaded access to the running RTS.

Now why don't we just export all symbols from the running process? Because we don't want to force the user to use the same implementation of standard functions that we have chosen, and more importantly, we don't want to conflict if the user does specify another implementation. We actually do have this specific case: those are the symbols exported by the `SymI_HasProto_deprecated` macro, but these are a manually curated set.

So essentially set `B` should be all `SymI` values excluding the `SymI_HasProto_deprecated` ones. However, in this particular case it doesn't matter much. But just so you know what the symbols do.

Now why only `SymI`? Because in certain cases, like when the code is dynamically linked, the symbols can just be obtained from the dynamically loaded shared libraries. In that case the runtime linker is not the one providing the symbols, but we do know that they are there.

> Should set "C" exist? As long as it does, a completely algorithmic solution will never be possible, because a developer will still be required to make a decision about the contents of set "C". In other words, the current problem of a developer forgetting to add a symbol to set "B" (the rtsSyms[] array in rts/RtsSymbols.c), is replaced by a new problem of a developer deciding what symbols belong in set "C". My fix would just shift the problem to new location (along with the addition of about 1000 lines of new code and documentation in GHC's code base).

Can you give a few examples of which symbols are in set C? One way to reduce set C may be to check the symbols defined in the RTS, and restrict C to only those symbols that are actually defined in one of the RTS libraries. The RTS libraries are ABI compatible (mostly; I think only profiling isn't, but I don't remember off the top of my head).

> The only fix I can think of is to eliminate set "C" by requiring any symbol appearing in set "A" to also appear in set "B". Is this feasible?

Probably not. Aside from the runtime cost it carries, it also increases the risk of symbol collisions with user libraries. But again, the way forward really depends on what's in C.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:11
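The rule stated in this comment (set `B` is every `SymI` entry except the `SymI_HasProto_deprecated` ones) could be approximated with a regular expression over the text of `rts/RtsSymbols.c`. This is a rough sketch under assumptions: it treats any `SymI_<something>(name)` occurrence as an entry, skips lines that begin with `#` so that macro definitions are not counted, and does not run the C preprocessor, so platform-conditional entries are not resolved. The function name and the sample symbol names are hypothetical.

```python
import re

# Matches entries such as SymI_HasProto(foo) or SymI_HasProto_deprecated(foo).
SYM_ENTRY = re.compile(r"\b(SymI_\w+)\s*\(\s*(\w+)\s*\)")


def symi_names(source_text):
    """Collect names from SymI_* entries, skipping the deprecated macro."""
    names = set()
    for line in source_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # preprocessor line, e.g. the definition of a SymI_* macro
        for macro, name in SYM_ENTRY.findall(line):
            if macro != "SymI_HasProto_deprecated":
                names.add(name)
    return names
```

Entries written on the same line as a `#define` would be missed by this line filter, which is one reason a real check would more likely work on preprocessed output.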

Changes (by chris_r_timmons):

 * Attachment "Set_A_-_Symbols_from_includes-rts_dot_h" added.

Runtime symbols set "A", as described in comment 10.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891

Changes (by chris_r_timmons):

 * Attachment "Set_B_-_Symbols_from_rts-RtsSymbols_dot_c" added.

Runtime symbols set "B", as described in comment 10.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891

Changes (by chris_r_timmons):

 * Attachment "Set_C_-_Symbols_from_includes-rts_dot_h NOT exported from rts-RtsSymbols_dot_c" added.

Runtime symbols set "C", as described in comment 10.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891

Comment (by chris_r_timmons):

I've attached three files containing runtime symbol sets as described in the comment above. The symbols were generated on Windows/MSYS2, using GHC code that's about two weeks old.

Given what I'm learning about the runtime system, I'm not optimistic that the detection of missing runtime symbols can be fully automated.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:12

Comment (by Phyx-):

Hmm, set C is somewhat suspicious. Some, like `barf`, `debugBelch`, etc., are internal symbols. What command did you use to generate the list? If they really are reachable from `Rts.h`, we may have to move them to an `internals` include file.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:13

Comment (by chris_r_timmons):

Set `C` is just `A - B`. Referencing the attached `T12891.py` file, set `A` is generated with the `get_symbol_names` function. Set `B` is created by the `get_symbol_names_from_rts_symbols_c` function.

(In the Python script I hard-coded set `C`. I anticipated a developer would adjust its contents for correctness, at least until I could figure out a way to remove that requirement.)

Regarding `barf` and `debugBelch`: they're declared in `includes/rts/Messages.h`, which is included by `includes/Rts.h`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:14

Comment (by Phyx-):

Yeah, I think we indeed have a problem. `Rts.h` exposes symbols that are there so you can load the RTS in another program and control it. These aren't needed by the runtime linker, so I prefer not to have them in the symbol list.

So I think you're right in that we can't really automate this. Not unless we add these symbols too, but that costs us a small amount of time at runtime for symbols that will likely never be used. What do you think, @bgamari?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12891#comment:15