
Hi ghc-devs,

I've been working on a new mode of adding cost-centres to programs and I'd like to ask some questions and solicit some feedback. The code is here [fn:1]; it works, provided one enables -fprof-core on all modules.

I've recently been trying to pick some low-hanging fruit from GHC compilation performance. A common frustration was the difference between profiled and non-profiled builds. Often I thought I had found a problem in the profiled build, only to find it was optimized away in the non-profiled build. Several times the issue was tail calls not happening in profiled builds.

To solve this problem I've been working on a new way of inserting cost-centres: adding them to core after simplification (currently at the end of corePrepPgm) rather than adding them to HsSyn before simplification. This makes it harder to map cost-centres back to source code (you currently have to use -ddump-prep), but in exchange you are profiling the same core program as the non-profiled build. I intend to investigate whether I can use SourceNotes to create SrcSpans for the generated cost-centres, to somewhat alleviate the need to inspect dumped core.

There are several new flags:

-fprof-core: Enables the aforementioned mode. This is mutually exclusive with -fprof-auto etc.

-fprof-core-drop-ticks: Non-user ticks are dropped from unfoldings (though I don't know how to do this yet).

-fprof-core-tick-binds: Ticks are inserted around the RHS of bindings (except top-level unlifted bindings).

-fprof-core-tick-cases: Ticks are inserted around the scrutinees of cases.

-fprof-core-tick-alts: Ticks are inserted around Alt expressions (unless there is only one).

Some questions:

I need to strip (probably only non-user) ticks out of unfoldings before they are substituted into a module that uses -fprof-core. Where is the right place to do this?
I need inlining to proceed exactly as if the ticks were not present; however, I don't want to strip ticks when the unfoldings are created, as other modules may still need them.

Is the end of corePrepPgm the right place to insert the cost-centres? I chose it because it can't affect any core optimizations if it's last, but perhaps it could be earlier, or perhaps it needs to act on Stg?

Do you have any examples of programs for which existing profiling tools are inadequate due to how cost-centres affect simplification? There is an example in #12893, but something self-contained would be great!

Regards,
Doug Wilson
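
For illustration only, the three tick-placement flags described above can be sketched on a toy expression type. This is not the code from the branch and does not use GHC's real Core types: Expr, TickOpts, insertTicks, tickIf and the cost-centre names ("bind:x", "scrut", "alt:A") are all hypothetical. The toy has no notion of top-level or unlifted bindings, so that exception is not modelled.

```haskell
-- Toy stand-in for Core; NOT GHC's CoreSyn. All names are hypothetical.
data Expr
  = Var String
  | Lit Int
  | App Expr Expr
  | Lam String Expr
  | Let String Expr Expr          -- let x = rhs in body
  | Case Expr [(String, Expr)]    -- scrutinee, then (pattern label, alt body)
  | Tick String Expr              -- cost-centre name wrapped around an expression
  deriving (Eq, Show)

-- One field per placement flag described in the mail.
data TickOpts = TickOpts
  { tickBinds :: Bool   -- cf. -fprof-core-tick-binds
  , tickCases :: Bool   -- cf. -fprof-core-tick-cases
  , tickAlts  :: Bool   -- cf. -fprof-core-tick-alts
  }

-- Wrap an expression in a tick only when the condition holds.
tickIf :: Bool -> String -> Expr -> Expr
tickIf True  name e = Tick name e
tickIf False _    e = e

-- Walk the already-"simplified" expression once and add ticks at the end,
-- so the ticks cannot influence any earlier transformation.
insertTicks :: TickOpts -> Expr -> Expr
insertTicks opts = go
  where
    go e = case e of
      Var _          -> e
      Lit _          -> e
      App f a        -> App (go f) (go a)
      Lam x b        -> Lam x (go b)
      Let x rhs body ->
        Let x (tickIf (tickBinds opts) ("bind:" ++ x) (go rhs)) (go body)
      Case scrut alts ->
        let multi = length alts > 1   -- a single alternative is left unticked
        in Case (tickIf (tickCases opts) "scrut" (go scrut))
                [ (p, tickIf (tickAlts opts && multi) ("alt:" ++ p) (go rhs))
                | (p, rhs) <- alts ]
      Tick n b       -> Tick n (go b)

-- A small input to try in GHCi: insertTicks (TickOpts True True True) example
example :: Expr
example =
  Let "x" (App (Var "f") (Lit 1))
    (Case (Var "x") [("A", Lit 0), ("B", Var "x")])
```

A real pass would also need unique cost-centre names and source locations, which this sketch leaves out.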