
On Mon, Sep 22, 2014 at 4:52 AM, Eric Seidel wrote:
To provide an example, I'm currently working on a little game engine that uses JuicyPixels to load images. I have a performance problem in my code that needs optimizing, but the current state of things produces profiles that are very difficult to work with. JuicyPixels specifies -auto-all in its cabal file, which means I have no alternative but to profile JuicyPixels code. In this scenario, the bottleneck is actually within my FRP game loop and has nothing to do with image loading! As a result, the profiles are fairly useless to me.
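For reference, the stanza in question looks roughly like this (quoting from memory, the exact contents of JuicyPixels.cabal may differ):

    library
      -- sketched from memory; the real file may differ
      ghc-prof-options: -auto-all

With that field set, cabal passes -auto-all to GHC whenever the library is built with profiling enabled, so every top-level binding in the package gets its own cost center whether I want one or not.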
While in this case the extra profiling data may be useless, it seems to me that in general you won't know a priori where the problem lies. I would prefer to have as much data available as possible, and then filter it.
HPC works somewhat like this: it gathers coverage data for all of the modules compiled with -fhpc, but has command-line options to report coverage only for certain modules. Perhaps GHC's RTS could add a similar flag to report profiling data only from certain modules or functions?
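For instance, the HPC workflow looks roughly like this (the module name Game.Loop is made up for illustration):

    $ ghc -fhpc Main.hs -o game                  # instrument every module for coverage
    $ ./game                                     # running the program writes game.tix
    $ hpc report game.tix --include=Game.Loop    # report only the (made-up) module we care about

An analogous +RTS option for the .prof output would let libraries ship fully instrumented while users filter the report down to their own code.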
I am sympathetic to this concern, but the main problem here is that -fprof-auto isn't free. It can interfere quite heavily with optimization passes: each automatic cost center blocks inlining at that point, which can defeat fusion and other transformations, so it ends up attributing much higher costs to certain functions than they would incur in a normally optimized build, actively misleading users. I for one would have no confidence that profiles were pointing to the correct place if everything were built with -fprof-auto from the start. It's much better to start with just a few high-level cost-centers and drill down from there. This can be a tedious process, but it's much more reliable. As an alternative, compiling libraries with -fprof-auto-exported is fairly reasonable IMHO.
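A minimal sketch of the hand-placed approach, with made-up names (compile with ghc -O2 -prof -rtsopts, run with ./Main +RTS -p):

    module Main where

    -- Only this expression carries a cost center; everything else is
    -- optimized as usual, so the profile stays trustworthy.
    -- ('expensive' is a made-up example function.)
    expensive :: Int -> Int
    expensive n = {-# SCC "expensive" #-} sum [i * i | i <- [1 .. n]]

    main :: IO ()
    main = print (expensive 1000000)

John L.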