
However, I do see there might be room for a project on the statistical profiler itself or its associated tooling. We just need to come to a conclusion on which direction is most appropriate for GHC.
You mean, given a choice between somehow reusing perf and going down the route of fully custom tooling?
For this, having some concrete use cases would be quite helpful. How do you envision using statistical profiling on Haskell projects? What is the minimal set of features that would make for a useful profiler?
That sounds like a good way to approach this. Here goes...

I'd really prefer to treat a Haskell program as a black box that I can profile using the same tools as C programs or native code generated from any other language. It shouldn't matter that the source is Haskell. In my ideal workflow, I have a *vanilla* Haskell program compiled with debug symbols by a *vanilla* GHC (no special ./configure options as prerequisites) that I can hook perf up to, e.g.

    $ perf record -g ./mybinary

Then I should be able to use perf report to analyze the results, or indeed use existing pipelines to obtain other visualizations (flame graphs etc.). I'm not particularly interested in integration with the event log, though others might have a need for that. I'm also interested in hotspot analysis, à la perf annotate.

As Brendan Gregg says, "perf isn't some random tool: it's part of the Linux kernel, and is actively developed and enhanced."

I need accurate and informative stack samples (no STG internal details in the output that I can't connect back to source locations) for programs that include all manner of FFI calls. Better still if time spent in the GC doesn't pollute my stack samples.

The tricky part is that for flame graphs you need to sample stacks, and for that you need to teach perf how to collect that data somehow, since the C stack isn't used for Haskell activation frames and we have a call-by-need evaluation strategy anyway. But the slow option you mention in the status page sounds okayish to me, and using eBPF to perform stack sampling entirely from the kernel looks like a promising direction.
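
To make the flame-graph requirement a bit more concrete: flame-graph tooling typically consumes "folded" stacks, one line per distinct stack with frames joined by semicolons (root first) followed by a sample count. The sketch below is only an illustration of that folding step, not anything GHC or perf provides. The function name `fold_stacks` and the sample frames are hypothetical, and it assumes the sampler can already hand back each stack as a list of source-level names, which is exactly the hard part discussed above.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse stack samples into folded lines: 'root;...;leaf count'.

    Each sample is a list of frame names ordered root first.
    Empty samples are skipped.
    """
    counts = Counter(";".join(stack) for stack in samples if stack)
    # Sort so the output is deterministic regardless of sampling order.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples, standing in for what a stack sampler would report.
samples = [
    ["main", "Main.parse", "Data.Map.insert"],
    ["main", "Main.render"],
    ["main", "Main.parse", "Data.Map.insert"],
    [],
]

folded = fold_stacks(samples)
for line in folded:
    print(line)
```

If the stacks contained STG-internal frames instead of names that map back to source locations, the folded lines (and any graph built from them) would be correspondingly hard to read, which is why the quality of the stack samples matters more than the visualization step.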