
"Boespflug, Mathieu"
However, I do see there might be room for a project on the statistical profiler itself or its associated tooling. We just need to come to a conclusion on which direction is most appropriate for GHC.
You mean - given a choice between somehow reusing perf or going down the route of fully custom tooling?
Right.
For this, having some concrete use-cases would be quite helpful. How do you envision using statistical profiling on Haskell projects? What is the minimal set of features that would make for a useful profiler?
That sounds like a good way to approach this. Here goes...
I'd really prefer seeing a Haskell program as a black box, that I can profile using the same tools as C programs or native code generated from any other language. It shouldn't matter that the source is Haskell. In my ideal workflow, I have a *vanilla* Haskell program compiled with debug symbols by a *vanilla* GHC (no special ./configure options as prereqs), that I can hook up perf to, e.g.
$ perf record -g ./mybinary
Then I should be able to use perf report to analyze the results. Or indeed use existing pipelines to obtain other visualizations (flame graphs etc).
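For instance, something like the following (hypothetical commands; this assumes Brendan Gregg's FlameGraph scripts are installed and on $PATH):

```shell
# After the `perf record -g` run above, analyze the samples interactively...
perf report

# ...or render a flame graph from the same perf.data, using the FlameGraph
# scripts (stackcollapse-perf.pl, flamegraph.pl) from Brendan Gregg's repo.
perf script | stackcollapse-perf.pl | flamegraph.pl > mybinary.svg
```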
I'm not particularly interested in integration with the event log, though others might have a need for that.
I'm also interested in hotspot analysis, à la perf annotate.
As Brendan Gregg says, "perf isn't some random tool: it's part of the Linux kernel, and is actively developed and enhanced."
I need accurate and informative stack samples (no STG internal details in the output that I can't connect back to source locations) for programs that include all manner of FFI calls. Better still if time spent in the GC doesn't pollute my stack samples.
The tricky part is that for flame graphs you need to sample stacks, and for that you need to teach perf how to collect that data somehow, since the C stack isn't used for Haskell activation frames and we have a call-by-need evaluation strategy anyway. But the slow option you mention in the status page sounds okayish to me, and using eBPF to perform stack sampling entirely from the kernel looks like a promising direction.
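To illustrate the eBPF direction: today one can already sample user-space stacks entirely in-kernel with bpftrace, though for a Haskell binary this only walks the C stack, so the STG execution stack would still be invisible until perf or the BPF program is taught the Haskell stack layout:

```shell
# Sketch only: sample user stacks at 99 Hz for a process named "mybinary",
# aggregating counts per unique stack in-kernel (requires root + bpftrace).
# NB: `ustack` walks the C stack; Haskell's own stack would need a custom
# walker, which is exactly the hard part discussed here.
bpftrace -e 'profile:hz:99 /comm == "mybinary"/ { @[ustack] = count(); }'
```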
Indeed that is the engineering side of the trickiness. However, there is also a theoretically difficult aspect which Peter articulates nicely in Chapter 2 of his thesis. I refer you to the thesis for the full explanation but, in short, reasoning about causality in lazy languages is quite difficult (which is why this whole endeavour was worthy of a thesis). This leads to some amount of impedance mismatch with existing tooling, which is the reason I started down the road of a Haskell-centric solution.

To elaborate: while in an imperative language it is relatively easy to say "instruction at address $A arose from line $N in $FILE.c", this sort of unambiguous statement is not possible in Haskell. Instead, we have to be satisfied with *sets* of source locations. That is, "instruction at address $A arose from line $N in $FILE, and line $N' in $FILE', and ...".

Unfortunately, essentially none of the existing profiling and debugging infrastructure (DWARF and perf included) was designed with this model in mind. In particular, DWARF's line information encoding requires that GHC choose a single location to attribute the cost of each instruction to. This is, as you might expect, a rather tricky choice to make, and while GHC has a few heuristics, we will inevitably be wrong in some circumstances. From experience, I can say that attempting to analyse profiles naively, using only GHC's heuristically-guided guess of the appropriate source location, can lead to rather perplexing results.

For this reason, GHC uses [1] DWARF's extensibility mechanisms to export an additional set of line information, which can be consumed by Haskell-aware tooling and captures the full richness of GHC's source ticks. In section 5.8 of his thesis Peter proposes a "fuzzing" scheme for making use of this location information. I haven't tried Peter's fuzzing approach, but I suspect it would make the profiler output significantly easier to reason about.
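To make the idea concrete, here is a minimal sketch of splitting a sample's weight across its location set. The input format is entirely invented for illustration: one sample per line, "<count> <loc;loc;...>", where the semicolon-separated locations stand in for the set of source ticks attributed to the sampled instruction.

```shell
# Hypothetical sketch of the section 5.8 "fuzzing" idea: each sample's
# weight is divided evenly among the source locations in its set, so no
# single location is arbitrarily charged the full cost.
printf '%s\n' \
  '10 Foo.hs:3;Bar.hs:7' \
  '4 Foo.hs:3' |
awk '{ n = split($2, locs, ";")
       for (i = 1; i <= n; i++) w[locs[i]] += $1 / n }
     END { for (l in w) printf "%s %g\n", l, w[l] }' | sort
# → Bar.hs:7 5
# → Foo.hs:3 9
```

The first sample's weight of 10 is shared between Foo.hs:3 and Bar.hs:7 (5 each), while the second sample's weight of 4 goes entirely to Foo.hs:3, its only candidate location.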
All this being said, I totally agree that being able to use widely-known, well-maintained native tools is a significant advantage. This is a large part of why I put the profiling project on hold. I have wondered whether we might be able to provide a preprocessor which would take sample data from GHC's statistical profiler and export it to perf's own format, performing fuzzing and any other necessary Haskell-specific preprocessing. This would at least allow us to tap into the ecosystem that has arisen around perf. Alternatively, we could try to contribute patches to perf upstream, although I'm not sure how likely acceptance would be.

Cheers,

- Ben

[1] https://phabricator.haskell.org/D1279#R?query=D1279