
"Boespflug, Mathieu"
However, I do see there might be room for a project on the statistical profiler itself or its associated tooling. We just need to come to a conclusion on which direction is most appropriate for GHC.
You mean - given a choice between somehow reusing perf or going down the route of fully custom tooling?
Right.
For this, having some concrete use-cases would be quite helpful. How do you envision using statistical profiling on Haskell projects? What is the minimal set of features that would make for a useful profiler?
That sounds like a good way to approach this. Here goes...
I'd really prefer seeing a Haskell program as a black box, that I can profile using the same tools as C programs or native code generated from any other language. It shouldn't matter that the source is Haskell. In my ideal workflow, I have a *vanilla* Haskell program compiled with debug symbols by a *vanilla* GHC (no special ./configure options as prereqs), that I can hook up perf to, e.g.
$ perf record -g ./mybinary
Then I should be able to use perf report to analyze the results. Or indeed use existing pipelines to obtain other visualizations (flame graphs etc).
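For instance, something like the following (hypothetical commands; this assumes Brendan Gregg's FlameGraph scripts are installed and on $PATH):

```shell
# After the `perf record -g` run above, analyze the samples interactively...
perf report

# ...or render a flame graph from the same perf.data, using the FlameGraph
# scripts (stackcollapse-perf.pl, flamegraph.pl) from Brendan Gregg's repo.
perf script | stackcollapse-perf.pl | flamegraph.pl > mybinary.svg
```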
I'm not particularly interested in integration with the event log, though others might have a need for that.
I'm also interested in hotspot analysis, à la perf annotate.
As Brendan Gregg says, "perf isn't some random tool: it's part of the Linux kernel, and is actively developed and enhanced."
I need accurate and informative stack samples (no STG internal details in the output that I can't connect back to source locations) for programs that include all manner of FFI calls. Better still if time spent in the GC doesn't pollute my stack samples.
The tricky part is that for flame graphs you need to sample stacks, and for that you need to teach perf how to collect that data somehow, since the C stack isn't used for Haskell activation frames and we have a call-by-need evaluation strategy anyway. But the slow option you mention in the status page sounds okayish to me, and using eBPF to perform stack sampling entirely from the kernel looks like a promising direction.
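To illustrate the eBPF direction: today one can already sample user-space stacks entirely in-kernel with bpftrace, though for a Haskell binary this only walks the C stack, so the STG execution stack would still be invisible until perf or the BPF program is taught the Haskell stack layout:

```shell
# Sketch only: sample user stacks at 99 Hz for a process named "mybinary",
# aggregating counts per unique stack in-kernel (requires root + bpftrace).
# NB: `ustack` walks the C stack; Haskell's own stack would need a custom
# walker, which is exactly the hard part discussed here.
bpftrace -e 'profile:hz:99 /comm == "mybinary"/ { @[ustack] = count(); }'
```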
Indeed that is the engineering side of the trickiness. However, there is also a theoretically difficult aspect which Peter articulates nicely in Chapter 2 of his thesis. I refer you to the thesis for the full explanation but, in short, reasoning about causality in lazy languages is quite difficult (which is why this whole endeavour was worthy of a thesis). This leads to some amount of impedance mismatch with existing tooling, which is the reason I started down the road of a Haskell-centric solution.

To elaborate: while in an imperative language it is relatively easy to say "instruction at address $A arose from line $N in $FILE.c", this sort of unambiguous statement is not possible in Haskell. Instead, we have to be satisfied with *sets* of source locations. That is, "instruction at address $A arose from line $N in $FILE, and line $N' in $FILE', and ...".

Unfortunately, essentially none of the existing profiling and debugging infrastructure (DWARF and perf included) was designed with this model in mind. In particular, DWARF's line information encoding requires that GHC choose a single location to attribute the cost of each instruction to. This is, as you might expect, a rather tricky choice to make, and while GHC has a few heuristics, we will inevitably be wrong in some circumstances. From experience, I can say that attempting to analyse profiles naively, using only GHC's heuristically-guided guess of the appropriate source location, can lead to rather perplexing results.

For this reason, GHC uses [1] DWARF's extensibility mechanisms to export an additional set of line information, which can be consumed by Haskell-aware tooling and captures the full richness of GHC's source ticks. In section 5.8 of his thesis Peter proposes a "fuzzing" scheme for making use of this location information. I haven't tried Peter's fuzzing approach, but I suspect it would make the profiler output significantly easier to reason about.
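To make the idea concrete, here is a minimal sketch of splitting a sample's weight across its location set. The input format is entirely invented for illustration: one sample per line, "<count> <loc;loc;...>", where the semicolon-separated locations stand in for the set of source ticks attributed to the sampled instruction.

```shell
# Hypothetical sketch of the section 5.8 "fuzzing" idea: each sample's
# weight is divided evenly among the source locations in its set, so no
# single location is arbitrarily charged the full cost.
printf '%s\n' \
  '10 Foo.hs:3;Bar.hs:7' \
  '4 Foo.hs:3' |
awk '{ n = split($2, locs, ";")
       for (i = 1; i <= n; i++) w[locs[i]] += $1 / n }
     END { for (l in w) printf "%s %g\n", l, w[l] }' | sort
# → Bar.hs:7 5
# → Foo.hs:3 9
```

The first sample's weight of 10 is shared between Foo.hs:3 and Bar.hs:7 (5 each), while the second sample's weight of 4 goes entirely to Foo.hs:3, its only candidate location.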
All this being said, I totally agree that being able to use widely-known, well-maintained native tools is a significant advantage. This is a large part of why I put the profiling project on hold. I have wondered whether we might be able to provide a preprocessor which would take sample data from GHC's statistical profiler and export it to perf's own format, performing fuzzing and any other necessary Haskell-specific preprocessing. This would at least allow us to tap into the ecosystem that has arisen around perf. Alternatively, we could try to contribute patches to perf upstream, although I'm not sure how likely acceptance would be.

Cheers,

- Ben

[1] https://phabricator.haskell.org/D1279#R?query=D1279