Re: Where do I start if I would like help improve GHC compilation times?

18 Apr 2017

      Alfredo Di Napoli  writes:
...
Hey Simon,
thanks for chiming in. I had the same suspect as well, and in fact my first
temptation was to use dlists to avoid costly concatenation (I guess a
Builder shares pretty much the same idea, which is avoid right-appends). I
eventually bailed out as that Pretty module had some functions like sepX or
fillNBE (I might be wrong, going by memory only here) which needed to
*inspect* the current [Doc] we were carrying around, and thus dlists
couldn’t accomplish this in O(1), but forcing it back to a list was
necessary. Am I right in thinking that using a Builder here will suffer the
same malady? Ideally I would like constant time for both appends, left
concat & inspection (as in pattern-matching inspection) but I don’t think
what I’m asking exists, at least not in its functional declination anyway ;)
That’s why I was thinking to give Deques (or more generally, Sequences) a
go: they don’t have O(1) appends but at least can be inspected in O(1).
Question has to be answered whether this will yield improvement or not,
which I guess depends upon the ratio of inspection / appending we do in the
pretty printing. In that case using dlists or builders might be better.
Last but not least, although the current implementation doesn’t backtrack
anyway, I’m wondering wether switching to a different representation for a
Doc (namely a huge function composition of Token(s), as described in the
paper) could be beneficial as well.
Do you guys have any opinions? Yesterday I extracted Pretty.hs from the
sourcetree and I’m now planning to produce a criterion benchmark and
compare different implementations, althought it’s a bit hard to predict the
real world usage as I don’t have access to a representative Doc document as
produced by GHC, so my benchs could be all ill-founded.
Note that GHC's `Pretty` module is just a slightly modified version of
the `pretty` package. The differences are fairly minimal (although
important for performance):

 * It uses FastString in place of String, giving us fast `length` (in
   https://ghc.haskell.org/trac/ghc/ticket/8809#comment:60
   I propose that we extend `pretty` with a typeclass for string types)

 * GHC's variant still has a known stack overflow bug that was fixed
   upstream. Unfortunately, compiler performance regressed when we
   attempted to port upstream's patch (#10735)

Ideally we would fix these and just use the `pretty` library itself.

In addition to these issues, it would be quite helpful if `pretty`
gained a special-case for the infinite band-width case (which is what we
use in the code generator). The point here is that we should need to do
no layout computation in the infinite band case: merely place line
breaks precisely where the user asks. This should result in a noticeable
improvement in code generation performance (IIRC Bartosz noted rather
significant amounts of time spent pretty-printing assembler).

Cheers,

- Ben

Re: Where do I start if I would like help improve GHC compilation times?

Ben Gamari