
CCing,
* Alfredo Di Napoli for his on-going work in this area
* Shivansh Rai for his interest in contributing
* David Luposchainsky for his recent pretty-printer library
* Richard Eisenberg due to his participation in #8809
* Bartosz for his participation in #10735
* Alan Zimmerman for his interest in Haskell tooling
My apologies for the tome that follows. I have been thinking about this
problem recently and think an overview of where we stand would be helpful.
Siddharth Bhat writes:

> Hello all,
>
> Thanks for all the work that's put into GHC :)

Thanks for your interest in helping!

> I've tried to get into GHC development before, but I was unsuccessful, mostly because I didn't dedicate enough time to understanding the problem at hand & exploring the codebase.
>
> I'd like to give it another shot. This time, I think I have a clear vision of what I want to help with: Have Haskell's error messages be easier to read and understand.
> 1. Colors and layout to highlight important parts of the error messages

I think #8809 will provide a good foundation for improvements here; more on this below.

> 2. Clear formatting & naming of errors, so they're easily googleable, stack-overflow able, etc.

Indeed this is a great goal. Do you have a list of error messages that you think are particularly egregious in this respect? Are you advocating that we give error classes unique identifiers (e.g. as rustc does IIRC) or are you merely suggesting that we improve the wording of the existing messages?

> 3. better hints with error messages, and perhaps integrated lints(?).

This sounds like a noble goal, but it's a bit unclear how you get there. We currently do try to give hints where possible, but of course we could always offer more. It would be helpful to have a set of concrete examples to discuss.

> 4. I don't know if this is already possible, but allowing GHC errors to be shipped off as JSON or something to interested tooling.

Indeed, this would be great. Thanks to Matthew Pickering we already offer some limited form of this in 8.2 [1], but I think having more structured error documents as suggested in #8809 would make this even nicer.

[1] https://downloads.haskell.org/~ghc/master/users-guide//debugging.html?highli...


The State of #8809
==================

> I saw this ticket on trac: https://ghc.haskell.org/trac/ghc/ticket/8809
>
> I would like to take this up, but I'd like help / pointers and stuff. I have GHC setup, I know how to use phabricator, but.. where do I start? :)

This ticket has recently seen quite a bit of activity and I've been meaning to write down some thoughts on it. Here it goes:

Currently Alfredo Di Napoli is working [2] on the `pretty` library to both improve performance and allow us to drop GHC's fork (see #10735), perhaps to use annotated pretty-printer documents. Meanwhile, David Luposchainsky has recently released [3] his `prettyprinter` library, which may serve as a drop-in replacement for `pretty` and handles all of the cases that Alfredo is working on. Moreover, Shivansh Rai has also recently expressed interest in helping out with this effort.

All of this is great news: I have been hoping we'd get Idris-style errors for quite some time. However, given how many hands we have in this area, we should be careful not to step on each other's toes. Below I'll describe the various facets of the task as I see them.

[2] https://github.com/haskell/pretty/pull/43
[3] https://www.reddit.com/r/haskell/comments/6e62i5/ann_prettyprinter_10_ending...


# Choice of pretty printer

It seems like we first need to resolve the question of whether switching from (our fork of) `pretty` to the `prettyprinter` library is worthwhile. The argument for moving to `prettyprinter` is its support for optimized infinite-band-width layout, which is one of the things holding us back from moving back to upstream `pretty`. However, there are two impediments to switching:

 * `prettyprinter` depends upon the `text` package while GHC does not. Making GHC dependent on `text` is an option, but we should be careful. Adding a dependency has a non-trivial cost (GHC build times rise, GHC API users are stuck using whatever dependency versions GHC uses, release engineering is a bit more complicated).

   Currently GHC has its own abstractions for working with text efficiently, LitString and FastString. FastString is used throughout the compiler, including the pretty-printer, and represents a dense UTF-8 buffer (and a hash for quick comparison).
   It's not clear that we would want to move it to `text` as this would introduce UTF-8/UTF-16 conversion.

 * `prettyprinter` doesn't support rendering to `String`. This essentially means that we either use `Text` or fork the package. However, if we have already decided on depending on `text`, then perhaps the former isn't so bad.

It's unclear to me exactly how difficult switching would be compared to finishing up the work Alfredo has started on `pretty`. Alfredo, what is your opinion?

If we decide against moving to `prettyprinter`, then we will need to finish up something like Alfredo's `pretty` patches to rid GHC of its fork.


# Representing rich error messages in GHC

In my opinion we should avoid baking more stylistic decisions (e.g. printing types in red, terms in blue) into modules like TcErrors which produce error messages. This is why I proposed in #8809 (see comment 3) that we use annotated pretty-printer documents. This would allow us to represent the typical things seen in GHC error messages (e.g. types, terms, spans, binders, etc.) in structured form, allowing the error message consumer (e.g. GHC itself, a GHC API user, or some JSON error sink) to make decisions about how to present these elements to the user.

I think this approach gives us a much better story for dealing with the problems currently solved by flags like -fprint-runtime-reps, -fprint-explicit-kinds, etc., especially for users using an IDE.

As far as I can recall, there was still a bit of disagreement surrounding whether the values carried by the error message should be statically or dynamically typed. In particular, Richard Eisenberg advocated that error message documents look like this:

    -- A dynamically typed value embedded in an error message
    data ErrItem = forall a. (Outputable a, Typeable a) => ErrItem a

    type ErrDoc = Doc ErrItem

I argue, however, that this would quickly become unmaintainable, especially when one considers GHC API users.
Rather, I say that we should encode the "vocabulary" of things that may appear in an error message explicitly:

    data ErrItem = ErrType Type
                 | ErrSpan Span
                 | ErrTerm HsExpr
                 | ErrInstance ClsInst
                 | ErrVar Var
                 | ...

There are good arguments for both options, although I think that on balance an explicit approach will be better for consumers.

Anyway, this is a question that will need to be answered. Once there is consensus I think it shouldn't be too difficult to move things forward. The change can be made incrementally and for the most part should only touch a few modules (with the bulk in TcErrors).


## What do we represent?

There is also the question of what the vocabulary of embeddable items should consist of. I think the above are pretty non-controversial but I can think of a variety of items which would more precisely capture some common patterns:

    data ErrItem = ...
                 | ErrExpectedActual Type Type
                   -- ^ e.g. "Expected type: ty1, Actual type: ty2"
                 | ErrContext Type
                   -- ^ Like ErrType but specifically captures a context
                 | ErrPotentialInstances [ClsInst]
                   -- ^ A list of potentially matching instances
                 | ...

Exactly how far we want to go is something that would need to be decided. I think we would want to start with the minimal set initially proposed and then introduce additional items as we gain experience with the scheme.


# Using rich error messages

Once we have GHC producing rich error documents we can teach GHC's command line driver to prettify them. We can also teach haskell-mode, ghc-mod, and friends to preserve their structure to give the user an Idris-like experience.

Exactly how many stylistic decisions we want GHC to make is a tricky question; this is prime territory for bike-shedding and people tend to have rather strong aesthetic beliefs. Keeping things simple while satisfying all tastes may be a challenge.
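
To make the statically typed option a bit more concrete, here is a minimal, self-contained sketch of how a producer and several consumers might share one annotated document. Everything in it is an assumption made for illustration: the toy `Doc` type stands in for whichever pretty-printer we end up with, plain strings stand in for GHC's `Type`, `HsExpr` and spans, and the function names (`renderWith`, `plainItem`, `ansiItem`, `items`) are hypothetical rather than existing GHC or library API.

```haskell
-- Toy model of annotated error documents. All names are hypothetical;
-- strings stand in for GHC's real Type, HsExpr and source span types.

-- | The explicit vocabulary of items an error message may embed.
data ErrItem
  = ErrType String          -- ^ stand-in for a Type
  | ErrTerm String          -- ^ stand-in for an HsExpr
  | ErrSpan String Int Int  -- ^ file, line, column
  deriving (Eq, Show)

-- | A minimal annotated document: literal text, embedded items, and
-- horizontal and vertical composition.
data Doc ann
  = Text String
  | Ann ann
  | Beside (Doc ann) (Doc ann)
  | Above (Doc ann) (Doc ann)

type ErrDoc = Doc ErrItem

-- | Render a document, leaving the presentation of each embedded item
-- to the consumer.
renderWith :: (ann -> String) -> Doc ann -> String
renderWith _ (Text s)     = s
renderWith f (Ann a)      = f a
renderWith f (Beside l r) = renderWith f l ++ renderWith f r
renderWith f (Above t b)  = renderWith f t ++ "\n" ++ renderWith f b

-- | Plain presentation, e.g. for a dumb terminal or a log file.
plainItem :: ErrItem -> String
plainItem (ErrType t)     = t
plainItem (ErrTerm e)     = e
plainItem (ErrSpan f l c) = f ++ ":" ++ show l ++ ":" ++ show c

-- | Coloured presentation using ANSI escapes: types in red, terms in
-- blue, spans in bold. The stylistic choice lives in the consumer.
ansiItem :: ErrItem -> String
ansiItem item = case item of
  ErrType _ -> colour "31" (plainItem item)
  ErrTerm _ -> colour "34" (plainItem item)
  ErrSpan{} -> colour "1"  (plainItem item)
  where
    colour code s = "\ESC[" ++ code ++ "m" ++ s ++ "\ESC[0m"

-- | Collect the embedded items in order, e.g. for a structured sink.
items :: Doc ann -> [ann]
items (Text _)     = []
items (Ann a)      = [a]
items (Beside l r) = items l ++ items r
items (Above t b)  = items t ++ items b

-- | A type mismatch, built once by the producer.
example :: ErrDoc
example =
  (Ann (ErrSpan "Foo.hs" 3 7) `Beside` Text ": error:")
    `Above` (Text "  Couldn't match expected type " `Beside` Ann (ErrType "Int"))
    `Above` (Text "  with actual type " `Beside` Ann (ErrType "Bool"))
    `Above` (Text "  In the expression: " `Beside` Ann (ErrTerm "True"))
```

Loaded into GHCi, `putStrLn (renderWith plainItem example)` gives the plain message, `renderWith ansiItem example` gives the coloured one, and `items example` hands tooling the span, the two types and the term without any parsing of the rendered text. The producer never mentions colours, which is the separation I am arguing for above.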


# Summary

Above I discussed several tasks and a few questions:

 * We need to decide on whether David's `prettyprinter` library is right for GHC; having a prototype patch introducing it to the tree would help in evaluating this. Alfredo, what is your opinion here?

    * If not, we need to drop our fork of `pretty` in favor of upstream

 * We need consensus on whether Idris-style annotated pretty-printer documents are the right approach for GHC (I think we are close to this)

    * If we want annotated documents, should the items be statically or dynamically typed?

 * Once these questions are resolved we can start introducing annotations into GHC's error documents (this shouldn't be hard)

 * Then we can teach GHC and associated tooling to render these rich messages prettily

There is certainly a fair bit of work here, although it's not obvious how to parallelize it across all of the interested parties. Regardless, I would be happy to advise on any bit of this.

Cheers,

- Ben