Interested to help with error messages

Hello all,

Thanks for all the work that's put into GHC :)

I've tried to get into GHC development before, but I was unsuccessful, mostly because I didn't dedicate enough time to understanding the problem at hand & exploring the codebase.

I'd like to give it another shot. This time, I think I have a clear vision of what I want to help with: Have Haskell's error messages be easier to read and understand.

1. Colors and layout to highlight important parts of the error messages
2. Clear formatting & naming of errors, so they're easily googleable, stack-overflow able, etc.
3. better hints with error messages, and perhaps integrated lints(?).
4. I don't know if this is already possible, but allowing GHC errors to be shipped off as JSON or something to interested tooling.

I saw this ticket on trac: https://ghc.haskell.org/trac/ghc/ticket/8809

I would like to take this up, but I'd like help / pointers and stuff. I have GHC setup, I know how to use phabricator, but.. where do I start? :)

Thanks,
~Siddharth

-- Sending this from my phone, please excuse any typos!

CCing,
* Alfredo Di Napoli for his on-going work in this area
* Shivansh Rai for his interest in contributing
* David Luposchainsky for his recent pretty-printer library
* Richard Eisenberg due to his participation in #8809
* Bartosz for his participation in #10735
* Alan Zimmerman for his interest in Haskell tooling
My apologies for the tome that follows. I have been thinking about this
problem recently and think an overview of where we stand would be helpful.
Siddharth Bhat writes:

> Hello all,
>
> Thanks for all the work that's put into GHC :)

Thanks for your interest in helping!

> I've tried to get into GHC development before, but I was unsuccessful, mostly because I didn't dedicate enough time to understanding the problem at hand & exploring the codebase.
>
> I'd like to give it another shot. This time, I think I have a clear vision of what I want to help with: Have Haskell's error messages be easier to read and understand.
>
> 1. Colors and layout to highlight important parts of the error messages

As I say below, I think #8809 will provide a good foundation for improvements here. More on this below.

> 2. Clear formatting & naming of errors, so they're easily googleable, stack-overflow able, etc.

Indeed this is a great goal. Do you have a list of error messages that you think are particularly egregious in this respect? Are you advocating that we give error classes unique identifiers (e.g. as rustc does IIRC) or are you merely suggesting that we improve the wording of the existing messages?

> 3. better hints with error messages, and perhaps integrated lints(?).

This sounds like a noble goal, but it's a bit unclear how you get there. We currently do try to give hints where possible, but of course we could always offer more. It would be helpful to have a set of concrete examples to discuss.

> 4. I don't know if this is already possible, but allowing GHC errors to be shipped off as JSON or something to interested tooling.

Indeed, this would be great. Thanks to Matthew Pickering we already offer some limited form of this in 8.2 [1], but I think having more structured error documents as suggested in #8809 would make this even nicer.

[1] https://downloads.haskell.org/~ghc/master/users-guide//debugging.html?highlight=json#ghc-flag--ddump-json

The State of #8809
==================

> I saw this ticket on trac: https://ghc.haskell.org/trac/ghc/ticket/8809 I would like to take this up, but I'd like help / pointers and stuff. I have GHC setup, I know how to use phabricator, but.. where do I start? :)

This ticket has recently seen quite a bit of activity and I've been meaning to write down some thoughts on it. Here it goes:

Currently Alfredo Di Napoli is working [2] on the `pretty` library to both improve performance and allow us to drop GHC's fork (see #10735), perhaps to use annotated pretty-printer documents. Meanwhile, David Luposchainsky has recently released [3] his `prettyprinter` library which may serve as a drop-in replacement to `pretty` and handles all of the cases that Alfredo is working on. Moreover, Shivansh Rai has also recently expressed interest in helping out with this effort.

All of this is great news: I have been hoping we'd get Idris-style errors for quite some time. However, given how many hands we have in this area, we should be careful not to step on each other's toes. Below I'll describe the various facets of the task as I see them.

[2] https://github.com/haskell/pretty/pull/43
[3] https://www.reddit.com/r/haskell/comments/6e62i5/ann_prettyprinter_10_ending_the_wadlerleijen_zoo/

# Choice of pretty printer

It seems like we first need to resolve the question of whether switching from (our fork of) `pretty` to the `prettyprinter` library is worthwhile. The argument for moving to `prettyprinter` is its support for optimized infinite-band-width layout, which is one of the things holding us back from moving back to `pretty`.

However, there are two impediments to switching,

* `prettyprinter` depends upon the `text` package while GHC does not. Making GHC dependent on `text` is an option, but we should be careful. Adding a dependency has a non-trivial cost (GHC build times rise, GHC API users are stuck using whatever dependency versions GHC uses, release engineering is a bit more complicated).

  Currently GHC has its own abstractions for working with text efficiently, LitString and FastString. FastString is used throughout the compiler, including the pretty-printer, and represents a dense UTF-8 buffer (and a hash for quick comparison).
  It's not clear that we would want to move it to `text` as this would introduce UTF-8/UTF-16 conversion.

* `prettyprinter` doesn't support rendering to `String`. This essentially means that we either use `Text` or fork the package. However, if we have already decided on depending on `text`, then perhaps the former isn't so bad.

It's unclear to me exactly how difficult switching would be compared to finishing up the work Alfredo has started on `pretty`. Alfredo, what is your opinion?

If we decide against moving to `prettyprinter`, then we will need to finish up something like Alfredo's `pretty` patches to rid GHC of its fork.

# Representing rich error messages in GHC

In my opinion we should avoid baking more stylistic decisions (e.g. printing types in red, terms in blue) into the modules like TcErrors which produce error messages. This is why I propose that we use annotated pretty-printer documents in #8809 (see comment 3). This would allow us to represent the typical things seen in GHC error messages (e.g. types, terms, spans, binders, etc.) in structured form, allowing the error message consumer (e.g. GHC itself, a GHC API user, or some JSON error sink) to make decisions about how to present these elements to the user.

I think this approach gives us a much better story for dealing with the problems currently solved by flags like -fprint-runtime-reps, -fprint-explicit-kinds, etc., especially for users using an IDE.

As far as I can recall, there was still a bit of disagreement surrounding whether the values carried by the error message should be statically or dynamically typed. In particular, Richard Eisenberg advocated that error message documents look like,

    -- A dynamically typed value embedded in an error message
    data ErrItem = forall a. (Outputable a, Typeable a) => ErrItem a

    type ErrDoc = Doc ErrItem

Whereas I argue that this would quickly become unmaintainable, especially when one considers GHC API users.
Rather, I say that we should encode the "vocabulary" of things that may appear in an error message explicitly,

    data ErrItem = ErrType Type
                 | ErrSpan Span
                 | ErrTerm HsExpr
                 | ErrInstance ClsInst
                 | ErrVar Var
                 | ...

While there are good arguments for both options, I think that on balance an explicit approach will be better for consumers. Anyways, this is a question that will need to be answered.

Once there is consensus I think it shouldn't be too difficult to move things forward. The change can be made incrementally and for the most part should only touch a few modules (with the bulk in TcErrors).

## What do we represent?

There is also the question of what the vocabulary of embeddable items should consist of. I think the above are pretty non-controversial but I can think of a variety of items which would more precisely capture some common patterns,

    data ErrItem = ...
                 | ErrExpectedActual Type Type
                   -- ^ e.g. "Expected type: ty1, Actual type: ty2"
                 | ErrContext Type
                   -- ^ Like ErrType but specifically captures a context
                 | ErrPotentialInstances [ClsInst]
                   -- ^ A list of potentially matching instances
                 | ...

Exactly how far we want to go is something that would need to be decided. I think we would want to start with the minimal set initially proposed and then introduce additional items as we gain experience with the scheme.

# Using rich error messages

Once we have GHC producing rich error documents we can teach GHC's command line driver to prettify them. We can also teach haskell-mode, ghc-mod, and friends to preserve their structure to give the user an Idris-like experience.

Exactly how many stylistic decisions we want GHC to make is a tricky question; this is prime territory for bike-shedding and people tend to have rather strong aesthetic beliefs; keeping things simple while satisfying all tastes may be a challenge.
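
To make the statically typed option more concrete, here is a minimal, self-contained sketch of a document annotated with an explicit `ErrItem` vocabulary and two consumers that present it differently. Everything in it is an assumption made for illustration: `Ty`, `Var` and `Span` are toy stand-ins for GHC's real types, and `Doc`, `Embed`, `itemText`, `renderPlain`, `renderAnsi` and the colour choices are hypothetical names, not the API of GHC, `pretty` or `prettyprinter`.

```haskell
-- Toy stand-ins for GHC's real types (hypothetical, for illustration only).
newtype Ty  = Ty String
newtype Var = Var String
data Span   = Span FilePath Int Int   -- file, line, column

-- The explicit "vocabulary" of things an error message may embed.
data ErrItem
  = ErrType Ty
  | ErrVar Var
  | ErrSpan Span

-- A deliberately tiny annotated document: literal text, an embedded
-- item, or the concatenation of two documents.
data Doc ann
  = Text String
  | Embed ann
  | Cat (Doc ann) (Doc ann)

-- Lets us build documents with (<>). Both renderers below treat Cat
-- associatively, so the nesting chosen by (<>) does not matter.
instance Semigroup (Doc ann) where
  (<>) = Cat

type ErrDoc = Doc ErrItem

-- The text an item would have in an unstyled message.
itemText :: ErrItem -> String
itemText (ErrType (Ty t))       = t
itemText (ErrVar (Var v))       = v
itemText (ErrSpan (Span f l c)) = f ++ ":" ++ show l ++ ":" ++ show c

-- Consumer 1: plain text, ignoring the structure.
renderPlain :: ErrDoc -> String
renderPlain (Text s)  = s
renderPlain (Embed i) = itemText i
renderPlain (Cat a b) = renderPlain a ++ renderPlain b

-- Consumer 2: ANSI colours. The styling lives in the consumer, not in
-- the code that produced the message.
renderAnsi :: ErrDoc -> String
renderAnsi (Text s)  = s
renderAnsi (Embed i) = "\ESC[" ++ colour i ++ "m" ++ itemText i ++ "\ESC[0m"
renderAnsi (Cat a b) = renderAnsi a ++ renderAnsi b

-- Arbitrary colour choices for the sketch: red types, blue variables,
-- bold source spans.
colour :: ErrItem -> String
colour ErrType{} = "31"
colour ErrVar{}  = "34"
colour ErrSpan{} = "1"

-- A made-up message built from text and embedded items.
example :: ErrDoc
example =
     Embed (ErrSpan (Span "Foo.hs" 3 7))
  <> Text ": Couldn't match type "
  <> Embed (ErrType (Ty "Int"))
  <> Text " with "
  <> Embed (ErrType (Ty "Bool"))
  <> Text " in "
  <> Embed (ErrVar (Var "x"))
```

In GHCi, `putStrLn (renderPlain example)` prints the unstyled message, while `putStrLn (renderAnsi example)` prints the same text with the embedded items coloured. A JSON sink or an editor integration would be a third function over the same `ErrDoc`, which is the point of keeping stylistic decisions out of the producer.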
# Summary

Above I discussed several tasks and a few questions,

* We need to decide on whether David's `prettyprinter` library is right for GHC; having a prototype patch introducing it to the tree would help in evaluating this. Alfredo, what is your opinion here?
* If not we need to drop our fork of `pretty` in favor of upstream
* We need consensus on whether Idris-style annotated pretty-printer documents are the right approach for GHC (I think we are close to this)
* If we want annotated documents, should the items be statically or dynamically typed?
* Once these questions are resolved we can start introducing annotations into GHC's error documents (this shouldn't be hard)
* Then we can teach GHC and associated tooling to pretty-print these rich messages prettily

There is certainly a fair bit of work here although it's not obvious how to parallelize it across all of the interested parties. Regardless, I would be happy to advise on any bit of this.

Cheers,

- Ben

Thanks Ben, a great summary. Is there a Wiki page for this? It feels like
it should be on one, so we can easily comment/update the individual points.
In terms of the pretty-printer and its string type. Perhaps we could
backpackify it to use http://next.hackage.haskell.org:8080/package/str-sig,
and then specialise the GHC version to FastString etc.
Alan
On 3 June 2017 at 17:50, Ben Gamari wrote:
> [...]

Alan & Kim Zimmerman writes:

> Thanks Ben, a great summary. Is there a Wiki page for this? It feels like it should be on one, so we can easily comment/update the individual points.

Here you are: https://ghc.haskell.org/trac/ghc/wiki/PrettyErrors

In the interest of time I essentially just pasted my response. Feel free to hack it up to your heart's content.

> In terms of the pretty-printer and its string type. Perhaps we could backpackify it to use http://next.hackage.haskell.org:8080/package/str-sig, and then specialise the GHC version to FastString etc.

Sounds plausible.

Relatedly, I always found it a bit odd that we used FastString for Doc literals. Namely we have to pay the cost of hashing the string, despite the fact that we will never use the hash. It's likely a tiny effect (hashing is quite quick), but it still seems odd.

Cheers,

Hi,
Thanks for all the work on error messages, this is important.
Whatever you do please remember that not only humans are recipients of
these messages. Recently GHC changed 'Warning' into 'warning' and even that
caused some issues:
https://github.com/haskell/haskell-mode/issues/1513
2017-06-03 9:22 GMT-07:00 Ben Gamari wrote:
> [...]

On Jun 3, 2017, at 11:50 AM, Ben Gamari wrote:

> In particular, Richard Eisenberg advocated that error message documents look like,
>
>     -- A dynamically typed value embedded in an error message
>     data ErrItem = forall a. (Outputable a, Typeable a) => ErrItem a
>
>     type ErrDoc = Doc ErrItem

I retract this argument.

Otherwise, I contribute only a hearty thanks to whoever is working on this, which likely has a larger impact on the adoption of Haskell than anything I've done. :)

Richard

This is a bit off topic, but is there a collection of not-so-great
error messages along with opinions about what they should be? Like a
wiki page or something?
I just stumbled across one and was going to complain, but didn't know
the most productive way to do that, aside from trying to fix it myself.
Specifically, I'm working in a small EDSL that uses lots of small
expressions composed together with a repurposed (.) operator. The
pretty printer layout causes the context section of each type error to
become huge, for instance:
Derive/Solkattu/MridangamScore.hs:110:60:
Couldn't match type ‘Derive.Solkattu.Sequence.Note
(Derive.Solkattu.Solkattu.Note
Derive.Solkattu.MridangamDsl.Stroke)’
with ‘[Derive.Solkattu.Sequence.Note
(Derive.Solkattu.Solkattu.Note
Derive.Solkattu.MridangamDsl.Stroke)]’
Expected type: [Sequence]
Actual type: [Derive.Solkattu.Sequence.Note
(Derive.Solkattu.Solkattu.Note
Derive.Solkattu.MridangamDsl.Stroke)]
In the second argument of ‘($)’, namely
‘nadai 7
$ repeat 2 kook
. od
. __
. k
. d
. __
. k
[ keeps walking to the right for many many lines ]
In the second argument of ‘($)’, namely
‘korvai adi
$ nadai 7
$ repeat 2 kook
. od
[ again ]
In the second argument of ‘($)’, namely
[ and again ]
You can probably easily reproduce this by making a lot of little
expressions and putting an error in someplace. I think the output should
put more things on a single horizontal line, or maybe cap the
expression at a certain number of lines. Or maybe the "zooming out"
contexts could replace the previous expression with {- previous
expression -} to emphasize the difference.
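
Two of these ideas (capping an expression at a fixed number of lines, and replacing the already-shown inner expression with a placeholder when "zooming out") can be sketched on plain strings. This is only an illustration under stated assumptions: `capLines` and `elidePrevious` are hypothetical helper names, and a real change would presumably operate on GHC's pretty-printer documents rather than on rendered text.

```haskell
import Data.List (stripPrefix)

-- | Keep at most n lines of a rendered expression. If lines were
-- dropped, append one marker line saying how many.
capLines :: Int -> [String] -> [String]
capLines n ls
  | length ls <= n = ls
  | otherwise      = take n ls ++ ["... (" ++ show (length ls - n) ++ " more lines)"]

-- | In an enclosing context, replace the first occurrence of the
-- expression that was already shown with a short placeholder.
-- If the inner expression does not occur, the context is unchanged.
elidePrevious :: String -> String -> String
elidePrevious inner outer
  | null inner = outer
  | otherwise  = go outer
  where
    go [] = []
    go s@(c:cs) =
      case stripPrefix inner s of
        Just rest -> "{- previous expression -}" ++ rest
        Nothing   -> c : go cs
```

For example, `capLines 2 ["nadai 7", "$ repeat 2 kook", ". od", ". __"]` keeps the first two lines and adds a single `... (2 more lines)` marker, and `elidePrevious "nadai 7 $ x" "korvai adi $ nadai 7 $ x"` yields `korvai adi $ {- previous expression -}`.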
Also, the other thing to think about before changing type errors is
that tons of ghc tests rely on (almost) exact error output. Probably
the first task is to make an abstract output, then change the tests to
rely on that (or at least a legacy formatting mode), so that every
change doesn't break a million tests.
On Sun, Jun 4, 2017 at 8:20 PM, Richard Eisenberg wrote:
> [...]

Elm has an interesting take on this: a collection of all error messages (
https://github.com/elm-lang/error-message-catalog).
I created an empty repo (
https://github.com/bollu/hask-error-messages-catalog) and shot an email to
haskell-cafe asking for examples where GHC generates unintuitive error
messages. Hopefully, I can start collecting these so there's context the
next time this comes up.
(PRs welcome :)
Thanks,
~Siddharth
On Mon 5 Jun, 2017, 22:56 Evan Laforge wrote:
> [...]
-- Sending this from my phone, please excuse any typos!
participants (6)
- Alan & Kim Zimmerman
- Ben Gamari
- Evan Laforge
- Gracjan Polak
- Richard Eisenberg
- Siddharth Bhat