value of documenting error messages?

Richard Eisenberg

1 Jun 2021

9:34 p.m.

Hi devs,

Take a quick look at https://gitlab.haskell.org/ghc/ghc/-/blob/6db8a0f76ec45d47060e28bb303e9eef60... You will see a datatype there with constructors describing error messages that GHC might produce. These constructors have comments describing the error, sometimes giving an example, and sometimes listing test cases. More datatypes like this one and more constructors in these datatypes are on the way.

Question: Is there sufficient value in carefully documenting each constructor?

In my ideal world, each constructor would have a high-level description, a detailed description of each field, an example of a program that generates the error, and one or more test cases that test the output. Along the way, we might discover that no such test case exists, and then we would add one. However, generating this documentation is hard. I was thinking of whipping up an army of volunteers (Hécate has advised me how to do this) to do the work, but that army will need to be fed (with instructions, supervision, and reviews) and will want to know that their work is important. Is this effort worthwhile? Do we see ourselves maintaining this documentation? Or is the effort better spent elsewhere, perhaps tagging each constructor with an ID and then making wiki pages? Not sure what's best -- would love ideas!

Credit to Alfredo di Napoli, who's done the heavy lifting of getting us this far.

Relevant tickets:
Original: https://gitlab.haskell.org/ghc/ghc/-/issues/18516
Tasks left: https://gitlab.haskell.org/ghc/ghc/-/issues/19905

Thanks,
Richard
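
To make the "ideal world" described above concrete, here is a minimal sketch of what one fully documented constructor could look like. The type, constructors, fields, and message texts are all hypothetical and invented for illustration; they are not GHC's actual definitions.

```haskell
-- Hypothetical sketch: none of these names are taken from GHC's source.

-- | Diagnostics that an imagined driver phase might emit.
data DriverMessage
  = -- | A module was imported but could not be located.
    --
    -- Example program: a file containing @import Data.DoesNotExist@.
    --
    -- Test cases: none listed yet; finding or adding one is part of
    -- the documentation work.
    ModuleNotFound
      String    -- ^ Name of the missing module
      [String]  -- ^ Directories that were searched
  | -- | A package was requested on the command line but never used.
    UnusedPackage
      String    -- ^ Name of the unused package
  deriving (Eq, Show)

-- | Render a message roughly the way a user would see it.
renderMessage :: DriverMessage -> String
renderMessage (ModuleNotFound name dirs) =
  "Could not find module " ++ name ++ " (searched: " ++ unwords dirs ++ ")"
renderMessage (UnusedPackage pkg) =
  "The package " ++ pkg ++ " was not needed for compilation"
```

Each constructor carries the four pieces asked for: a high-level description, a per-field description, an example program, and a note on test coverage.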

Alec Theriault

1 Jun

10:40 p.m.

Hello,

If these are the messages that get pretty-printed into errors or warnings, I would think detailed documentation is definitely useful. However, since this is documentation that users of GHC will want to read (and not just contributors), I think it should live primarily in the user's guide and not the Haddocks.

Rust has taken an interesting approach for this: every error message is given a unique number like "E0119" and there is an error index (https://doc.rust-lang.org/error-index.html#E0119) generated from simple markdown files (https://github.com/rust-lang/rust/tree/master/compiler/rustc_error_codes/src...) containing explanations and examples for the errors (error codes by themselves already massively help searchability). If GHC were to take this approach, I think it would be fine to just include the error message identifier in the Haddocks.

Alec

PS: Rust even bundles the documentation for errors into the compiler, so you can do something like `rustc --explain E0119` to get the full description of the error. It'd be pretty neat if GHC could do this too. Some errors don't have much to say about them, but others definitely could be explained!
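
As a rough illustration of the `--explain` idea, a compiler could bundle an index from error codes to longer explanations and look codes up on request. The sketch below is an assumption about how such a lookup might be shaped; the codes and explanation texts are made up, and `explain` is a hypothetical function, not an existing GHC feature.

```haskell
-- Hypothetical sketch: codes and texts are invented for illustration.

-- | A bundled index from error code to extended explanation.
errorIndex :: [(String, String)]
errorIndex =
  [ ("E0001", "Two instances overlap for the same class and type.")
  , ("E0002", "A variable is used but is not in scope.")
  ]

-- | Look up the extended explanation for a code, as an
-- "--explain"-style flag might do.
explain :: String -> String
explain code =
  case lookup code errorIndex of
    Just text -> code ++ ": " ++ text
    Nothing   -> "No extended explanation available for " ++ code
```

The fallback branch matters because, as noted above, some errors will not have much to say about them.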

Simon Peyton Jones

2 Jun

10:25 a.m.

Alec Theriault wrote:

Rust has taken an interesting approach for this: every error message is given a unique number like "E0119" and there is an error index (https://doc.rust-lang.org/error-index.html#E0119) generated from simple markdown files (https://github.com/rust-lang/rust/tree/master/compiler/rustc_error_codes/src/error_codes) containing explanations and examples for the errors (error codes by themselves already massively help searchability). If GHC were to take this approach, I think it would be fine to just include the error message identifier in the Haddocks.

I think this is a great idea, including that of giving unique numbers.

We should be aware that there are two client groups:

A. Users, for whom the error index above is ideal.
B. Clients of the GHC API (e.g. authors of an IDE) who are consuming the data type itself, and need to know what the various fields mean.

For (A) the Rust approach seems terrific. For (B) adding Haddocks as in the example Richard gave seems better. But it should not repeat (A); rather it should assume you are also looking at (A) for that error number, and add any implementation-specific info, like what the fields mean, and what the test cases are.

Simon

Tom Ellis

10:46 a.m.

On Tue, Jun 01, 2021 at 03:40:57PM -0700, Alec Theriault wrote:

...

Rust has taken an interesting approach for this: every error message is given a unique number like "E0119"

Is there a particularly strong reason to use numbers as codes when we have the entire space of human-readable strings available to us? Even the subset of case-insensitive strings formed from alphanumeric characters plus underscore seems more suitable for the encoding than positive integers.

e.g. "conflicting_trait_implementations" seems better than "E0119"

Tom
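
The restricted alphabet proposed here (case-insensitive, alphanumeric plus underscore) is easy to enforce mechanically. A minimal sketch, assuming ASCII-only codes and lower case as the canonical form; `normalizeCode` is a hypothetical helper name:

```haskell
import Data.Char (isAlphaNum, isAscii, toLower)

-- | Accept a descriptive code only if it is non-empty and uses ASCII
-- letters, digits, and underscores; return it in lower case so that
-- comparison is case-insensitive.
normalizeCode :: String -> Maybe String
normalizeCode s
  | not (null s) && all allowed s = Just (map toLower s)
  | otherwise                     = Nothing
  where
    allowed c = isAscii c && (isAlphaNum c || c == '_')
```

Under these assumptions, "Conflicting_Trait_Implementations" and "conflicting_trait_implementations" normalize to the same code, while a string containing spaces or punctuation is rejected.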

Ruben Astudillo

3:22 p.m.

I am no GHC developer, so this is not my place to reply. Even so, I would humbly like to put forward an argument in favor of numbers.

On 02-06-21 06:46, Tom Ellis wrote:

...

On Tue, Jun 01, 2021 at 03:40:57PM -0700, Alec Theriault wrote:

...
Rust has taken an interesting approach for this: every error message is given a unique number like "E0119"

Is there a particularly strong reason to use numbers as codes when we have the entire space of human-readable strings available to us? Even the subset of case-insensitive strings formed from alphanumeric characters plus underscore seems more suitable for the encoding than positive integers.

e.g. "conflicting_trait_implementations" seems better than "E0119"

One is search-engine optimization. A number like #0119 in a search string like "ghc error #0119" ought to return the GHC user docs as the first result. This is a great user experience for students. A more general search string can return more results about other languages, and it is difficult to say we would be the first result.

The second is that a number is shorter than a general string. That way we can highlight it in an error message on the terminal without occupying too much space. Current messages in GHC are already too big.

--
-- Rubén
-- pgp: 4EE9 28F7 932E F4AD

Carter Schonwald

4:12 p.m.

And just generally, having both a short code and a descriptive name allows useful tooling and human communication.

If we want to be careful / hedge against errors / warnings being slightly different over time, these descriptive names / error codes should also be documented with respect to the GHC version being used.

For example, I remember that in GHC 8.2 or so, for certain type family uses that were actually fine, GHC would emit a warning about allowing ambiguous types. Richard may recall this better than I. The important piece is that in at least some cases, the full meaning and interpretation of various warnings has definitely changed over GHC versions as various analyses get fancier or simpler or bug-fixed.

So in some respects, at least historically: for sufficiently fancy code, the context of meaning for a given error code / message will only be unambiguous if we interpret it knowing the specific GHC version.

I presume this will still be true? Should we always talk about (error code, GHC version) pairs rather than error codes? If so, should the error rendering be like ghc9_4_1:E2433, as a sort of URI?
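
The version-qualified rendering floated in this message (ghc9_4_1:E2433) could be produced from a pair of compiler version and code number. A minimal sketch; the record and function names are hypothetical, and the format simply mirrors the example in the message:

```haskell
import Data.List (intercalate)

-- | An error code paired with the compiler version that emitted it.
data QualifiedCode = QualifiedCode
  { compilerVersion :: [Int]  -- ^ Version components, e.g. [9, 4, 1]
  , codeNumber      :: Int    -- ^ Numeric part of the error code
  } deriving (Eq, Show)

-- | Render in the "ghc9_4_1:E2433" style.
renderQualified :: QualifiedCode -> String
renderQualified (QualifiedCode version number) =
  "ghc" ++ intercalate "_" (map show version) ++ ":E" ++ show number
```

For instance, `renderQualified (QualifiedCode [9, 4, 1] 2433)` yields `"ghc9_4_1:E2433"`.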

Mario Carneiro

4:25 p.m.

Rust error codes are not sequential, presumably because some old errors are no longer applicable and new errors get new numbers. It seems to me that it should be possible to just allocate numbers so that if the error changes more than cosmetically then it gets a new number, and thus the error code alone should be sufficient. If a new GHC version changes the meaning of an error message, it should drop the old error code and allocate a new one, so as not to confuse searchers.
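
The allocation policy suggested here (never reuse a number; when an error's meaning changes, retire the old code and hand out a fresh one) can be sketched as a small registry. Everything below is a hypothetical illustration of that policy, not a description of how Rust or GHC actually manage codes.

```haskell
-- | Registry of error codes. Retired codes are remembered so that
-- they are never handed out again.
data Registry = Registry
  { active   :: [(Int, String)]  -- ^ Live codes with their descriptions
  , retired  :: [Int]            -- ^ Codes that must not be reused
  , nextFree :: Int              -- ^ Next number that has never been used
  } deriving (Eq, Show)

-- | Record a more-than-cosmetic change to an error: retire the old
-- code and allocate a fresh one for the new meaning.
changeMeaning :: Int -> String -> Registry -> (Int, Registry)
changeMeaning old newDescription reg =
  ( fresh
  , reg { active   = (fresh, newDescription)
                       : filter ((/= old) . fst) (active reg)
        , retired  = old : retired reg
        , nextFree = fresh + 1
        }
  )
  where
    fresh = nextFree reg
```

Because `nextFree` only ever increases, a search for an old code keeps pointing at the old meaning, which is the property argued for above.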

Tom Ellis

5:49 p.m.

On Wed, Jun 02, 2021 at 11:22:47AM -0400, Ruben Astudillo wrote:

...

I am no GHC developer, so this is not my place to reply. Even so, I would humbly like to put forward an argument in favor of numbers.

The issue of error codes impinges more on GHC users than GHC developers (although it's also a bit tangential to Richard's original post).

...

On 02-06-21 06:46, Tom Ellis wrote:

...
On Tue, Jun 01, 2021 at 03:40:57PM -0700, Alec Theriault wrote:

...
Rust has taken an interesting approach for this: every error message is given a unique number like "E0119"

Is there a particularly strong reason to use numbers as codes when we have the entire space of human-readable strings available to us? Even the subset of case-insensitive strings formed from alphanumeric characters plus underscore seems more suitable for the encoding than positive integers.

e.g. "conflicting_trait_implementations" seems better than "E0119"

One is search-engine optimization. A number like #0119 in a search string like "ghc error #0119" ought to return the GHC user docs as the first result. This is a great user experience for students. A more general search string can return more results about other languages, and it is difficult to say we would be the first result.

The second is that a number is shorter than a general string. That way we can highlight it in an error message on the terminal without occupying too much space. Current messages in GHC are already too big.

I'm surprised by the responses to the idea of descriptive error codes (not just Ruben's response). "ghc error #0119" seems like no better a search string than "ghc error conflicting_trait_implementations" (and I can concoct reasons why it would be worse). Non-descriptive error codes risk being buried in results about food additives[1] amongst myriad other things.

If we really think that non-descriptive codes are the clearest way to communicate between machines and humans then I wonder if we should rename `mapAccumL` to `F392` and `TypeFamilies` to `X56`.

Tom

[1] https://en.wikipedia.org/wiki/E_number#Full_list

Richard Eisenberg

6:13 p.m.

I'm in favor of short, undescriptive, quite-possibly numeric error codes.

Advantages:

* Easy to sequentialize. We might have, for example, a "conflicting_trait_implementations" from this year, move on from that design, and then accidentally reintroduce it in a decade, to confusion. Along similar lines, it is easy to write in a comment somewhere what the next error code should be, without having to search the codebase for a use.
* Easy to make compositional. We can choose to have all GHC error codes begin with G (for GHC). Then Cabal could use C, Haddock could use H, and Stack could use S. This makes it easy for users to tell (once they've learned the scheme) where an error has come from.
* Short.
* No chance for misspellings during transcription. When I'm copying a terse identifier, I know I have to get every glyph correct. If I remember that the error code is "bad_generalizing", I might not know how "generalizing" is spelled. I might also forget whether it's "generalizing" or "generalization". And I could very easily see myself making either or both of these mistakes as I'm switching from one window to another, in under a second.

Disadvantages:

* The code does not impart semantic meaning. But I argue this is not so bad, as even a more descriptive code does not impart a precise enough semantic meaning to be helpful.
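
The first two advantages above (sequential allocation and a per-tool prefix) can be sketched in a few lines. The prefix letters come from the message; the four-digit width, the type names, and the function names are assumptions made for illustration.

```haskell
-- | Tools that could share one family of error codes.
data Tool = GHC | Cabal | Haddock | Stack
  deriving (Eq, Show)

-- | One-letter prefix identifying where an error came from.
toolPrefix :: Tool -> Char
toolPrefix GHC     = 'G'
toolPrefix Cabal   = 'C'
toolPrefix Haddock = 'H'
toolPrefix Stack   = 'S'

-- | A code is a tool plus a number, e.g. G0119.
data Code = Code Tool Int
  deriving (Eq, Show)

-- | Render with the prefix and a zero-padded number
-- (four digits is an assumed convention).
renderCode :: Code -> String
renderCode (Code tool n) = toolPrefix tool : pad (show n)
  where
    pad digits = replicate (4 - length digits) '0' ++ digits

-- | Allocate the next code for a tool, given only the greatest
-- number used so far; no search of the codebase is needed.
nextCode :: Tool -> Int -> Code
nextCode tool greatest = Code tool (greatest + 1)
```

With these definitions, `renderCode (Code GHC 119)` gives `"G0119"` and `renderCode (nextCode Cabal 41)` gives `"C0042"`.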

...

On Jun 2, 2021, at 1:49 PM, Tom Ellis wrote:

If we really think that non-descriptive codes are the clearest way to communicate between machines and humans then I wonder if we should rename `mapAccumL` to `F392` and `TypeFamilies` to `X56`.

I think this is a false equivalence. The error codes are meant to be looked up when you see them on your screen, not remembered and then produced at will.

-------

Surfacing up a few levels: it sounds like a good next step is not to get all these constructors finely documented, but instead to come up with a way to connect these constructors to user-facing documentation. This might be done by slurping out Markdown from the GHC source code, or perhaps something better. It would be a shame to invest a lot of effort in documenting the constructors in a way that serves only GHC developers, not end users, when we can perhaps do both at the same time.

Thanks,
Richard

Tom Ellis

6:48 p.m.

On Wed, Jun 02, 2021 at 06:13:17PM +0000, Richard Eisenberg wrote:

...

I'm in favor of short, undescriptive, quite-possibly numeric error codes.

These responses are so completely opposite to what I expected that I can't help thinking I've made a fundamental error in my understanding of what we're trying to achieve! Since no one has suggested any support for the idea of descriptive error codes I'm pressing on mostly in the hope that someone will be able to see from where my misunderstanding arises and set me straight.

Before I continue, I'd like to suggest that this is very much a user-facing issue and I would be strongly in favour of actually asking users about what they prefer (and allowing them to discuss for a while) rather than taking a straw poll amongst GHC developers.

To that end, would it be inappropriate of me to link this discussion to Haskell Reddit and/or Haskell Discourse?

...

Advantages:

...

Easy to sequentialize. We might have, for example, a "conflicting_trait_implementations" from this year, move on from that design, and then accidentally reintroduce it in a decade, to confusion. Along similar lines, it is easy to write in a comment somewhere what the next error code should be, without having to search the codebase for a use.

I don't understand at all why it's valuable to sequentialize. Is the relative ordering of error codes meaningful in some way? I don't see why deprecating an error code and reintroducing it is a problem any more than doing the same to a function or GHC extension. If we are *really* desperate to disambiguate then conflicting_trait_implementations_2021 still seems better to me than E195.

...

Easy to make compositional. We can choose to have all GHC error codes begin with G (for GHC). Then Cabal could use C, Haddock could use H, and Stack could use S. This makes it easy for users to tell (once they've learned the scheme) where an error has come from.

Surely the same holds for descriptive error codes. One could have G_conflicting_trait_implementations, H_malformatted_section_header, ...

...

Short.

Again I must be misunderstanding. Why is brevity valuable? Aren't we expecting users to read these things and look them up? Copy/paste is free.

...

No chance for misspellings during transcription. When I'm copying a terse identifier, I know I have to get every glyph correct. If I remember that the error code is "bad_generalizing", I might not know how "generalizing" is spelled. I might also forget whether it's "generalizing" or "generalization". And I could very easily see myself making either or both of these mistakes as I'm switching from one window to another, in under a second.

Surely it's just as easy to mistype E159 as E195 as it is to misspell "generalise". As above, copy/paste is free and if we *really* want to be helpful then instead of naked error codes we should give URLs which directly link to sections in the GHC users guide (or other appropriate resource).
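
Emitting a URL instead of a naked code is a small step once a code exists, whether the code is numeric or descriptive. A minimal sketch; the base URL below is a placeholder on example.org, not a real GHC documentation address, and `errorUrl` is a hypothetical helper.

```haskell
-- | Placeholder location of an error index page (hypothetical).
userGuideBase :: String
userGuideBase = "https://example.org/ghc-users-guide/errors.html"

-- | Link directly to the section for a given error code by using the
-- code as the page anchor.
errorUrl :: String -> String
errorUrl code = userGuideBase ++ "#" ++ code
```

Both `errorUrl "E0159"` and `errorUrl "conflicting_trait_implementations"` work the same way, so this choice is independent of the numbers-versus-names question.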

...

Disadvantages:

...

The code does not impart semantic meaning. But I argue this is not so bad, as even a more descriptive code does not impart a precise enough semantic meaning to be helpful.

I challenge you to name your next GHC extension X25!

...

...
On Jun 2, 2021, at 1:49 PM, Tom Ellis wrote:

If we really think that non-descriptive codes are the clearest way to communicate between machines and humans then I wonder if we should rename `mapAccumL` to `F392` and `TypeFamilies` to `X56`.

I think this is a false equivalence. The error codes are meant to be looked up when you see them on your screen, not remembered and then produced at will.

Possibly ... possibly not.

"Hey Anna, what should I do about E159?"

"Hey Anna, what should I do about conflicting_trait_implementations?"

Which would I prefer to shout to my colleague across the room?

To me this seems like a rare opportunity to do something where people will say "Hey look, that formidable Haskell compiler is doing something that's friendlier than the equivalent in any other compiler!". For such an important user-facing feature I don't understand why we're not asking users what they prefer.

Where could I be going wrong in my understanding?

Tom

Richard Eisenberg

7:03 p.m.

...

On Jun 2, 2021, at 2:48 PM, Tom Ellis wrote:

On Wed, Jun 02, 2021 at 06:13:17PM +0000, Richard Eisenberg wrote:

...
I'm in favor of short, undescriptive, quite-possibly numeric error codes.

These responses are so completely opposite to what I expected that I can't help thinking I've made a fundamental error in my understanding of what we're trying to achieve! Since no one has suggested any support for the idea of descriptive error codes I'm pressing on mostly in the hope that someone will be able to see from where my misunderstanding arises and set me straight.

I see no place where our understandings have diverged, just our opinions. But I may be missing something, too, of course! (For the record, I don't see your suggestion as unreasonable; I just think it's inferior to terse non-descriptive identifiers.)

...

Before I continue, I'd like to suggest that this is very much a user-facing issue and I would be strongly in favour of actually asking users about what they prefer (and allowing them to discuss for a while) rather than taking a straw poll amongst GHC developers.

To that end, would it be inappropriate of me to link this discussion to Haskell Reddit and/or Haskell Discourse?

Not this discussion, but I think a discussion there is a good idea. This thread started as a question about documenting constructors in the GHC source code, and it has (rightly!) moved to be about documenting error messages more generally. I (myopically) had not connected these two, and I'm glad for the direction this has taken. But I think the user-facing discussion should have a different starting point than this thread. I don't think I currently have the bandwidth for that discussion, but if no one else starts it, I will before too much longer.

...

I don't understand at all why it's valuable to sequentialize. Is the relative ordering of error codes meaningful in some way?

No. Sequentialization is good because it allows for the production of a new, unique member of the class, with a minimum of storage requirements (that is, you just store the greatest such member, and you know the next one up is unique).

...

I don't see why deprecating an error code and reintroducing it is a problem any more than doing the same to a function or GHC extension.

It is definitely worse than a function, because functions are associated with a particular version. If GHC 9 and GHC 13 have a function of the same name but different meanings, I don't see how that causes trouble. Extensions and error codes, on the other hand, are more troublesome, because they get documented and discussed widely online, and web pages live forever. And it is currently sometimes problematic when extensions change meaning over time, leading to conversations I've seen about adding version numbers to extensions. I don't think we've had an extension disappear and then reappear, because removing an extension is very, very hard. Error messages, on the other hand, will be much more fluid.

...

...
Easy to make compositional. We can choose to have all GHC error codes begin with G (for GHC). Then Cabal could use C, Haddock could use H, and Stack could use S. This makes it easy for users to tell (once they've learned the scheme) where an error has come from.

Surely the same holds for descriptive error codes. One could have G_conflicting_trait_implementations, H_malformatted_section_header, ...

Yes, I thought you might say that. But now these are mixed, with an opaque component and a more descriptive one. Better would be ghc_conflicting_trait_implementations, but that's even longer!
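The one-letter prefix scheme discussed above could look roughly like this. It is only a sketch: the table follows the letters suggested in the thread, while the function name and the "letter followed by digits" format are assumptions.

```python
# Hypothetical prefix table, using the letters suggested in the thread.
TOOL_BY_PREFIX = {"G": "GHC", "C": "Cabal", "H": "Haddock", "S": "Stack"}


def tool_for(code: str) -> str:
    """Return the tool that issued `code`, judged by its first letter."""
    prefix, number = code[:1], code[1:]
    if prefix not in TOOL_BY_PREFIX or not number.isdigit():
        raise ValueError(f"not a recognised error code: {code!r}")
    return TOOL_BY_PREFIX[prefix]


print(tool_for("G0159"))  # GHC
print(tool_for("C0012"))  # Cabal
```

Under this assumed format a mixed code such as G_conflicting_trait_implementations is rejected, which is one way to keep the opaque and descriptive styles from blending.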

...

...
Short.

Again I must be misunderstanding. Why is brevity valuable? Aren't we expecting users to read these things and look them up? Copy/paste is free.

Short things are easier to format? Yes, I agree that brevity is harder to motivate here. Yet I also think that, say, pasting the entire error message text would be wrong, too. So why do we want abbreviations at all? I think: it's to be sure we're looking up what we intend to look up, something served nicely by guaranteed uniqueness.

...

...
No chance for misspellings during transcription. When I'm copying a terse identifier, I know I have to get every glyph correct. If I remember that the error code is "bad_generalizing", I might not know how "generalizing" is spelled. I might also forget whether it's "generalizing" or "generalization". And I could very easily see myself making either or both of these mistakes as I'm switching from one window to another, in under a second.

Surely it's just as easy to mistype E159 as E195 as it is to misspell "generalise". As above, copy/paste is free and if we *really* want to be helpful then instead of naked error codes we should give URLs which directly link to sections in the GHC users guide (or other appropriate resource).

I'd be very happy with URLs.
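As a rough illustration of the URL idea, a diagnostic could carry its own link so that nothing has to be retyped. Everything named here is an assumption: the base URL is a placeholder, and the message text and function names are made up for the example.

```python
# Placeholder base URL: no real documentation site is implied.
BASE_URL = "https://example.org/ghc-errors"


def explain_url(code: str) -> str:
    """Build a link to the (hypothetical) documentation page for `code`."""
    return f"{BASE_URL}/{code}"


def render(code: str, message: str) -> str:
    # Put the link next to the message so the reader can click or paste it.
    return f"error [{code}]: {message}\n  see {explain_url(code)}"


print(render("E0159", "made-up message text"))
```

The same function works whether the code is numeric or descriptive; the link, not the code's spelling, is what the user follows.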

...

...
Disadvantages:

...
The code does not impart semantic meaning. But I argue this is not so bad, as even a more descriptive code does not impart a precise enough semantic meaning to be helpful.

I challenge you to name your next GHC extension X25!

I *am* waiting for the day when I can figure out what -XKCD does.

...

Possibly ... possibly not.

"Hey Anna, what should I do about E159?"

"Hey Anna, what should I do about conflicting_trait_implementations?"

Which would I prefer to shout to my colleague across the room?

It depends on the colleague. There's a chance she knows about E159, and then the first works fine. There's a chance she doesn't know that conflicting_trait_implementations is an error code, and then she goes on a long lecture about conflicting trait implementations (but not about your error); then the second one fails.

...

To me this seems like a rare opportunity to do something where people will say "Hey look, that formidable Haskell compiler is doing something that's friendlier than the equivalent in any other compiler!". For such an important user-facing feature I don't understand why we're not asking users what they prefer.

I agree completely here! Let's ask! (Remember that this thread, posted to ghc-devs, was originally about documenting the GHC source code, something that would not affect users.) Richard

Tom Ellis

7:10 p.m.

On Wed, Jun 02, 2021 at 07:03:25PM +0000, Richard Eisenberg wrote:

...

...
To me this seems like a rare opportunity to do something where people will say "Hey look, that formidable Haskell compiler is doing something that's friendlier than the equivalent in any other compiler!". For such an important user-facing feature I don't understand why we're not asking users what they prefer.

I agree completely here! Let's ask! (Remember that this thread, posted to ghc-devs, was originally about documenting the GHC source code, something that would not affect users.)

Yes indeed. Let's one of us start a user-focused thread elsewhere (whoever gets round to it first) and post a link here so interested parties here can join in. Tom

Jakob Brünker

7:12 p.m.

For what it's worth, there is an existing proposal about this topic, maybe that's the right place to discuss it for a user-focused perspective. See https://github.com/ghc-proposals/ghc-proposals/pull/325

Jakob

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Alan & Kim Zimmerman

3 Jun

6:50 p.m.

I think in practical terms for IDE-based people, a short standardised alphanumeric identifier makes sense. These typically get displayed along with the full error text in the error pane, and it helps to be able to allocate a known, standard amount of real estate to them. Fundamentally they are just an index into something else: you will either copy/paste it, or click on it.

Alan

Carter Schonwald

7:59 p.m.

Yes! Thanks for articulating it so nicely.

Bryan Richter

7:41 p.m.

By the way, to summarize the discussion on #325, I think the words to use would be "overwhelming support for short numeric reference ids". Here's my argument for it:

1. If you try to make the unique id some mangled form of the error's name, the cognitive burden of crafting errors is increased. Professional experience leads me to believe this is no laughing matter.
2. A unique id will never replace pithy error messages, clear error names, or detailed reference documentation anyway.
3. Numeric ids are internationalized (well, at least multi-nationalized) by construction.
4. Numeric ids, having no intrinsic meaning, are perfectly forward-compatible with any potential evolution of error names or descriptions.

-Bryan
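Point 4 can be illustrated with a small sketch. The registry below is hypothetical: the id 159 and the name "bad_generalizing" are borrowed from examples earlier in the thread, not from any real GHC table. The numeric id is the key, and the descriptive name is metadata that can change.

```python
# Hypothetical registry: the numeric id is the key; names are editable metadata.
registry = {159: "bad_generalizing"}


def rename(error_id: int, new_name: str) -> None:
    """Change an error's descriptive name without touching its id."""
    if error_id not in registry:
        raise KeyError(error_id)
    registry[error_id] = new_name


rename(159, "bad_generalization")
print(registry[159])  # bad_generalization
```

Anything written down under the old name still resolves, because references go through the id rather than the name.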

Simon Peyton Jones

2 Jun

4:46 p.m.

| e.g. "conflicting_trait_implementations" seems better than "E0119"

I don't think so. If the compiler prints "E0119" and I search for that, I know I'm going to get exactly that, not similar but subtly different things. (A free text search might also throw up illuminating info, but is much less precise.)

Simon

| -----Original Message-----
| From: ghc-devs On Behalf Of Tom Ellis
| Sent: 02 June 2021 11:46
| To: ghc-devs@haskell.org
| Subject: Re: value of documenting error messages?
|
| On Tue, Jun 01, 2021 at 03:40:57PM -0700, Alec Theriault wrote:
| > Rust has taken an interesting approach for this: every error message
| > is given a unique number like "E0119"
|
| Is there a particularly strong reason to use numbers as codes when we have
| the entire space of human-readable strings available to us? Even the subset of
| case-insensitive strings formed from alphanumeric characters plus underscore
| seems more suitable for the encoding than positive integers.
|
| e.g. "conflicting_trait_implementations" seems better than "E0119"
|
| Tom

participants (10)

Alan & Kim Zimmerman
Alec Theriault
Bryan Richter
Carter Schonwald
Jakob Brünker
Mario Carneiro
Richard Eisenberg
Ruben Astudillo
Simon Peyton Jones
Tom Ellis