Proposal: require Haddock comment for every new top-level function and type in GHC source code

Hi! I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments. GHC code is sometimes commented with "notes", which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why. Lack of documentation, in GHC and other software projects, usually has (at least) two causes: - Programmers comment code they think is "complex enough to warrant a comment". The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. - Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to. I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google. Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost. *Proposal: *I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence. I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong "broken window" effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices. We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that. To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped. *First example* In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage. The first question that I asked upon seeing uses of that function was "what does RUR stand for?" Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean "read" in the enclosing function and "updating" is related to "reading" so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for: -- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting platform) src That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful. Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case. *Second example* In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above. I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be helpful: -- | Returns which registers are read and written by this -- instruction, as a (read, written) pair. This info is used -- by the register allocator. x86_regUsageOfInstr :: Platform -> Instr -> RegUsage The reason mentioning that the return value is essentially a (read, written) pair is helpful is because the body of the function a big case statement full of lines like this one: GCMP _ src1 src2 -> mkRUR [src1,src2] ... FDIV _ src dst -> usageRM src dst It's not immediately clear that all the various helper functions used here just end up computing a pair of the above form. A top-level comment lets you understand what's going on without understanding exactly what all these helper functions are doing. Thoughts? -- Johan

I’d be OK with this, (it’s a bit like requiring signatures on all top level functions) but I don’t know how we’d enforce it. Do you think the requirement should be for all top-level functions or just exported ones? I agree that Notes have a different purpose. But it should be OK style to refer to a Note from a top-level function comment, even though Haddock won’t be able to make much sense of it. Simon From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Johan Tibell Sent: 27 June 2014 10:51 To: ghc-devs@haskell.org Subject: Proposal: require Haddock comment for every new top-level function and type in GHC source code Hi! I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments. GHC code is sometimes commented with "notes", which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why. Lack of documentation, in GHC and other software projects, usually has (at least) two causes: * Programmers comment code they think is "complex enough to warrant a comment". The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. * Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to. I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google. Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost. Proposal: I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence. I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong "broken window" effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices. We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that. To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped. First example In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage. The first question that I asked upon seeing uses of that function was "what does RUR stand for?" Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean "read" in the enclosing function and "updating" is related to "reading" so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for: -- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting platform) src That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful. Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case. Second example In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above. I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be helpful: -- | Returns which registers are read and written by this -- instruction, as a (read, written) pair. This info is used -- by the register allocator. x86_regUsageOfInstr :: Platform -> Instr -> RegUsage The reason mentioning that the return value is essentially a (read, written) pair is helpful is because the body of the function a big case statement full of lines like this one: GCMP _ src1 src2 -> mkRUR [src1,src2] ... FDIV _ src dst -> usageRM src dst It's not immediately clear that all the various helper functions used here just end up computing a pair of the above form. A top-level comment lets you understand what's going on without understanding exactly what all these helper functions are doing. Thoughts? -- Johan

On Fri, Jun 27, 2014 at 12:17 PM, Simon Peyton Jones
I’d be OK with this, (it’s a bit like requiring signatures on all top level functions) but I don’t know how we’d enforce it.
I think social enforcement is enough. If we agree that this is something we want to do and communicate that to ghc-devs@, put a note in our style guide, and kindly remind people to add comments when we do code reviews, we'll eventually end up with a culture of writing Haddocks. I think the most important part right now is that current contributors agree that this is something we want to do. Aside: people usually say that they find it hard to know what to document in their own code, because they don't know what others will find difficult. My advice is this: add a sentence or two about what the function does and why it exists, no matter how obvious you think that statement is.
Do you think the requirement should be for all top-level functions or just exported ones?
I take what I can get, but I think documenting all top-level functions makes sense in the case of GHC, as there's so much that goes on in our modules but we often only export a handful of functions. For example, compiler/codeGen/StgCmmPrim.hs is 2,000 lines long but only exports 3 functions. For someone that wants to work on that module for the first time only have docs on those three functions is helpful, but likely not enough. FWIW I document all top-level functions in my projects (and when I don't I often regret it later).
I agree that Notes have a different purpose. But it should be OK style to refer to a Note from a top-level function comment, even though Haddock won’t be able to make much sense of it.
Sure. Personally I would refer to the note from the function body if it talks mostly about the implementation, as opposed to how to use the function. -- Johan

I would counter propose a place on hackage for people to type in or modify
the documentation for functions, designed in such a way that the
documentation would easily find its way back into the project's source code
(with developer approval.) This way the documentation can be generated by
people who only recently came to understand the function, so the questions
a newcomer has are fresh in their mind.
On Fri, Jun 27, 2014 at 4:19 AM, Johan Tibell
On Fri, Jun 27, 2014 at 12:17 PM, Simon Peyton Jones
wrote: I’d be OK with this, (it’s a bit like requiring signatures on all top level functions) but I don’t know how we’d enforce it.
I think social enforcement is enough. If we agree that this is something we want to do and communicate that to ghc-devs@, put a note in our style guide, and kindly remind people to add comments when we do code reviews, we'll eventually end up with a culture of writing Haddocks.
I think the most important part right now is that current contributors agree that this is something we want to do.
Aside: people usually say that they find it hard to know what to document in their own code, because they don't know what others will find difficult. My advice is this: add a sentence or two about what the function does and why it exists, no matter how obvious you think that statement is.
Do you think the requirement should be for all top-level functions or just exported ones?
I take what I can get, but I think documenting all top-level functions makes sense in the case of GHC, as there's so much that goes on in our modules but we often only export a handful of functions. For example, compiler/codeGen/StgCmmPrim.hs is 2,000 lines long but only exports 3 functions. For someone that wants to work on that module for the first time only have docs on those three functions is helpful, but likely not enough. FWIW I document all top-level functions in my projects (and when I don't I often regret it later).
I agree that Notes have a different purpose. But it should be OK style to refer to a Note from a top-level function comment, even though Haddock won’t be able to make much sense of it.
Sure. Personally I would refer to the note from the function body if it talks mostly about the implementation, as opposed to how to use the function.
-- Johan _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

I don't think this is mutually exclusive with Johan's proposal. Let me suggest an amendment: Developers (both new and old) would be encouraged to submit patches adding or improving documentation in the source code. Documentation patches would be vetted as any others would be.
Discouraging source code comments is the worst legacy of Unix. IMO (contradicting K&R) even incorrect comments are better than none.
Howard
________________________________
From: David Fox

On 06/27/2014 03:26 PM, David Fox wrote:
I would counter propose a place on hackage for people to type in or modify the documentation for functions, designed in such a way that the documentation would easily find its way back into the project's source code (with developer approval.) This way the documentation can be generated by people who only recently came to understand the function, so the questions a newcomer has are fresh in their mind.
Are you asking for a wiki-like thing for documentation? There were a few times where this has been proposed such as https://github.com/haskell/haddock/issues/72 but in general it turns out that there's not enough interest for anyone to sit down and implement it and make sure it all works properly. Patches should be going straight to upstream rather than lingering on Hackage until someone notices them (even with automated tools, it's a pain). I doubt many people would use it for anything but typos because if you have enough knowledge about a function to document it, you're likely to already be involved with the project in some way and have means to report it properly.
On Fri, Jun 27, 2014 at 4:19 AM, Johan Tibell
wrote: On Fri, Jun 27, 2014 at 12:17 PM, Simon Peyton Jones
wrote: I’d be OK with this, (it’s a bit like requiring signatures on all top level functions) but I don’t know how we’d enforce it.
I think social enforcement is enough. If we agree that this is something we want to do and communicate that to ghc-devs@, put a note in our style guide, and kindly remind people to add comments when we do code reviews, we'll eventually end up with a culture of writing Haddocks.
I think the most important part right now is that current contributors agree that this is something we want to do.
Aside: people usually say that they find it hard to know what to document in their own code, because they don't know what others will find difficult. My advice is this: add a sentence or two about what the function does and why it exists, no matter how obvious you think that statement is.
Do you think the requirement should be for all top-level functions or just exported ones?
I take what I can get, but I think documenting all top-level functions makes sense in the case of GHC, as there's so much that goes on in our modules but we often only export a handful of functions. For example, compiler/codeGen/StgCmmPrim.hs is 2,000 lines long but only exports 3 functions. For someone that wants to work on that module for the first time only have docs on those three functions is helpful, but likely not enough. FWIW I document all top-level functions in my projects (and when I don't I often regret it later).
I agree that Notes have a different purpose. But it should be OK style to refer to a Note from a top-level function comment, even though Haddock won’t be able to make much sense of it.
Sure. Personally I would refer to the note from the function body if it talks mostly about the implementation, as opposed to how to use the function.
-- Johan _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
-- Mateusz K.

On Fri, Jun 27, 2014 at 11:11 AM, Mateusz Kowalczyk wrote: On 06/27/2014 03:26 PM, David Fox wrote: I would counter propose a place on hackage for people to type in or
modify
the documentation for functions, designed in such a way that the
documentation would easily find its way back into the project's source
code
(with developer approval.) This way the documentation can be generated
by
people who only recently came to understand the function, so the
questions
a newcomer has are fresh in their mind. Are you asking for a wiki-like thing for documentation? There were a few
times where this has been proposed such as
https://github.com/haskell/haddock/issues/72 but in general it turns out
that there's not enough interest for anyone to sit down and implement it
and make sure it all works properly. Patches should be going straight to
upstream rather than lingering on Hackage until someone notices them
(even with automated tools, it's a pain). I doubt many people would use
it for anything but typos because if you have enough knowledge about a
function to document it, you're likely to already be involved with the
project in some way and have means to report it properly. My thought was that it would end up in the library's source code, not that
it would reside in a wiki. The question is whether anyone has the
motivation to write a sufficiently smooth mechanism to achieve this. If I
was editing a package that I normally upload to hackage and I could look at
a nice presentation of alternative documentation strings people have
suggested for the different functions in my library, I would be happy to
cut and paste them into the source code.

On 06/27/2014 01:19 PM, Johan Tibell wrote:
On Fri, Jun 27, 2014 at 12:17 PM, Simon Peyton Jones
wrote: I’d be OK with this, (it’s a bit like requiring signatures on all top level functions) but I don’t know how we’d enforce it.
I think social enforcement is enough. If we agree that this is something we want to do and communicate that to ghc-devs@, put a note in our style guide, and kindly remind people to add comments when we do code reviews, we'll eventually end up with a culture of writing Haddocks.
I think the most important part right now is that current contributors agree that this is something we want to do.
Aside: people usually say that they find it hard to know what to document in their own code, because they don't know what others will find difficult. My advice is this: add a sentence or two about what the function does and why it exists, no matter how obvious you think that statement is.
Do you think the requirement should be for all top-level functions or just exported ones?
I take what I can get, but I think documenting all top-level functions makes sense in the case of GHC, as there's so much that goes on in our modules but we often only export a handful of functions. For example, compiler/codeGen/StgCmmPrim.hs is 2,000 lines long but only exports 3 functions. For someone that wants to work on that module for the first time only have docs on those three functions is helpful, but likely not enough. FWIW I document all top-level functions in my projects (and when I don't I often regret it later).
I agree that Notes have a different purpose. But it should be OK style to refer to a Note from a top-level function comment, even though Haddock won’t be able to make much sense of it.
Sure. Personally I would refer to the note from the function body if it talks mostly about the implementation, as opposed to how to use the function.
Notes could be moved into the module header if necessary so that they are rendered by Haddock. Alternatively, one function can contain the note and other places can refer to it by means of an anchor.
-- Johan _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
My personal gripe of trying to read docs for GHC modules/types is that some older modules are in .lhs format which means we have to dive out of nicely-rendered HTML and go into source. Was there ever talk of converting Literate Haskell files into more Haddock-friendly format? -- Mateusz K.

Given the examples provided with this proposal it looks like this
change is targeted mostly at compiler hackers, and not at
library/package developers. For the latter community it would be nice
to establish a policy for libraries and packages that encourages
("nudges") developers to include comments
and examples, something more useful than type signatures or pointers
to the literature on type theory.
For example, the R stats system encourages package developers to
include examples, and these examples are run as part of the package
checking process, thus encouraging developers to keep the examples
up-to-date.
This kind of change might help Haskell move beyond its current role as
a research and teaching tool, and an incubator for new programming
ideas
that are eventually implemented and popularized in other programming
languages...
Cheers,
Dominick
On Fri, Jun 27, 2014 at 6:17 AM, Simon Peyton Jones
I'd be OK with this, (it's a bit like requiring signatures on all top level functions) but I don't know how we'd enforce it.
Do you think the requirement should be for all top-level functions or just exported ones?
I agree that Notes have a different purpose. But it should be OK style to refer to a Note from a top-level function comment, even though Haddock won't be able to make much sense of it.
Simon
From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Johan Tibell Sent: 27 June 2014 10:51 To: ghc-devs@haskell.org Subject: Proposal: require Haddock comment for every new top-level function and type in GHC source code
Hi!
I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments.
GHC code is sometimes commented with "notes", which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why.
Lack of documentation, in GHC and other software projects, usually has (at least) two causes:
Programmers comment code they think is "complex enough to warrant a comment". The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to.
I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google.
Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost.
Proposal: I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence.
I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong "broken window" effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices.
We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that.
To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped.
First example
In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage.
The first question that I asked upon seeing uses of that function was "what does RUR stand for?" Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean "read" in the enclosing function and "updating" is related to "reading" so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for:
-- | Create register usage info for instruction that only
-- reads registers.
mkRUR src = src' `seq` RU src' []
where src' = filter (interesting platform) src
That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful.
Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case.
Second example
In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above.
I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be helpful:
-- | Returns which registers are read and written by this
-- instruction, as a (read, written) pair. This info is used
-- by the register allocator.
x86_regUsageOfInstr :: Platform -> Instr -> RegUsage
The reason mentioning that the return value is essentially a (read, written) pair is helpful is because the body of the function a big case statement full of lines like this one:
GCMP _ src1 src2 -> mkRUR [src1,src2]
...
FDIV _ src dst -> usageRM src dst
It's not immediately clear that all the various helper functions used here just end up computing a pair of the above form. A top-level comment lets you understand what's going on without understanding exactly what all these helper functions are doing.
Thoughts?
-- Johan
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

On Mon, Jun 30, 2014 at 11:22 PM, Dominick Samperi
Given the examples provided with this proposal it looks like this change is targeted mostly at compiler hackers, and not at library/package developers.
Yes, this discussion is only about documenting the GHC modules. -- Johan

Thanks, Johan, for starting this discussion.
I mostly agree with the proposal. However, one (at times, serious) drawback to using Haddock is that it means that editing comments can cause parse failures. The way the GHC build works, these failures may not be detected until the end of a hacking session (if I'm using, say, `make 2`, as I tend to do) and then can be hard to diagnose. I've actually been bitten by this when working on GHC.
So, I have to ask: why use Haddock? Do folks read the Haddock docs for GHC? (I don't, but perhaps that's because the docs aren't so good right now.) Would it be acceptable to change this proposal not to require Haddock docs?
Even if we decide to keep this proposal about Haddock docs specifically, I would strongly request that correct rendering of the Haddock docs not be scrutinized. At the end of a hacking session (which is hard enough to find time for, as is), I don't want to be expected to look through the generated HTML to make sure that my typewriter font and italics are rendering correctly. This is something of a corollary to Simon's comment about wanting to refer to Notes from Haddock comments -- I would want the Haddock output to be quite secondary to the proper documentation in the source code.
(Note that this "demotion" of the role of Haddock is certainly not my practice in released libraries! But, Haddock is much less useful in an application like GHC than in a library.)
All that said, I do agree with the intent of this proposal and am happy to take on my part of the burden of documenting new (and perhaps some old) functions as I work. I have been very guilty of the "broken window" effect in not documenting new code.
Thanks,
Richard
On Jun 27, 2014, at 5:51 AM, Johan Tibell
Hi!
I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments.
GHC code is sometimes commented with "notes", which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why.
Lack of documentation, in GHC and other software projects, usually has (at least) two causes: Programmers comment code they think is "complex enough to warrant a comment". The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to. I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google.
Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost.
Proposal: I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence.
I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong "broken window" effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices.
We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that.
To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped.
First example In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage.
The first question that I asked upon seeing uses of that function was "what does RUR stand for?" Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean "read" in the enclosing function and "updating" is related to "reading" so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for:
-- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting platform) src
That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful.
Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case.
Second example In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above.
I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be helpful:
-- | Returns which registers are read and written by this -- instruction, as a (read, written) pair. This info is used -- by the register allocator. x86_regUsageOfInstr :: Platform -> Instr -> RegUsage
The reason mentioning that the return value is essentially a (read, written) pair is helpful is because the body of the function a big case statement full of lines like this one:
GCMP _ src1 src2 -> mkRUR [src1,src2] ... FDIV _ src dst -> usageRM src dst
It's not immediately clear that all the various helper functions used here just end up computing a pair of the above form. A top-level comment lets you understand what's going on without understanding exactly what all these helper functions are doing.
Thoughts?
-- Johan
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hey list, I am strongly in favour of the proposal. As a pedestrian-level GHC contributor, the *vast* majority of my time is spent trying to figure out what certain things do, when the answer could be found in a one- or two-line comment above a definition. As for Richard's remark,
So, I have to ask: why use Haddock? Do folks read the Haddock docs for GHC? (I don't, but perhaps that's because the docs aren't so good right now.)
I find Haddock very useful for one major reason: it allows me an HTML overview over all exported definitions (in the index), which is a big plus when you're searching for things. Greetings, David/quchen -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJTsYVUAAoJELrQsaT5WQUsKawIAIMHt9Ha4qTWtJO6qwOjN5RD JOx1MnuPlDLosbyE9+BlkEV1tRnnG/snyxwFTgmtFSO9fAV2FPZEbtzZ2AZd4xbb VgORhTAeL1n1aBitGNaAzT1T60tS2JNict2S0pUWa0Qt3nYWwoRw1B+OOaZRuuaR cHkOFKMbzU5knmeD/RyDIE+oRxZvjAKdAaaQ0vJ70ovNUptjtfDeX6Nxto65qFis sKsWjsL++TgeOscejw7DNLeCei/cwrzjOSNOB6xFGAxPHUHZFvSkbuVAMNWIgbic 55tbDIog/l9P/N8RoUQh4PLjh3TG3xT3vsM5iiTKl3UZ7eTMpzmzAKhvikoHGOU= =w6S7 -----END PGP SIGNATURE-----

Richard, Not requiring contributors to check that the Haddock render well is OK. As long as the comments are there someone with free time on their hands can always tidy up the rendering. I'm looking forward to the day I can browse the GHC Haddocks and make sense of them. I tried in the past but the generated docs aren't accessible enough without any actual comments.

Left hold off one more week to give more contributors a change to voice their thoughts. If no one protests I will announce the new policy next Monday. Sounds good?

I think that's a good idea, as long as we do enforce it as "social policy"
rather than a script. In other words, we should encourage folks to add
comments documenting functions, but not necessarily require them
*everywhere*, and leave it
I also share Richard's concern about using Haddock notation, and the
possibility of introducing build-failures due to typos in the documentation.
-Iavor
On Mon, Jun 30, 2014 at 1:18 PM, Johan Tibell
Left hold off one more week to give more contributors a change to voice their thoughts. If no one protests I will announce the new policy next Monday. Sounds good? _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

David Luposchainsky
Hey list,
I am strongly in favour of the proposal. As a pedestrian-level GHC contributor, the *vast* majority of my time is spent trying to figure out what certain things do, when the answer could be found in a one- or two-line comment above a definition.
I'd like to second this. As an occassional contributor, I find myself wading through a lot of code to deduce functions' purpose. While I'm often pleasantly surprised by the quality of the notes scattered about the code, per-definition Haddocks would fill in the many remaining gaps and provide a nice overview of each module. I agree that enforcing the quality of the rendered Haddocks is unnecessary. Once the language has been written there are many contributors (such as myself) who can further clean up the formatting. Cheers, - Ben

At the risk of sounding redundantly redundant, I'd like to third this.
My workflow for finding stuff in the GHC codebase is a mixture of grep
and Hoogle. Searching Hoogle for "+ghc :: [TyVar] -> Type -> Type" is
a huge timesaver, and Hoogle sends me to the generated haddock
comments. Usually the haddocks themselves aren't there, but the
"Source" link is a handy way to jump to the code and explore.
So having actual haddock documentation would only help in this regard.
The Notes are *great* for subtle issues with the implementation of
some function, but it'd be nice to have some commentary on how to
_use_ that function without having to understand how it works.
On Mon, Jun 30, 2014 at 3:42 PM, Ben Gamari
David Luposchainsky
writes: Hey list,
I am strongly in favour of the proposal. As a pedestrian-level GHC contributor, the *vast* majority of my time is spent trying to figure out what certain things do, when the answer could be found in a one- or two-line comment above a definition.
I'd like to second this. As an occassional contributor, I find myself wading through a lot of code to deduce functions' purpose. While I'm often pleasantly surprised by the quality of the notes scattered about the code, per-definition Haddocks would fill in the many remaining gaps and provide a nice overview of each module.
I agree that enforcing the quality of the rendered Haddocks is unnecessary. Once the language has been written there are many contributors (such as myself) who can further clean up the formatting.
Cheers,
- Ben
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

On 06/30/2014 04:19 PM, Richard Eisenberg wrote:
Thanks, Johan, for starting this discussion.
I mostly agree with the proposal. However, one (at times, serious) drawback to using Haddock is that it means that editing comments can cause parse failures. The way the GHC build works, these failures may not be detected until the end of a hacking session (if I'm using, say, `make 2`, as I tend to do) and then can be hard to diagnose. I've actually been bitten by this when working on GHC.
You should never get a parse failure anymore, as of 2.14.x versions. At worst, the comment will render in an ugly way but it will no longer cause failures due to typos in comment syntax. What might cause failures is if you put a Haddock comment where GHC doesn't expect it to. This should be fixed in GHC parser. If you don't want to mess around with pretty comments, all you have to do is to turn your function comments from ‘-- ’ to ’-- |’ and suddenly everyone else can benefit without source-diving.
So, I have to ask: why use Haddock? Do folks read the Haddock docs for GHC? (I don't, but perhaps that's because the docs aren't so good right now.) Would it be acceptable to change this proposal not to require Haddock docs?
I read the generated Haddock docs. The advantage of Haddock over something else here is that you can look at class instances, have clickable links and so on. Having to navigate on Haddock pages and reading the function docs elsewhere would be cluttered.
Even if we decide to keep this proposal about Haddock docs specifically, I would strongly request that correct rendering of the Haddock docs not be scrutinized. At the end of a hacking session (which is hard enough to find time for, as is), I don't want to be expected to look through the generated HTML to make sure that my typewriter font and italics are rendering correctly. This is something of a corollary to Simon's comment about wanting to refer to Notes from Haddock comments -- I would want the Haddock output to be quite secondary to the proper documentation in the source code.
(Note that this "demotion" of the role of Haddock is certainly not my practice in released libraries! But, Haddock is much less useful in an application like GHC than in a library.)
GHC is also a library, I think it's unreasonable to expect people to jump into source and manually tracking down all the comments (if any) when they want to use GHC API.
All that said, I do agree with the intent of this proposal and am happy to take on my part of the burden of documenting new (and perhaps some old) functions as I work. I have been very guilty of the "broken window" effect in not documenting new code.
Thanks, Richard
-- Mateusz K.

Agreed, let's do it. Thanks for the well-argued proposal. Next up: consistent style :-) Cheers, Simon On 27/06/2014 10:51, Johan Tibell wrote:
Hi!
I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments.
GHC code is sometimes commented with "notes", which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why.
Lack of documentation, in GHC and other software projects, usually has (at least) two causes:
* Programmers comment code they think is "complex enough to warrant a comment". The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. * Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to.
I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google.
Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost.
*Proposal: *I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence.
I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong "broken window" effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices.
We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that.
To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped.
*First example* In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage.
The first question that I asked upon seeing uses of that function was "what does RUR stand for?" Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean "read" in the enclosing function and "updating" is related to "reading" so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for:
-- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting platform) src
That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful.
Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case.
*Second example* In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above.
I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be helpful:
-- | Returns which registers are read and written by this -- instruction, as a (read, written) pair. This info is used -- by the register allocator. x86_regUsageOfInstr :: Platform -> Instr -> RegUsage
The reason mentioning that the return value is essentially a (read, written) pair is helpful is because the body of the function a big case statement full of lines like this one:
GCMP _ src1 src2 -> mkRUR [src1,src2] ... FDIV _ src dst -> usageRM src dst
It's not immediately clear that all the various helper functions used here just end up computing a pair of the above form. A top-level comment lets you understand what's going on without understanding exactly what all these helper functions are doing.
Thoughts?
-- Johan
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

That has a high chance of backfiring and requiring everyone to use do {
...; ... } with explicit braces and semis. ;)
-Edward
On Wed, Jul 2, 2014 at 4:08 AM, Simon Marlow
Agreed, let's do it. Thanks for the well-argued proposal.
Next up: consistent style :-)
Cheers, Simon
On 27/06/2014 10:51, Johan Tibell wrote:
Hi!
I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments.
GHC code is sometimes commented with "notes", which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why.
Lack of documentation, in GHC and other software projects, usually has (at least) two causes:
* Programmers comment code they think is "complex enough to warrant a
comment". The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. * Documenting is boring and tends to have little benefit the person
writing to documentation. Given lack of incentives we tend to document less than we ought to.
I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google.
Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost.
*Proposal: *I propose that we require that new top-level functions and
types have Haddock comments, even if they start out as a single, humble sentence.
I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong "broken window" effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices.
We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that.
To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped.
*First example*
In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage.
The first question that I asked upon seeing uses of that function was "what does RUR stand for?" Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean "read" in the enclosing function and "updating" is related to "reading" so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for:
-- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting platform) src
That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful.
Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case.
*Second example*
In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above.
I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be helpful:
-- | Returns which registers are read and written by this -- instruction, as a (read, written) pair. This info is used -- by the register allocator. x86_regUsageOfInstr :: Platform -> Instr -> RegUsage
The reason mentioning that the return value is essentially a (read, written) pair is helpful is because the body of the function a big case statement full of lines like this one:
GCMP _ src1 src2 -> mkRUR [src1,src2] ... FDIV _ src dst -> usageRM src dst
It's not immediately clear that all the various helper functions used here just end up computing a pair of the above form. A top-level comment lets you understand what's going on without understanding exactly what all these helper functions are doing.
Thoughts?
-- Johan
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

Which makes a lot of GHC code more readable — I’m serious!
Manuel
PS: I have resisted it for a while, but after slogging through GHC for extended periods, I’ve come to appreciate the additional clarity in large and tricky functions (e.g., in the type checker & renamer).
Edward Kmett
That has a high chance of backfiring and requiring everyone to use do { ...; ... } with explicit braces and semis. ;)
-Edward
On Wed, Jul 2, 2014 at 4:08 AM, Simon Marlow
wrote: Agreed, let's do it. Thanks for the well-argued proposal. Next up: consistent style :-)
Cheers, Simon
On 27/06/2014 10:51, Johan Tibell wrote: Hi!
I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments.
GHC code is sometimes commented with "notes", which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why.
Lack of documentation, in GHC and other software projects, usually has (at least) two causes:
* Programmers comment code they think is "complex enough to warrant a
comment". The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. * Documenting is boring and tends to have little benefit the person
writing to documentation. Given lack of incentives we tend to document less than we ought to.
I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google.
Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost.
*Proposal: *I propose that we require that new top-level functions and
types have Haddock comments, even if they start out as a single, humble sentence.
I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong "broken window" effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices.
We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that.
To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped.
*First example*
In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage.
The first question that I asked upon seeing uses of that function was "what does RUR stand for?" Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean "read" in the enclosing function and "updating" is related to "reading" so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for:
-- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting platform) src
That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful.
Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case.
*Second example*
In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above.
I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be helpful:
-- | Returns which registers are read and written by this -- instruction, as a (read, written) pair. This info is used -- by the register allocator. x86_regUsageOfInstr :: Platform -> Instr -> RegUsage
The reason mentioning that the return value is essentially a (read, written) pair is helpful is because the body of the function a big case statement full of lines like this one:
GCMP _ src1 src2 -> mkRUR [src1,src2] ... FDIV _ src dst -> usageRM src dst
It's not immediately clear that all the various helper functions used here just end up computing a pair of the above form. A top-level comment lets you understand what's going on without understanding exactly what all these helper functions are doing.
Thoughts?
-- Johan
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

I also support this proposal (I actually tripped up on these exact
functions as well!) As stated elsewhere, I think this should be the
case for all top-level functions, not just exported ones.
Of course, I'd also like it if this rule explicitly extended to
top-level data types, type classes, etc as well. I believe that was
the intention but I'm just making sure. :)
(Finally, I actually would like some kind of mechanical enforcement of
this, but I don't think it has to be a hard rule - we shouldn't reject
things on that basis alone. I'm not sure how we would do that anyway,
though.)
On Fri, Jun 27, 2014 at 4:51 AM, Johan Tibell
Hi!
I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments.
GHC code is sometimes commented with "notes", which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why.
Lack of documentation, in GHC and other software projects, usually has (at least) two causes:
Programmers comment code they think is "complex enough to warrant a comment". The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to.
I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google.
Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost.
Proposal: I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence.
I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong "broken window" effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices.
We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that.
To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped.
First example In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage.
The first question that I asked upon seeing uses of that function was "what does RUR stand for?" Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean "read" in the enclosing function and "updating" is related to "reading" so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for:
-- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting platform) src
That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful.
Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case.
Second example In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above.
I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be helpful:
-- | Returns which registers are read and written by this -- instruction, as a (read, written) pair. This info is used -- by the register allocator. x86_regUsageOfInstr :: Platform -> Instr -> RegUsage
The reason mentioning that the return value is essentially a (read, written) pair is helpful is because the body of the function a big case statement full of lines like this one:
GCMP _ src1 src2 -> mkRUR [src1,src2] ... FDIV _ src dst -> usageRM src dst
It's not immediately clear that all the various helper functions used here just end up computing a pair of the above form. A top-level comment lets you understand what's going on without understanding exactly what all these helper functions are doing.
Thoughts?
-- Johan
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
-- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/

On Wed, Jul 2, 2014 at 6:04 PM, Austin Seipp
Of course, I'd also like it if this rule explicitly extended to top-level data types, type classes, etc as well. I believe that was the intention but I'm just making sure. :)
That was the intention.
(Finally, I actually would like some kind of mechanical enforcement of this, but I don't think it has to be a hard rule - we shouldn't reject things on that basis alone. I'm not sure how we would do that anyway, though.)
The way I suggest we do this, if we do this, is to add a linter to Phabricator that adds a note to the code review that the new code lacks the appropriate docs. That way we encourage users to add them, without e.g. making validate fail or something similar. This is what we do at Google (and FB too I presume).

Yes, Phabricator/arcanist is probably the best way to do it, I agree.
I was more wondering technically how we'd enforce it (just regex?
parse/lex the code for top level definitions? etc). There are also
other opportunities for linting here, so we should think about that a
bit.
Anyway, good stuff!
On Wed, Jul 2, 2014 at 11:33 AM, Johan Tibell
On Wed, Jul 2, 2014 at 6:04 PM, Austin Seipp
wrote: Of course, I'd also like it if this rule explicitly extended to top-level data types, type classes, etc as well. I believe that was the intention but I'm just making sure. :)
That was the intention.
(Finally, I actually would like some kind of mechanical enforcement of this, but I don't think it has to be a hard rule - we shouldn't reject things on that basis alone. I'm not sure how we would do that anyway, though.)
The way I suggest we do this, if we do this, is to add a linter to Phabricator that adds a note to the code review that the new code lacks the appropriate docs. That way we encourage users to add them, without e.g. making validate fail or something similar. This is what we do at Google (and FB too I presume).
-- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/
participants (15)
-
Andrew Farmer
-
Austin Seipp
-
Ben Gamari
-
David Fox
-
David Luposchainsky
-
Dominick Samperi
-
Edward Kmett
-
Howard B. Golden
-
Iavor Diatchki
-
Johan Tibell
-
Manuel M T Chakravarty
-
Mateusz Kowalczyk
-
Richard Eisenberg
-
Simon Marlow
-
Simon Peyton Jones