Edward Kmett <ekmett@gmail.com>:

That has a high chance of backfiring and requiring everyone to use do { ...; ... } with explicit braces and semis. ;)

-Edward

On Wed, Jul 2, 2014 at 4:08 AM, Simon Marlow <marlowsd@gmail.com> wrote:
Agreed, let's do it. Thanks for the well-argued proposal.

Next up: consistent style :-)

Cheers,
Simon

On 27/06/2014 10:51, Johan Tibell wrote:

Hi!

I found myself exploring new parts of the GHC code base the last few
weeks (exciting!), which again reminded me of my biggest frustration
when working on GHC: the lack of per-function/type (Haddock) comments.

GHC code is sometimes commented with "notes", which are great but tend
to (1) mostly cover the exceptional cases and (2) talk about the
implementation of a function, not how a caller might use it or why.

Lack of documentation, in GHC and other software projects, usually has
(at least) two causes:

* Programmers comment code they think is "complex enough to warrant a

comment". The problem is that the author is usually a poor judge of
what's complex enough, because he/she is too familiar with the code
and tends to under-document code when following this principle.
* Documenting is boring and tends to have little benefit the person

writing to documentation. Given lack of incentives we tend to
document less than we ought to.

I've only seen one successful way to combat the lack of documentation
that stems from the above: have the project's style guide mandate that
top-level functions and types (or at least those that are exported) have
documentation. This works well at Google.

Anecdote: we have one code base inside Google that was until recently
exempt from this rule and documentation is almost completely absent in
that code base, even though hundreds of engineers work on and need to
understand it every day. This breeds institutional knowledge problems
i.e. if the author of a core piece of code leaves, lots of knowledge is
lost.

*Proposal: *I propose that we require that new top-level functions and

types have Haddock comments, even if they start out as a single, humble
sentence.

I've found that putting even that one sentence (1) helps new users and
(2) establishes a place for improvements to be made. There's a strong
"broken window" effect to lack of comments, in that lack of comments
breeds more lack of comments as developers follow established practices.

We should add this requirement to the style guide. Having it as a
written down policy tends to prevent having to re-hash the whole
argument about documentation over and over again. This has also helped
us a lot at Google, because programmers can spend endless amount of time
arguing about comments, placement of curly braces, etc. and having a
written policy helps cut down on that.

To give an idea of how to write good comments, here are two examples of
undocumented code I ran into in GHC and how better comments would have
helped.

*First example*

In compiler/nativeGen/X86/Instr.hs there's a (local) function called
mkRUR, which is a helper function use when computing instruction
register usage.

The first question that I asked upon seeing uses of that function was
"what does RUR stand for?" Given the context the function is in, I
guessed it stands for read-update-read, because R is used to mean "read"
in the enclosing function and "updating" is related to "reading" so that
must be what U stands for. It turns out that it stands for
RegUsageReadonly. Here's a comment that would have captured, in a single
sentence, what this function is for:

-- | Create register usage info for instruction that only
-- reads registers.
mkRUR src = src' `seq` RU src' []
where src' = filter (interesting platform) src

That already a big improvement. A note about the register filtering,
which means that not all registers you pass to the function will be
recorded as being read in the end, could also be useful.

Aside: providing a type signature, which would have made it clear that
the return type is RU, might also have helped in this particular case.

*Second example*

In the same file there a function called x86_regUsageOfInstr. It's the
function that encloses the local function mkRUR above.

I could figure out that this function has something to do with register
usage, of the instruction passed as an argument, and that register usage
is important for the register allocator. However, trying to understand
in more detail what that meant was more of challenge than it needed to
be. First, a comment more clearly explaining what computing register
usage means in practice would be helpful:

-- | Returns which registers are read and written by this
-- instruction, as a (read, written) pair. This info is used
-- by the register allocator.
x86_regUsageOfInstr :: Platform -> Instr -> RegUsage

The reason mentioning that the return value is essentially a (read,
written) pair is helpful is because the body of the function a big case
statement full of lines like this one:

GCMP _ src1 src2 -> mkRUR [src1,src2]
...
FDIV _ src dst -> usageRM src dst

It's not immediately clear that all the various helper functions used
here just end up computing a pair of the above form. A top-level comment
lets you understand what's going on without understanding exactly what
all these helper functions are doing.

Thoughts?

-- Johan

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs