RFC: style cleanup & guidelines for GHC, and related bikeshedding

2 Jul 2014

      ...
From these numbers, we can see a few useful things at least, primarily
Hi *,

First off, WARNING: BIKESHEDDING AHEAD.

With that out of the way - today on IRC, there was some discussion
about some stylistic/consistency issues in GHC, and being spurred by
Johans recent proposal for top-level documentation, I figured perhaps
we should beat the drum on this issue as well.

The TL;DR is that GHC has a lot of inconsistent style issues,
including things like:

 - Mixing literate haskell with non-literate haskell files
 - Legacy code with tabs and spaces intermixed
 - Related to the last one, trailing whitespace
 - Mixing styles of do notation in different parts of the compiler
(braces vs no braces)
 - Probably things like indentation mismatches even in the same code
 - Probably many other things I've missed, obvious or not.

These issues by themselves aren't too bad, but together they make the
coding style for GHC very inconsistent, and this hurts maintainability
a bit I feel. Furthermore, some of these issues block related
improvements - for example,
https://ghc.haskell.org/trac/ghc/ticket/9230 which is probably quite
reasonable will likely be a bit annoying to implement until GHC itself
is de-tabbed - we use -Werror during ./validate. This particular issue
is what started the discussion.

Also, with developers now using arcanist and phabricator, they have
linting enabled for new patches, but they will often warn about
surrounding issues, mostly tabs and trailing spaces. This is a bit
annoying for submitters, and would be fixed by enforcing it.

First attack plan
~~~~~~~~~~~~~~~

So, to start, I'd like to propose that we make some guidelines for
these kinds of things, and also a plan to fix some of them. To start:

 #1) We should really consider going ahead and detabbing the remaining
files that have them. We already enforce this on new commits with git
hooks, but by doing this, we can make -fwarn-tabs a default flag and
then validate with -Werror in the development process.

 #2) Similarly, we should kill all the trailing whitespace. (I think
this is less controversial than #1)

 #3) We should most certainly move the remaining files from literate
haskell to non-literate haskell. Most of the files in the compiler are
already in this form, and the literate haskell documentation can't be
used to generate PDFs or anything similar. I suggest we get rid of it.
More Haskell users use non-literate files anyway. This is probably the
least controversial.

Merge issues
~~~~~~~~~~~~~~~~~

The reason we haven't done the above three things historically is that
it makes merge conflicts nastier. A useful approximation suggested on
IRC might be to detab and remove whitespace for files older than a
certain date (say, 6 months).

However, in general I'm thinking perhaps it's best to go ahead and
bite the bullet. maybe. I'd like to know what other people think! If
we have a vote and most people are in favor of doing this, maybe we
should really do it.

I'd especially like to hear about this if you have an outstanding branch.

Some numbers on these issues
~~~~~~~~~~~~~~~~~~~~~~~~

Here are some quick numbers on where most of the tabs reside, as well
as the breakdown of literate files vs non-literate files.

NOTE: these tests occurred in the 'compiler' subdirectory of the GHC
repository, which is where most of the relevant code is.

LITERATE vs NON-LITERATE:

$ find . -type f -iname '*.hs' | wc -l
206

$ find . -type f -iname '*.lhs' | wc -l
194

Non-literate wins by a slim margin! But having the compiler divided in
half is really not a good thing IMO...

NUMBER OF TABS PER SUBDIRECTORY:

NOTE: this counts the number of lines which have tabs in them. It does
not count the total number of tab occurrences.

$ for x in `echo */`; do echo -n "$x:\t\t"; find $x -type f -regex
'.*\.\(lhs\|hs\)' | xargs grep -P '\t' | wc -l; done
basicTypes/: 919
cbits/: 0
cmm/: 38
codeGen/: 0
coreSyn/: 843
deSugar/: 545
ghci/: 90
hsSyn/: 120
iface/: 213
llvmGen/: 0
main/: 8
nativeGen/: 1213
parser/: 19
prelude/: 182
profiling/: 39
rename/: 188
simplCore/: 754
simplStg/: 0
specialise/: 0
stgSyn/: 0
stranal/: 336
typecheck/: 1171
types/: 301
utils/: 220
vectorise/: 0

that there are definitely some places where removing tabs should be
easy. For example, parser/, profiling/, main/, and cmm/ can all be
de-tabbed without much of a problem, I think.

nativeGen is very often not touched, so even though it has a *huge*
amount of tabs, it can likely be de-tabbed as well with minimal
impact.

Other style issues
~~~~~~~~~~~~~~~~~

We should also discuss some related issues, like what general
block-width to use for indentations, naming conventions, and other
stuff. However, I leave this all to you, and perhaps it is best we
split that part off into a separate thread. Some things I'd like you
all to consider:

 - Block width for indentation
 - Naming conventions (we use camelCase and_underscores_sometimes
which isReally_confusing)
 - Import/export styles (I think we have some sloppiness here too)
 - Other things worth arguing forever about.

Thoughts on the above issues?

-- 
Regards,

Austin Seipp, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/

Austin Seipp

Sergei Trofimovich

Richard Eisenberg

Iavor Diatchki

Jan Stolarek

Joachim Breitner

Alexander Berntsen

Jan Stolarek

Alexander Berntsen

Andrew Farmer

Simon Peyton Jones

Johan Tibell

Jan Stolarek

Simon Marlow

Simon Marlow

Manuel M T Chakravarty

Simon Peyton Jones

Johan Tibell

Manuel M T Chakravarty

Alexander Berntsen

tags

participants (12)