Re: GHC source code improvement ideas

4 Jan 2008

      Twan van Laarhoven wrote:
...
Hello GHC people,
I was trying my hand at some GHC hacking, and I couldn't help but notice 
that much of the code looks (IMHO) horrible. There are some things that 
look as if they haven't been touched since the Haskell 1.3 days. The 
most striking is the use of `thenXX` instead of do notation.
I was thinking of cleaning up some of this. In particular:
1a. Use do notation where possible instead of `thenXX`.
Yes.
...
1b. Even better, use Applicative where possible. For example
ds_type (HsFunTy ty1 ty2)
          = dsHsType ty1            `thenM` \ tau_ty1 ->
            dsHsType ty2            `thenM` \ tau_ty2 ->
            returnM (mkFunTy tau_ty1 tau_ty2)
Could be rewritten as
ds_type (HsFunTy ty1 ty2)
          = mkFunTy <$> dsHsType ty1 <*> dsHsType ty2
No, IMO.  This just adds another abstraction that a potential GHC 
contributor has to learn about.  Including me :-)  It doesn't hurt to write 
  out the code in full sometimes, and as the GHC coding style guidelines say:

   Remember: other people have to be able to quickly understand what you've
   done, and overuse of abstractions just serves to obscure the really
   tricky stuff, and there's no shortage of that in GHC.

http://hackage.haskell.org/trac/ghc/wiki/Commentary/CodingStyle

(I think I might be quoting myself there, in which case it doesn't really 
add much credibility to my argument :).

Also bear in mind that we have to be able to compile GHC with older 
versions of itself (currently back to 6.2.2), which explains why we're 
often conservative with respect to using external libraries.
...
2. Investigate the need for all these mappM functions. To quote the 
source: "Standard combinators, but specialised for this monad (for 
efficiency)". Wouldn't a SPECIALIZE pragma work just as well?
It does look like we could do away with some of those, yes.
...
3. In general, use standard functions when they are available. 
Currently GHC has its own sort function and its own versions of 
FiniteMap/Data.Map and Text.PrettyPrint. Using standard functions/data 
structures will a) make GHC smaller and therefore easier to maintain, 
and b) will make the code easier to understand for someone who knows the 
standard library.
In general yes.  But bear in mind there's another principle at work here: 
we want to insulate GHC as much as possible from its environment, to reduce 
the chances of spurious failures or performance regressions.

I believe our version of Text.PrettyPrint has diverged from the library 
version - last time I looked it wasn't obvious that we could replace it.

Regarding Data.Map, I'd be interested in trying out AVL trees instead, to 
see if they have better performance, but then we'd have to import the code 
of course.
...
4. A more radical change would be introducing hierarchical modules. 
This could be just a matter of renaming the directories to start with an 
upper case character and changing the import declarations. This gives 
module names like "Typecheck.TcGadt". The tc is redundant here, so 
perhaps it should be renamed to "Typecheck.Gadt" or "Typecheck.GADT". 
Perhaps even better would be "GHC.Typecheck.GADT", this way some modules 
can be exposed as part of the GHC api.
Yes, I've wondered about doing this in the past.  It does mean more typing, 
though: clearly 'import BasicTypes.Id' is more painful than just 'import 
Id' and doesn't add much, so IMO we'd need to put some thought into a 
structure that makes sense.
...
How do the GHC developers feel about this? Would such paches be gladly 
excepted? Or will they be directly forwarded to the bit bucket?
Within the constraints I've outlined above, cleanups are definitely welcome.

Another worthwhile gardening-type activity is to go around elimianting 
warnings.  Many modules have {-# OPTIONS -w #-} at the top, waiting for 
someone to go in and clean up all the warnings.  They do catch real bugs, 
so this is not a waste of time by any means.

In addition to this, things like

  - identifying and removing dead code
  - commoning up duplicate code fragments
  - improving test coverage
  - profiling and optimising hotspots

are all useful tasks.

Cheers,
	Simon

Re: GHC source code improvement ideas

Simon Marlow