
Twan van Laarhoven wrote:
Hello GHC people,
I was trying my hand at some GHC hacking, and I couldn't help but notice that much of the code looks (IMHO) horrible. There are some things that look as if they haven't been touched since the Haskell 1.3 days. The most striking is the use of `thenXX` instead of do notation.
I was thinking of cleaning up some of this. In particular:
1a. Use do notation where possible instead of `thenXX`.
Yes.
1b. Even better, use Applicative where possible. For example
ds_type (HsFunTy ty1 ty2) = dsHsType ty1 `thenM` \ tau_ty1 -> dsHsType ty2 `thenM` \ tau_ty2 -> returnM (mkFunTy tau_ty1 tau_ty2)
Could be rewritten as
ds_type (HsFunTy ty1 ty2) = mkFunTy <$> dsHsType ty1 <*> dsHsType ty2
No, IMO. This just adds another abstraction that a potential GHC contributor has to learn about. Including me :-) It doesn't hurt to write out the code in full sometimes, and as the GHC coding style guidelines say: Remember: other people have to be able to quickly understand what you've done, and overuse of abstractions just serves to obscure the really tricky stuff, and there's no shortage of that in GHC. http://hackage.haskell.org/trac/ghc/wiki/Commentary/CodingStyle (I think I might be quoting myself there, in which case it doesn't really add much credibility to my argument :). Also bear in mind that we have to be able to compile GHC with older versions of itself (currently back to 6.2.2), which explains why we're often conservative with respect to using external libraries.
2. Investigate the need for all these mappM functions. To quote the source: "Standard combinators, but specialised for this monad (for efficiency)". Wouldn't a SPECIALIZE pragma work just as well?
It does look like we could do away with some of those, yes.
3. In general, use standard functions when they are available. Currently GHC has its own sort function and its own versions of FiniteMap/Data.Map and Text.PrettyPrint. Using standard functions/data structures will a) make GHC smaller and therefore easier to maintain, and b) will make the code easier to understand for someone who knows the standard library.
In general yes. But bear in mind there's another principle at work here: we want to insulate GHC as much as possible from its environment, to reduce the chances of spurious failures or performance regressions. I believe our version of Text.PrettyPrint has diverged from the library version - last time I looked it wasn't obvious that we could replace it. Regarding Data.Map, I'd be interested in trying out AVL trees instead, to see if they have better performance, but then we'd have to import the code of course.
4. A more radical change would be introducing hierarchical modules. This could be just a matter of renaming the directories to start with an upper case character and changing the import declarations. This gives module names like "Typecheck.TcGadt". The tc is redundant here, so perhaps it should be renamed to "Typecheck.Gadt" or "Typecheck.GADT". Perhaps even better would be "GHC.Typecheck.GADT", this way some modules can be exposed as part of the GHC api.
Yes, I've wondered about doing this in the past. It does mean more typing, though: clearly 'import BasicTypes.Id' is more painful than just 'import Id' and doesn't add much, so IMO we'd need to put some thought into a structure that makes sense.
How do the GHC developers feel about this? Would such paches be gladly excepted? Or will they be directly forwarded to the bit bucket?
Within the constraints I've outlined above, cleanups are definitely welcome. Another worthwhile gardening-type activity is to go around elimianting warnings. Many modules have {-# OPTIONS -w #-} at the top, waiting for someone to go in and clean up all the warnings. They do catch real bugs, so this is not a waste of time by any means. In addition to this, things like - identifying and removing dead code - commoning up duplicate code fragments - improving test coverage - profiling and optimising hotspots are all useful tasks. Cheers, Simon