
On Thu, Feb 14, 2013 at 03:48:51PM +0100, Joachim Breitner wrote:
Yesterday, I experimented a bit with base’s code, first beginning with as few modules as possible and adding what’s required; then starting with the whole thing and trying to remove e.g. IO.
But clearly it is not easy: 1. Int requires throw DivideByZero; Monad requires error. That pulls in Exceptions. 2. Exceptions require Typeable. 3. Typeable is implemented with GHC.Fingerprint. 4. GHC.Fingerprint is implemented with Foreign and IO. 5. Foreign clearly needs Int and the like.
so it is not clear how to pull out a pure base without IO and Foreign. Are there any good tricks how to break these interdependencies?
We'll probably end up with a package (let's say ghc-bottom for now) right at the bottom of the hierarchy which contains a minimal set of definitions for Typeable, throw, etc, in modules like GHC.Typeable, GHC.IO (or, if it's small and the dependencies are circular, might be easier to have it all in a single module). It might be possible to merge this into ghc-prim later, but don't worry about that for now. These definitions would then be re-exported by Data.Typeable, Control.Exception etc higher up. Other libraries would be discouraged from depending on ghc-bottom, either socially or by needing to jump through extra hoops in a .cabal file fo allow it. However, this is the hard end to start from, because the modules right at the "bottom" depend on things much higher up, with cyclic imports. It'll be a lot easier to pull modules off the top instead, as nothing imports them. They also tend to be less magical (e.g. don't have wired-in names). Also, don't worry about necessarily pulling out /all/ the modules you want for a package all at once. Once you have smaller chunks it'll be easier to see what changes are necessary to move modules up/down. You may also need to leave e.g. some of IO lower down than the io package in a GHC.* module, and then to re-export it from io. Once the top has been pulled at as much as possible, it'll be a lot easier to see what's going on with the remainder, and work out what needs to be done to break the biggest loops.
There are other issues, some avoidable (such as the hard-coded base:Num constraint on literals); I collected a list on http://hackage.haskell.org/trac/ghc/wiki/SplitBase
Right, for things with built-in syntax you'll have to update GHC's wired-in names as you go. This will mostly just mean changing the modules, e.g. gHC_ENUM = mkBaseModule (fsLit "GHC.Enum") in compiler/prelude/PrelNames.lhs might become gHC_ENUM = mkGhcPureModule (fsLit "GHC.Enum") with mkGhcPureModule defined analogously to mkBaseModule, but occasionally you might want to move a definition to another module. If you get stuck, just yell.
Maybe the proper is to reverse the whole approach: Leave base as it is, and then build re-exporting smaller packages (e.g. a base-pure) on top of it. The advantage is: * No need to rewrite the tightly intertwined base. * Libraries still have the option to have tighter dependencies. * Base can evolve with lots of breaking changes, as long as they do not affect the API by the smaller packages. * Development of this collection can happen outside the GHC tree. * Alternative implementations for (some of) these packages can be created, if the reason why they could not be moved out of base is one of implementation, not of API
Disadvantages: * No-one would use the new packages unless they come with GHC; e.g. not a perfect analogy, but compare the number of rev-deps according to http://packdeps.haskellers.com/reverse of the various *prelude* packages vs base: 4831 base 6 basic-prelude 8 classy-prelude 4 classy-prelude-conduit 2 custom-prelude 1 general-prelude 1 modular-prelude 17 numeric-prelude 2 prelude-extras * If it comes with GHC, it would mean us permanently maintaining the two levels * base would still be an opaque blob, with too many modules and cyclic imports, which makes development tricky Thanks Ian