
Hi all,

There has been some discussion recently about the base library. In particular, base is supposed to follow the PvP, which means that certain sorts of changes require increasing the version number. When the version number is increased, this makes work for lots of people, as any packages correctly using an upper bound on base's version number need to be updated. For modules like Data.List this makes sense, but base also contains a number of modules in the GHC.* hierarchy which, while exposed, are really internal, and much less stable than the "public" API.

There is also another issue: Currently, it is not possible for a package to specify, in its dependencies, whether or not it uses these GHC.* modules. We therefore can't easily tell whether a package is sensitive to changes in those modules, or whether it can be used with hugs, nhc98, etc.

We've come up with 3 possible ways forward. Comments, suggestions and criticisms welcomed!

Option 1
--------

In order to solve the version number issue, we could simply state that "base follows the PvP, but only for shared module hierarchies". However, it would be impossible for packages which /do/ need GHC.* modules to give accurately versioned dependencies, and it wouldn't solve the other issue at all.

Option 2
--------

Another possible solution would be to rename the base package to base-internals, and to make base only re-export the "public" modules. This would require either renaming each module M to Internal.M, or for every implementation to support something like GHC's PackageImports extension.

Option 3
--------

The other alternative is to try to split base into two parts: The shared "public" modules, and the internal GHC.* modules. Then GHC would have ghc-base, hugs would have hugs-base, etc, and there would be a common base package built on top of the appropriate impl-base.
To do this with minimal loss of code sharing is a large task, and hard to just do in a branch, as merging changes made in the base HEAD is a pain when the file the patch applies to has moved to another repository. Thus in the short term we would expect to give up a reasonable amount of code sharing, but we hope that once we have separate impl-base packages it will be easier to untangle their contents (as impl-base will be significantly smaller than base currently is, and because we can rearrange imports and code inside impl-base without having to worry about breaking other implementations), and then regain as much sharing as possible.

I've had a look at what can be done, and the first cut looks like this:

We start off with 143 modules in base, 89 of which are public and 54 of which are GHC.*.

Afterwards, base has 53 modules, 2 of which are GHC.*:

* GHC.Exts, which could be moved into ghc-base, but would take Data.String with it, or it could go into a ghc-exts package. As this is "more public" than the other GHC.* modules, this seems like a reasonable thing to do anyway
* GHC.Desugar, which could be put into its own package if nothing else

Here's the module graph, which looks quite sane:
http://community.haskell.org/~igloo/base-small.png

That leaves 90 modules in ghc-base, 52 of which are GHC.*, and 38 of which are in the portable namespace. So almost all GHC.* modules have moved, and more than half of the portable modules are in base.
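The module counts quoted above can be cross-checked mechanically. The following is only a minimal sketch of that arithmetic: the record type and names are made up for illustration, and the figure of 51 portable modules remaining in base is derived here as 53 - 2 rather than stated in the message.

```haskell
module Main where

-- | Module counts for one package, split by namespace.
data Split = Split { portable :: Int, ghcOnly :: Int }

-- | Total number of modules in a package.
total :: Split -> Int
total s = portable s + ghcOnly s

before, newBase, ghcBase :: Split
before  = Split { portable = 89, ghcOnly = 54 }  -- 143 modules in today's base
newBase = Split { portable = 51, ghcOnly = 2 }   -- 53 modules; 51 is derived as 53 - 2
ghcBase = Split { portable = 38, ghcOnly = 52 }  -- 90 modules

-- | The figures agree if the totals match and no module is lost or duplicated.
consistent :: Bool
consistent =
     total before  == 143
  && total newBase == 53
  && total ghcBase == 90
  && portable newBase + portable ghcBase == portable before
  && ghcOnly newBase + ghcOnly ghcBase == ghcOnly before

-- | Percentage of portable modules that stay in base (integer division).
portableShare :: Int
portableShare = portable newBase * 100 `div` portable before

main :: IO ()
main = do
  print consistent     -- True
  print portableShare  -- 57
```

Under these numbers roughly 57% of the portable modules stay in base, which matches the "more than half" claim.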
These 38 are the interesting ones:

    Control.Exception.Base
    Control.Monad
    Data.Bits
    Data.Char
    Data.Dynamic
    Data.Either
    Data.HashTable
    Data.Int
    Data.List
    Data.Maybe
    Data.Tuple
    Data.Typeable
    Data.Word
    Foreign
    Foreign.C
    Foreign.C.Error
    Foreign.C.String
    Foreign.C.Types
    Foreign.ForeignPtr
    Foreign.Marshal
    Foreign.Marshal.Alloc
    Foreign.Marshal.Array
    Foreign.Marshal.Error
    Foreign.Marshal.Pool
    Foreign.Marshal.Utils
    Foreign.Ptr
    Foreign.StablePtr
    Foreign.Storable
    Numeric
    System.IO.Error
    System.IO.Unsafe
    System.Posix.Internals
    System.Posix.Types
    Text.ParserCombinators.ReadP
    Text.ParserCombinators.ReadPrec
    Text.Read.Lex
    Text.Show
    Unsafe.Coerce

Some of them are easy to move back into base, e.g. Foreign contains no code, and Numeric is only needed so that GHC.Ptr can use showHex for its Show instance. Some are more integral to the implementation of the rest of GHC.*.

I've only tried this for amd64/Linux, so it's possible that dependencies on other platforms could cause additional problems.

Here's the module graph, which is somewhat messier than base's:
http://community.haskell.org/~igloo/ghc-base-small.png

As with option 2, for each implementation, either impl-base needs to rename the public modules M to Impl.M, or it needs to implement something like GHC's PackageImports extension.

Thanks
Ian
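For readers unfamiliar with the PackageImports extension mentioned under options 2 and 3, here is a small sketch of what a package-qualified import looks like in GHC. It uses the package name "base" as it exists today; under option 2 the same syntax could, hypothetically, name "base-internals" instead when two packages expose a module with the same name. The value `sortedDemo` is made up for the example.

```haskell
{-# LANGUAGE PackageImports #-}
module Main where

-- The string literal names the package the module must come from,
-- which disambiguates modules of the same name in different packages.
import "base" Data.List (sort)

-- | A tiny value that depends on the package-qualified import.
sortedDemo :: [Int]
sortedDemo = sort [3, 1, 2]

main :: IO ()
main = print sortedDemo  -- [1,2,3]
```

The alternative named in the message, renaming each module M to Internal.M, avoids the extension at the cost of touching every module name.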

On Thu, Jun 25, 2009 at 04:02:14PM +0100, Ian Lynagh wrote:
Option 1 --------
In order to solve the version number issue, we could simply state that "base follows the PvP, but only for shared module hierarchies". However, it would be impossible for packages which /do/ need GHC.* modules to give accurately versioned dependencies, and it wouldn't solve the other issue at all.
There's an existing policy of no change to the portable modules or GHC.Exts in GHC bugfix releases, and it seems it is the practice to increase the first component of base's version with each major GHC release. Perhaps formalizing that (which is consistent with the PVP) would be enough:

- packages that use only the public API depend on base < 5
- packages that use the GHC internal modules depend on base < 4.x

It does mean that packages need to be updated each year, if only to update their dependencies.
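To make the two kinds of upper bound concrete, here is a minimal sketch that models a version as a list of components and checks it against each bound. It assumes the usual lexicographic comparison of version components; the concrete bound 4.2 stands in for the "4.x" above and is only an assumed example, as are the function names.

```haskell
module Main where

-- | A package version as a list of numeric components, e.g. 4.1.0.0.
type Version = [Int]

-- | True when the version is strictly below the upper bound.
-- Lists of Int compare lexicographically, so [4,1,0,0] < [5].
below :: Version -> Version -> Bool
below upper v = v < upper

-- | Bound for a package that uses only the public API: base < 5.
publicOk :: Version -> Bool
publicOk = below [5]

-- | Bound for a package that uses GHC internals: base < 4.2
-- (4.2 is an assumed stand-in for the "4.x" in the message).
internalOk :: Version -> Bool
internalOk = below [4, 2]

main :: IO ()
main = mapM_ report [[4, 1, 0, 0], [4, 2, 0, 0], [5, 0, 0, 0]]
  where
    report v =
      putStrLn (show v ++ ": public=" ++ show (publicOk v)
                       ++ " internal=" ++ show (internalOk v))
```

In this sketch a hypothetical base 4.2.0.0 is still accepted by the public-API bound but rejected by the internals bound, so only the packages that reach into GHC.* need their dependencies touched.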

| There has been some discussion recently about the base library. In
| particular, base is supposed to follow the PvP, which means that certain
| sorts of changes require increasing the version number. When the version
| number is increased, this makes work for lots of people, as any packages
| correctly using an upper bound on base's version number need to be
| updated.

Just to add to Ian's message, I see two goals for the proposed split of the base package into two, say 'new-base' and 'ghc-base'. (Although 'new-base' will certainly be called 'base'; but it's easier to refer to it that way in this message.)

1. Stable interfaces

new-base will expose a relatively stable API. So its version number will change relatively slowly, and packages that depend on it will continue to work without change. (They will still need to be recompiled, but their package dependencies will be unchanged.)

ghc-base exposes a much larger, and much less stable API. Some other packages need access to these internals, and are prepared to change more frequently as ghc-base changes.

2. Shared code

There is quite a bit of code in 'base' that is independent of any particular compiler. But it's all mixed up with GHC-specific or Hugs-specific code, with CPP-ery to keep them apart. It'd be cool if new-base was compiler independent, with all the compiler-dependent pieces in 'ghc-base'.

Now, some observations:

(a) Goal (1) is the Most Important Goal by far. It is the main reason we are proposing to split up the base package in the first place. Goal (2) is nice, but I don't think anyone has really been seriously inconvenienced by the mixed up code in current 'base'.

(b) Goal (1) could be achieved by leaving ALL the code in ghc-base, and making new-base into a package that simply imports goop from ghc-base, and re-exports the stable API that new-base exposes. That would fully achieve (1) and not achieve (2) at all.

(c) Achieving (2) is jolly hard. Fully achieving it is probably impossible. And it's fragile: there are big recursive loops -- I think one involves IOException, and just one dependency can completely screw up a proposed separation. I think we could waste a lot of effort into trying to tease the two apart with little gain.

So I argue that we should focus on (1), and pick up as much of (2) as convenient. Concretely I propose:

- That we split into two as proposed

- That the baseline starting point is (b) above: everything in ghc-base. (Which suggests that is not a good name for the package.)

- That we move as much stuff *as convenient* from ghc-base to new-base. Ian's message suggests that quite a lot can actually move, but *the details of what lives where does not matter much*.

- That we de-emphasise the idea that the compiler-independent stuff goes in new-base and compiler-dependent stuff goes in ghc-base (or hugs-base). If it makes sense otherwise, it's fine for there to be compiler-specific stuff in new-base, and unavoidable (perhaps dragged in by transitive dependencies) to have compiler-independent stuff in ghc-base.

All this is pretty much as Ian suggests. I'm just wanting to make sure we have the emphasis right.

Simon

On 26/06/2009 08:23, Simon Peyton-Jones wrote:
| There has been some discussion recently about the base library. In | particular, base is supposed to follow the PvP, which means that certain | sorts of changes require increasing the version number. When the version | number is increased, this makes work for lots of people, as any packages | correctly using an upper bound on base's version number need to be | updated.
Just to add to Ian's message, I see two goals for the proposed split of the base package into two, say 'new-base' and 'ghc-base'. (Although 'new-base' will certainly be called 'base'; but it's easier to refer to it that way in this message.)
1. Stable interfaces
new-base will expose a relatively stable API. So its version number will change relatively slowly, and packages that depend on it will continue to work without change. (They will still need to be recompiled, but their package dependencies will be unchanged.)
ghc-base exposes a much larger, and much less stable API. Some other packages need access to these internals, and are prepared to change more frequently as ghc-base changes.
2. Shared code
There is quite a bit of code in 'base' that is independent of any particular compiler. But it's all mixed up with GHC-specific or Hugs-specific code, with CPP-ery to keep them apart. It'd be cool if new-base was compiler independent, with all the compiler-dependent pieces in 'ghc-base'.
Now, some observations:
(a) Goal (1) is the Most Important Goal by far. It is the main reason we are proposing to split up the base package in the first place. Goal (2) is nice, but I don't think anyone has really been seriously inconvenienced by the mixed up code in current 'base'.
(b) Goal (1) could be achieved by leaving ALL the code in ghc-base, and making new-base into a package that simply imports goop from ghc-base, and re-exports the stable API that new-base exposes. That would fully achieve (1) and not achieve (2) at all.
(c) Achieving (2) is jolly hard. Fully achieving it is probably impossible. And it's fragile: there are big recursive loops -- I think one involves IOException, and just one dependency can completely screw up a proposed separation. I think we could waste a lot of effort into trying to tease the two apart with little gain.
So I argue that we should focus on (1), and pick up as much of (2) as convenient. Concretely I propose:
- That we split into two as proposed
- That the baseline starting point is (b) above: everything in ghc-base. (Which suggests that is not a good name for the package.)
- That we move as much stuff *as convenient* from ghc-base to new-base. Ian's message suggests that quite a lot can actually move, but *the details of what lives where does not matter much*.
- That we de-emphasise the idea that the compiler-independent stuff goes in new-base and compiler-dependent stuff goes in ghc-base (or hugs-base). If it makes sense otherwise, it's fine for there to be compiler-specific stuff in new-base, and unavoidable (perhaps dragged in by transitive dependencies) to have compiler-independent stuff in ghc-base.
I fully agree that (2) is hard. Having thought about this a bit more, I'm tending towards moving *no code at all* into new-base, keeping it as a pure re-export of bits from base-internals only. It's much simpler:

- we always know where the code is
- there's no arbitrary split based on implementation details
- we never have to move things just because the implementation details change

I don't think moving just some of the modules is going to make life easier for anyone, and it'll make life more complicated for some of us.

Cheers,
Simon

Simon Peyton-Jones wrote:
(c) Achieving (2) is jolly hard. Fully achieving it is probably impossible. And it's fragile: there are big recursive loops -- I think one involves IOException, and just one dependency can completely screw up a proposed separation. I think we could waste a lot of effort into trying to tease the two apart with little gain.
since we can't have two packages "ghc-base" and "new-base" that depend on each other (so that they each export their own APIs). The canonical solution is to mash them into the same package that nothing else uses, but from which parts are re-exported by "ghc-base" and "new-base". Like "secret-base" contains all the code, some of which is re-exported by "ghc-base" and some by "new-base". The proposal to move everything into "ghc-base" is saying "secret-base"="ghc-base", which is an acceptable solution if you don't mind "ghc-base" wantonly exporting everything.

Someone suggested a "ghc-exts" package that would export only the sanctioned GHC.* modules. Then we have the mapping

    Isaac's secret-base = ghc-base
    Isaac's ghc-base = ghc-exts

Anyway we're probably going to end up with one package that has all the implementation, unless someone demonstrates that keeping some code in new-base would actually ever increase code sharing anywhere.

I'm suggesting,

(1) that we should probably recommend that user-level packages not import "secret-base", on the theory that even ghc-exts has a more stable/meaningful API? Discuss.

and (2), we need a better name for it than "secret-base" or "ghc-base", maybe something like "base-internals"?

-Isaac

Hello Bart, Saturday, June 27, 2009, 12:06:06 PM, you wrote:
and (2), we need a better name for it than "secret-base" or "ghc-base", maybe something like "base-internals"?
it's ghc-base, really -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Sat, Jun 27, 2009 at 12:29:54PM +0400, Bulat Ziganshin wrote:
Hello Bart,
Saturday, June 27, 2009, 12:06:06 PM, you wrote:
and (2), we need a better name for it than "secret-base" or "ghc-base", maybe something like "base-internals"?
it's ghc-base, really
If it contains all of what is currently in base then it will continue to be shared by all the implementations. Thanks Ian

Hello Ian, Saturday, June 27, 2009, 3:49:18 PM, you wrote:
it's ghc-base, really
If it contains all of what is currently in base then it will continue to be shared by all the implementations.
other implementations share only part of the code (not including ghc/*) so they have hugs-base, nhc-base and so on. of course, it's just a cosmetic remark -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Ian Lynagh schrieb:
Option 1 --------
In order to solve the version number issue, we could simply state that "base follows the PvP, but only for shared module hierarchies". However, it would be impossible for packages which /do/ need GHC.* modules to give accurately versioned dependencies, and it wouldn't solve the other issue at all.
Option 2 --------
Another possible solution would be to rename the base package to base-internals, and to make base only re-export the "public" modules. This would require either renaming each module M to Internal.M, or for every implementation to support something like GHC's PackageImports extension.
Option 3 --------
The other alternative is to try to split base into two parts: The shared "public" modules, and the internal GHC.* modules. Then GHC would have ghc-base, hugs would have hugs-base, etc, and there would be a common base package built on top of the appropriate impl-base.
This sounds most sensible to me. I would also like to see System.IO.Unsafe in a separate package. This would simplify running of untrusted code.

On 27/06/2009 02:07, Henning Thielemann wrote:
Ian Lynagh schrieb:
Option 1 --------
In order to solve the version number issue, we could simply state that "base follows the PvP, but only for shared module hierarchies". However, it would be impossible for packages which /do/ need GHC.* modules to give accurately versioned dependencies, and it wouldn't solve the other issue at all.
Option 2 --------
Another possible solution would be to rename the base package to base-internals, and to make base only re-export the "public" modules. This would require either renaming each module M to Internal.M, or for every implementation to support something like GHC's PackageImports extension.
Option 3 --------
The other alternative is to try to split base into two parts: The shared "public" modules, and the internal GHC.* modules. Then GHC would have ghc-base, hugs would have hugs-base, etc, and there would be a common base package built on top of the appropriate impl-base.
This sounds most sensible to me. I would also like to see System.IO.Unsafe in a separate package. This would simplify running of untrusted code.
I think the point that hasn't been made clearly so far is this: it's simply not possible to get a clean split between the compiler-specific portions and the portable portions of the base package.

Imagine trying to do this. Firstly, you have to establish a clear API boundary between the portable code and the compiler-specific implementations of various primitives. For instance, the definition of the Prelude in the Haskell 98 report refers to things like "primPlusInt". Now, move the implementations of the compiler-specific primitives into their own package, leaving the portable code behind.

In theory this sounds fine, but in practice the compiler-specific code wants to depend on portable bits. A basic example is list operations like 'map'; almost certainly you'll want this in the compiler-specific package. There are many more; for example the FFI is needed pretty early on, but most of the FFI libraries are portable. When you start separating dependencies it all gets pretty tedious, and what's more you're never sure whether the dependencies might change in the future, requiring yet more code to move across the boundary in one direction or another. And you'll invariably end up with a bunch of portable code on the wrong side of the boundary. There's no good way to do it.

I argue that separating the implementations like this (a) is a time sink and (b) makes the code harder, not easier, to work with.

What's really important is that we can expose a portable API. The implementation can stay all in one package where it is convenient, letting us interleave portable and non-portable code in the dependency graph arbitrarily. Obviously we should continue to make every attempt to clearly separate portable from non-portable code using the module system, within the base package.

So, in case it isn't clear, I claim option (2) is the way forward.
Now, there are a couple of problems with it:

- firstly, compilers other than GHC don't support re-exporting modules from packages

- Haddock doesn't support it properly either, unless Isaac Dupree's SoC project comes to fruition in time for the release. This would be a show-stopper.

Cheers,
Simon
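The dependency problem described in this message (compiler-specific code wanting portable bits such as 'map', while two packages cannot depend on each other) can be sketched as a small check over a module-to-package assignment. Everything concrete here is an assumption for illustration: the import edges, the two placements, and the function names are hypothetical and do not describe the real import graph of base.

```haskell
module Main where

import Data.List (nub)

type Module  = String
type Package = String

-- | Hypothetical imports as (importer, imported) pairs, for illustration only.
moduleImports :: [(Module, Module)]
moduleImports =
  [ ("Data.List", "GHC.Base")   -- portable list code built on primitives
  , ("GHC.Show",  "Data.List")  -- compiler-specific code that wants 'map'
  ]

-- | Placement A: try to keep Data.List on the portable side.
splitPlacement :: [(Module, Package)]
splitPlacement =
  [ ("GHC.Base", "ghc-base"), ("GHC.Show", "ghc-base"), ("Data.List", "new-base") ]

-- | Placement B: keep all the code in one package.
mergedPlacement :: [(Module, Package)]
mergedPlacement =
  [ ("GHC.Base", "ghc-base"), ("GHC.Show", "ghc-base"), ("Data.List", "ghc-base") ]

-- | Lift module imports to dependencies between distinct packages.
packageEdges :: [(Module, Package)] -> [(Module, Module)] -> [(Package, Package)]
packageEdges place imps = nub
  [ (p, q)
  | (m, n) <- imps
  , Just p <- [lookup m place]
  , Just q <- [lookup n place]
  , p /= q
  ]

-- | Pairs of packages that depend on each other, which is not allowed.
mutualPairs :: [(Package, Package)] -> [(Package, Package)]
mutualPairs es = [ (p, q) | (p, q) <- es, p < q, (q, p) `elem` es ]

main :: IO ()
main = do
  print (mutualPairs (packageEdges splitPlacement moduleImports))
  print (mutualPairs (packageEdges mergedPlacement moduleImports))
```

With these made-up edges the split placement reports ghc-base and new-base depending on each other, while the merged placement reports nothing, which is the shape of the argument for keeping the implementation in one package and re-exporting the portable API.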

Simon Marlow wrote:
- Haddock doesn't support it properly either, unless Isaac Dupree's SoC project comes to fruition in time for the release. This would be a show-stopper.
[Haddock being able to document exported identifiers that are originally defined in a different package, that is.] What exactly is the timing here? Good timing would be to get any needed changes to the GHC API into GHC in time for 6.12, is that right? Are there any particular release-cycle deadlines for that (just curious. I'm hard at work on it so I'm hoping it'll be done sometime soon :-)) -Isaac

On 29/06/2009 16:07, Isaac Dupree wrote:
Simon Marlow wrote:
- Haddock doesn't support it properly either, unless Isaac Dupree's SoC project comes to fruition in time for the release. This would be a show-stopper.
[Haddock being able to document exported identifiers that are originally defined in a different package, that is.] What exactly is the timing here? Good timing would be to get any needed changes to the GHC API into GHC in time for 6.12, is that right? Are there any particular release-cycle deadlines for that (just curious. I'm hard at work on it so I'm hoping it'll be done sometime soon :-))
We'd need the changes in GHC *and* Haddock in time for GHC 6.12.1, since the two are shipped together, and we need Haddock working to build the 6.12.1 library docs. I think it's doable, though. Cheers, Simon

Simon Marlow
6.12 ... Are there any particular release-cycle deadlines for that
We'd need the changes in GHC *and* Haddock in time for GHC 6.12.1, since the two are shipped together, and we need Haddock working to build the 6.12.1 library docs. I think it's doable, though.
GHC usually tries to ship a major new release every year, around ICFP time. The conference is rather earlier this year than most years, so is the ghc release schedule for 6.12.1 time-advanced as well?

Regards,
Malcolm

On 30/06/2009 09:38, Malcolm Wallace wrote:
Simon Marlow
wrote: 6.12 ... Are there any particular release-cycle deadlines for that We'd need the changes in GHC *and* Haddock in time for GHC 6.12.1, since the two are shipped together, and we need Haddock working to build the 6.12.1 library docs. I think it's doable, though.
GHC usually tries to ship a major new release every year, around ICFP time. The conference is rather earlier this year than most years, so is the ghc release schedule for 6.12.1 time-advanced as well?
We haven't fixed a date, but I imagine we'll try to release in September, October at the latest. Cheers, Simon
participants (9)
- Bart Massey
- Bulat Ziganshin
- Henning Thielemann
- Ian Lynagh
- Isaac Dupree
- Malcolm Wallace
- Ross Paterson
- Simon Marlow
- Simon Peyton-Jones