Proposal: Slim base-5 API package

Hi,

The problem
===========

Currently, base is a big beast that mixes a lot of different aspects, from really basic stuff like Data.List to quite specific system libraries like System.Console.GetOpt to gory GHC details such as GHC.Conc.Signal. There are various issues with this:

* No implementation of a module included in base is able to use stuff from other libraries, like containers. There are even copies of container code in base.
* Changes to the API of obscure modules like GHC.Conc.Signal require version bumps in base, which cause lots of depending packages to upload new versions just to change their dependency.
* The large exposed surface of base makes refactoring, like the actual split of the base implementation, harder.
* Compilers like haste or fay have a hard time providing a proper base, as many of the modules do not make sense in that setting.

The suggestion
==============

So I’d like to suggest that we turn the base package into a pure API package. This means that there will be no code in the package at all, only re-exports from other packages. The current code can then go into a base-ghc-impl package (or even many packages).

Furthermore, the base package is stripped down to a sensible base API. Most, if not all, of the GHC package should go. There are other rather specific modules (Text.Show.Functions, Text.Read.Lex) where I am unsure whether they are useful in base. For such modules we will see if this is a good point to put them in a separate package. Also the module contents can be cleaned up at this point.

The result should become base-5, and will hopefully be a more stable base library. „Below“ base-5 we can then wildly split and rearrange stuff, without disturbing our beloved users too much.

Code that really needs the removed modules will have to build-depend on the implementing package and hence say „I really need these and I’m willing to pay by having to bump my dependencies more often.“

Way forward
===========

If this is seen as mostly desirable here and by the maintainers, i.e. the committee that can become active ;-), the next step would be to identify a good base API. For that, we should study representative code on hackage: which modules from base are imported, and which functions are used. Based on that data I hope it will be clear what a good base is. Does someone have a tool around that can help with that? I guess compiling much of hackage with -ddump-minimal-imports and postprocessing that should work.

If base-ghc-impl and base expose the same module names (which would make sense), currently a library cannot depend on both libraries. This would be bad, as importing just one module from base-ghc-impl, even only to test stuff, would require changing all imports (or using package imports).

This can be avoided if we add a feature to GHC: module re-exports. Similar to how GHC does not complain if the same definition is imported via different modules, it should not complain if the same module is imported via different packages. Such a re-export can either be an otherwise empty module that imports and re-exports exactly one module, or an entry in the package metadata – but that’s an implementation detail.

I have no roadmap with times at hand; maybe someone more knowledgeable of GHC and platform release processes and timings can comment on that.

Dangers
=======

The whole exercise will only be useful if afterwards a large fraction of packages can actually use base and just base, and do not have to import the implementing packages directly. Hence the imports study in advance.

There are lots of proposals floating around that modify base in a non-trivial manner. For some of them it might make sense to include them when bumping base to 5, but the danger is that there are so many, some of them rather intrusive, that deciding on all of them would hold up this proposal.

Also note that this proposal, if it goes as planned, will _not_ require code changes to _most_ packages. Which is good.

Thanks for your attention, and please let us know what you think,
Joachim

-- 
Joachim “nomeata” Breitner
mail@joachim-breitner.de • http://www.joachim-breitner.de/
Jabber: nomeata@joachim-breitner.de • GPG-Key: 0x4743206C
Debian Developer: nomeata@debian.org
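For readers unfamiliar with the "package imports" workaround mentioned under "Way forward", here is a minimal sketch using GHC's PackageImports extension. The package name "base" appears in both imports only so that the sketch compiles today; under the proposal one of them would name the implementation package (base-ghc-impl is the name suggested above and does not exist as a real package). The function shout is purely illustrative.

```haskell
{-# LANGUAGE PackageImports #-}

-- The quoted string selects the package a module is taken from, which
-- disambiguates two packages exposing the same module name.
-- Assumption: "base" stands in for both the API package and the
-- hypothetical implementation package ("base-ghc-impl" in the proposal).
import "base" Data.Char (toUpper)
import "base" Data.List (sort)

-- | Illustrative helper that uses one name from each import.
shout :: String -> String
shout = map toUpper . sort

main :: IO ()
main = putStrLn (shout "base")
```

Every import that could be ambiguous needs such an annotation, which is why the proposal prefers module re-exports over this workaround.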

On Mon, 15 Jul 2013, Joachim Breitner wrote:
The problem
===========
Currently, base is a big beast that mixes a lot of different aspects, from really basic stuff like Data.List to quite specific system libraries like System.Console.GetOpt to gory GHC details such as GHC.Conc.Signal. There are various issues with this:
* No implementation of a module included in base is able to use stuff from other libraries, like containers. There are even copies of container code in base.
* Changes to the API of obscure modules like GHC.Conc.Signal require version bumps in base, which cause lots of depending packages to upload new versions just to change their dependency.
* The large exposed surface of base makes refactoring, like the actual split of the base implementation, harder.
* Compilers like haste or fay have a hard time providing a proper base, as many of the modules do not make sense in that setting.
How is your proposal related to your SplitBase effort: http://ghc.haskell.org/trac/ghc/wiki/SplitBase ?

It pretty much is the evolution of the SplitBase API effort. One of the main reasons why Joachim joined the core libraries committee was to work on refining split base into an implementable specification.
-Edward
On Mon, Jul 15, 2013 at 12:22 PM, Henning Thielemann <lemming@henning-thielemann.de> wrote:
On Mon, 15 Jul 2013, Joachim Breitner wrote:
The problem
===========
Currently, base is a big beast that mixes a lot of different aspects, from really basic stuff like Data.List to quite specific system libraries like System.Console.GetOpt to gory GHC details such as GHC.Conc.Signal. There are various issues with this:
* No implementation of a module included in base is able to use stuff from other libraries, like containers. There are even copies of container code in base.
* Changes to the API of obscure modules like GHC.Conc.Signal require version bumps in base, which cause lots of depending packages to upload new versions just to change their dependency.
* The large exposed surface of base makes refactoring, like the actual split of the base implementation, harder.
* Compilers like haste or fay have a hard time providing a proper base, as many of the modules do not make sense in that setting.
How is your proposal related to your SplitBase effort: http://ghc.haskell.org/trac/ghc/wiki/SplitBase ?

On Mon, 15 Jul 2013, Edward Kmett wrote:
It pretty much is the evolution of the SplitBase API effort.
One of the main reasons why Joachim joined the core libraries committee was to work on refining split base into an implementable specification.
I very much like the splitting of the base package. I think this will also help other compilers to join the Cabal game. Currently it is hardly possible to compile an arbitrary package from Hackage with a compiler other than GHC, because every package depends on 'base'. If you want to make your package available for more compilers, you have to provide a different Build-Depends list for every compiler you want to support.
It would be cool to have packages that are guaranteed to be free of IO, in order to run them in environments where no IO is allowed (e.g. a web server running submitted Haskell code). (My only objection is to calling a package without IO "pure-base", because IO is also purely functional in Haskell.)

Hi, on Monday, 15.07.2013, at 18:22 +0200, Henning Thielemann wrote:
How is your proposal related to your SplitBase effort: http://ghc.haskell.org/trac/ghc/wiki/SplitBase
I wouldn’t call it “my” effort; I was just joining an existing discussion and tried to add more concrete data points to it. The relation is that a base-5 API package would help us achieve Goal 1 (fewer version bumps), Goal 4, Goal 5, and help a bit with Goal 3 (as defined on the wiki page). It does not directly help with Goal 2, but it sets a precedent and may make a, say, base-pure API package, which re-exports only non-IO modules, more likely to be used. The mentioned extension to the GHC package system (module re-exports) also helps in that way.
Or put differently: instead of deciding between the two approaches listed on the wiki page, we use approach (A) to make (B) possible.
On Monday, 15.07.2013, at 19:07 +0200, Henning Thielemann wrote:
If you want to make your package available for more compilers, you have to provide a different Build-Depends list for every compiler you want to support. It would be cool to have packages that are guaranteed to be free of IO, in order to run them in environments where no IO is allowed (e.g. a web server running submitted Haskell code). (My only objection is to calling a package without IO "pure-base", because IO is also purely functional in Haskell.)
Precisely. The base package that this proposal talks about does not do that, but it shows the way, and the introduction of a base-“pure” would then be easily possible and non-intrusive (besides the question of where the Prelude lives, but let’s keep that discussion separate).
Do you have a suggestion for a better name than pure-base? pure-unreal-world, as there is no notion of the RealWorld# state token ;-)?
Greetings, Joachim

On Mon, 15 Jul 2013, Joachim Breitner wrote:
Do you have a suggestion for a better name than pure-base? pure-unreal-world, as there is no notion of the RealWorld# state token ;-)?
You know, finding identifiers is one of the most difficult problems in computer science. The first names that come to my mind are all negating, like base-non-io, base-non-monadic (but I think Control.Monad could be part of the package). I'd prefer a positive name, since I think non-IO code should be the normal case in Haskell and I/O the exception. base-basic, base-simple, base-plain, base-core, base-tame, base-gentle (won't launch dangerous things) - I don't know.

On 2013-07-15 09:56, Joachim Breitner wrote: [--snip--]
The suggestion
==============
So I’d like to suggest that we turn the base package into a pure API package. This means that there will be no code in the package at all, only re-exports from other packages. The current code can then go into a base-ghc-impl package (or even many packages).
A question related to this: Is it possible for such an "API package" to actually explicitly specify type signatures of its re-exports, perhaps using abstract types where no particular implementation would be dictated?
It seems to me that this would be highly valuable to provide a) documentation which could be shared across all implementations, b) a contract such that a given implementation can tell whether it really is conforming to expected type signatures, and c) a contract such that *users* of base can be certain that they're not unwittingly relying on implementation details in a given implementation of base.
Without such checks I fear that things may end up in a bit of a mess type-wise if there *were* actually to be multiple implementations at some point. (Though it would certainly improve on the current situation.)
Overall, the proposal sounds like a great idea to start to get a handle on the mess that is base ;).
Regards,

On Mon, 15 Jul 2013, Bardur Arantsson wrote:
A question related to this:
Is it possible for such an "API package" to actually explicitly specify type signatures of its re-exports, perhaps using abstract types where no particular implementation would be dictated?
I think this can be achieved by writing things like:

    module Data.List (map) where

    import qualified Base.List as List

    map :: (a -> b) -> [a] -> [b]
    map = List.map

However, it means that the implementing package must use module names other than those of 'base'. And you lose the Haddock comment. But maybe the Haddock comment should be attached to Data.List anyway, instead of GHC.List? I don't know.

Hi, on Monday, 15.07.2013, at 19:36 +0200, Bardur Arantsson wrote:
On 2013-07-15 09:56, Joachim Breitner wrote: A question related to this:
Is it possible for such an "API package" to actually explicitly specify type signatures of its re-exports, perhaps using abstract types where no particular implementation would be dictated?
It seems to me that this would be highly valuable to provide a) documentation which could be shared across all implementations, b) a contract such that a given implementation can tell whether it really is conforming to expected type signatures, and c) a contract such that *users* of base can be certain that they're not unwittingly relying on implementation details in a given implementation of base.
I don’t think so; this would be a very different module system then. What you can do is this:

    {-# LANGUAGE CPP, PackageImports #-}
    #ifdef API
    module Foo where

    -- | Documentation for Bar
    data Bar -- abstract

    -- | Documentation for f
    f :: Int -> Bar
    f = undefined
    #else
    -- Re-export the real definitions; Bar is exported without constructors.
    module Foo (Bar, f) where
    import "impl-package" Foo
    #endif

So you can compile it with -DAPI to get the specification and the documentation, compile it without to get the real thing, and then have tool support to compare the two (b). Also, with -DAPI you get the shared documentation (a). (c) can already be achieved now by exporting only the relevant details: via base, you can import the Bar type only abstractly; if you want access to its constructors, you have to import base-impl.
Without such checks I fear that things may end up in a bit of a mess type-wise if there *were* actually to be multiple implementations at some point. (Though it would certainly improve on the current situation.)
I don’t think the plan is to have different implementations of stuff in base at the same time. If different compilers (fay) replace the base-implementing packages, then there is another implementation, but that would never interfere with a definition on GHC. Or what scenario do you have in mind?
Overall, the proposal sounds like a great idea to start to get a handle on the mess that is base ;).
Thanks, Joachim

On Tue, 16 Jul 2013, Joachim Breitner wrote:
I don’t think the plan is to have different implementations of stuff in base at the same time. If different compilers (fay) replace the base-implementing packages, then there is another implementation, but that would never interfere with a definition on GHC. Or what scenario do you have in mind?
I think his wish was to ensure that e.g. 'map' has the same type in GHC, fay, JHC and others, although it might be implemented differently (with foldr/build, or stream-fusion, or whatever).

On 2013-07-16 23:45, Henning Thielemann wrote:
On Tue, 16 Jul 2013, Joachim Breitner wrote:
I don’t think the plan is to have different implementations of stuff in base at the same time. If different compilers (fay) replace the base-implementing packages, then there is another implementation, but that would never interfere with a definition on GHC. Or what scenario do you have in mind?
I think his wish was to ensure that e.g. 'map' has the same type in GHC, fay, JHC and others, although it might be implemented differently (with foldr/build, or stream-fusion, or whatever).
Exactly! (But perhaps I'm just a natural pessimist -- it may not matter much in practice.) Regards,

On Wed, 17 Jul 2013, Bardur Arantsson wrote:
On 2013-07-16 23:45, Henning Thielemann wrote:
I think his wish was to ensure that e.g. 'map' has the same type in GHC, fay, JHC and others, although it might be implemented differently (with foldr/build, or stream-fusion, or whatever).
Exactly!
(But perhaps I'm just a natural pessimist -- it may not matter much in practice.)
I guess it can easily happen that a function has different class constraints depending on the implementation. A type enforced by the 'base' package would prevent people from relying on the weaker constraints of a particular implementation.

Hi, on Wednesday, 17.07.2013, at 09:46 +0200, Henning Thielemann wrote:
On Wed, 17 Jul 2013, Bardur Arantsson wrote:
On 2013-07-16 23:45, Henning Thielemann wrote:
I think his wish was to ensure that e.g. 'map' has the same type in GHC, fay, JHC and others, although it might be implemented differently (with foldr/build, or stream-fusion, or whatever).
Exactly!
(But perhaps I'm just a natural pessimist -- it may not matter much in practice.)
I guess it can easily happen that a function has different class constraints depending on the implementation. A type enforced by the 'base' package would prevent people from relying on the weaker constraints of a particular implementation.
There won’t be many base re-implementations, and these will be carefully crafted. I’d leave it to those people to pay attention to these details, without over-engineering a technical solution.
It should be easy to whack up a script that extracts the API of a package (e.g. a list of modules and their contents); just run it on both implementations and diff them. In fact, I think I saw someone create such a tool already; I just don’t remember who or where.
If you want to make it more formal, we can put this file somewhere as a „specification“.
Greetings, Joachim
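The "extract the API and diff" idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the extraction step is left out, an API listing is assumed to be a map from qualified names to type signatures rendered as text, and the names Api, Change, diffApi, reference and candidate are made up for this sketch. It uses Data.Map from the containers package.

```haskell
import qualified Data.Map.Strict as Map

-- | An API listing: qualified name mapped to its type, rendered as text.
type Api = Map.Map String String

data Change
  = Missing String                -- only in the reference API
  | Extra String                  -- only in the candidate implementation
  | Differs String String String  -- name, reference type, candidate type
  deriving (Eq, Show)

-- | Compare a candidate implementation against a reference listing.
diffApi :: Api -> Api -> [Change]
diffApi ref cand =
     [Missing n | n <- Map.keys (Map.difference ref cand)]
  ++ [Extra n | n <- Map.keys (Map.difference cand ref)]
  ++ [ Differs n t t'
     | (n, (t, t')) <- Map.toList (Map.intersectionWith (,) ref cand)
     , t /= t' ]

-- Made-up listings for illustration only.
reference, candidate :: Api
reference = Map.fromList
  [ ("Data.List.map",  "(a -> b) -> [a] -> [b]")
  , ("Data.List.nub",  "Eq a => [a] -> [a]")
  , ("Data.List.sort", "Ord a => [a] -> [a]") ]
candidate = Map.fromList
  [ ("Data.List.map", "(a -> b) -> [a] -> [b]")
  , ("Data.List.nub", "Ord a => [a] -> [a]") ]

main :: IO ()
main = mapM_ print (diffApi reference candidate)
```

Comparing types as text is naive: two signatures that differ only in the names of their type variables would be reported as different, so a real tool would need to normalise them first.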

On Wed, 17 Jul 2013, Joachim Breitner wrote:
There won’t be many base re-implementations, and these will be carefully crafted. I’d leave it to those people to pay attention to these details, without over-engineering a technical solution.
It should be easy to whack up a script that extracts the API of a package (e.g. a list of modules and their contents); just run it on both implementations and diff them. In fact, I think I saw someone create such a tool already; I just don’t remember who or where.
If you want to make it more formal, we can put this file somewhere as a „specification“.
Even if the wanted types are not enforced by 'base' and there is no tool to check for correct types, we could simply have a package 'base-check' containing the wanted types and assignments to the compiler-dependent implementations. If you can compile that package with your compiler, then the types of the implementation are fine.
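A module in such a 'base-check' package might look roughly like the sketch below. The names wantedMap and wantedNub are invented for this illustration, and Data.List from the installed base stands in for whatever compiler-specific implementation module would really be imported.

```haskell
-- Stand-in for a compiler-specific implementation module; in a real
-- 'base-check' this import would be selected per compiler (assumption).
import qualified Data.List as Impl

-- The wanted types, written out explicitly. Each binding compiles only if
-- the implementation's type is at least as general as the wanted one.
wantedMap :: (a -> b) -> [a] -> [b]
wantedMap = Impl.map

wantedNub :: Eq a => [a] -> [a]
wantedNub = Impl.nub

main :: IO ()
main = print (wantedMap (+ 1) [1, 2, 3 :: Int], wantedNub "mississippi")
```

If an implementation's nub required Ord instead of Eq, the wantedNub binding would be rejected by the type checker. Note the limitation: an implementation whose type is more general than the wanted one still compiles, so this check catches constraints that are too strong but not ones that are too weak.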

Hi, on Monday, 15.07.2013, at 09:56 +0200, Joachim Breitner wrote:
Furthermore, the base package is stripped down to a sensible base API. Most, if not all, of the GHC package should go. There are other rather specific modules (Text.Show.Functions, Text.Read.Lex) where I am unsure whether they are useful in base. For such modules we will see if this is a good point to put them in a separate package.
I went ahead and analyzed the import statements from 603 Haskell packages (I picked those from Debian, including the patches¹, because I know that they all compile with each other) and built the following report:
http://darcs.nomeata.de/import-analyzer/report-debian-2013-07-17.html
Nice example for Zipf's law. Unfortunately, -ddump-minimal-imports does not generate an import statement for Prelude, so no information about the Prelude symbols is present. The „Anything“ pseudo-module just shows the number of packages analyzed.
The GHC.* tree of modules is used by very few packages: 39 of 603, mostly for unboxed arithmetic. I think it would not be too hard to expect those packages to import a separate package from base for that (maybe ghc-exts), and make the base-5 package API much less GHC-specific.
Many modules in base are used by very few packages. Some because they are relatively new (Data.Tuple), but others can probably be demoted from their base status without too much disturbance (Data.HashTable, System.Mem.StableName, Text.Read.Lex).
Larger cleanups (such as creating a base-without-missiles) should happen in separate packages, so that those who do not care can just continue to use base as before.
Greetings, Joachim
¹ See http://anonscm.debian.org/darcs/pkg-haskell/tools/all-packages/packages.txt for the list of packages in Debian and http://anonscm.debian.org/darcs/pkg-haskell/tools/all-packages/patches/ for our patches.
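The kind of counting behind such a report can be sketched as follows. This is not the import-analyzer itself; it assumes the per-package import lists have already been extracted (for example from -ddump-minimal-imports output), and the names Imports, moduleUsage, ghcUsers and the sample data are invented for illustration. It uses Data.Map from the containers package.

```haskell
import Data.List (isPrefixOf, nub, sortBy)
import Data.Ord (Down (..), comparing)
import qualified Data.Map.Strict as Map

-- | A package name paired with the modules it imports.
type Imports = (String, [String])

-- | For every module, the number of packages importing it, most used first.
moduleUsage :: [Imports] -> [(String, Int)]
moduleUsage pkgs =
  sortBy (comparing (Down . snd)) . Map.toList $
    Map.fromListWith (+) [(m, 1 :: Int) | (_, ms) <- pkgs, m <- nub ms]

-- | Packages that import at least one module from the GHC.* tree.
ghcUsers :: [Imports] -> [String]
ghcUsers pkgs = [p | (p, ms) <- pkgs, any ("GHC." `isPrefixOf`) ms]

-- Made-up input; these are not the packages from the report.
sample :: [Imports]
sample =
  [ ("pkg-a", ["Data.List", "Data.Maybe"])
  , ("pkg-b", ["Data.List", "GHC.Exts"])
  , ("pkg-c", ["Data.List", "Data.Maybe", "Data.Tuple"]) ]

main :: IO ()
main = do
  mapM_ print (moduleUsage sample)
  print (ghcUsers sample)
```

Each package counts at most once per module (hence the nub), which matches the report's "used by N packages" reading rather than counting individual import statements.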
participants (4)
- Bardur Arantsson
- Edward Kmett
- Henning Thielemann
- Joachim Breitner