Proposal: reduce base from the top

This is an attempt to propose a set of changes that we could reasonably make in the GHC 6.8 timeframe, that would significantly reduce the size of base and give us more flexibility to independently develop packages.
This is not an attempt to solve all the issues we have in one go, but a practical incremental step towards the goal. I'm not trying to make base compiler-independent, for example; that's a worthy goal, but it's not clear (at least to me) how to get there yet. What I'm doing here is disentangling the dependencies from the top down. We started on this path before GHC 6.6 (see http://hackage.haskell.org/trac/ghc/ticket/710); this proposal will clarify some of the details of the next stage.
There may be dependencies I've missed, but as far as I'm aware all of the following is possible without any significant rewriting of code. These reorganisations will allow more reuse: for example, more libraries will be able to make direct use of the unix/Win32 packages rather than being limited to the primitives available in the base package, so cleanups will be possible after the reorganisation. There are several things I'm not completely sure about, and some naming issues to resolve, so please take a look and comment if you can.
I'll make one request regarding comments: please don't suggest renaming modules at this point. There are various modules whose names I'm aware are contentious, but I think it would be a distraction to discuss those as part of this proposal; let's do it separately.
Here goes:
System.Time --> new package old-time (dep. on unix/Win32)
System.Locale --> new package old-locale
System.Posix.Signals --> unix (System.Cmd depends on it, but moves to new package process)
System.Directory --> new package directory (dep. on filepath, unix/Win32)
  System.Directory.Internals goes away, use filepath instead
Data.Array.* --> new package array (maybe; I'm slightly dubious here) (dep. on concurrent for Data.Array.Diff)
Data.Generics.* --> generics (maybe; Data class is defined for everything and is derivable)
Data.ByteString.* --> fps (dep. on base, generics, array)
Control.Concurrent.*, System.Timeout --> new package concurrent (needed by Data.Unique, where to move it?)
Control.Parallel.* --> new package parallel
System.Process, System.Cmd --> new package process (dep. on concurrent)
Control.Applicative
Data.Foldable, Data.Traversable
Data.Map, Data.IntMap, Data.Set, Data.IntSet
Data.Sequence, Data.Tree
Data.HashTable
Data.Graph
  ---> new package collections? containers? or split further? (dep. on array, generics, concurrent)
System.Console.GetOpt ---> new package getopt? consoleutils?
Text.PrettyPrint.* ---> new package pretty
System.Random ---> new package random (modify to use time, not old-time)
Other modules we could move: Text.Printf, Data.Unique, Data.Monoid, System.CPUTime.
Topological sort of core packages with dependencies
---------------------------------------------------
base
unix/Win32 (base)
generics (base)
concurrent (base)
parallel (base)
filepath (base)
Cabal (base)
readline (base)
regex-base (base)
regex-posix (base, regex-base)
regex-compat (base, regex-base, regex-posix)
parsec (base)
stm (base)
template-haskell (base)
pretty (base) (could drop from core-packages)
getopt (base) (could drop from core-packages)
old-locale (base)
old-time (base, unix/Win32)
array (base, generics, concurrent)
fps (base, generics, array)
process (base, unix/Win32, concurrent)
directory (base, unix/Win32, filepath)
time (base, unix/Win32)
random (base, unix/Win32, time)
haskell98 (old-time, old-locale, random, directory, process)
containers (base, array, generics, concurrent)
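As a quick sanity check on a dependency list like the one above, the package graph can be fed to Data.Graph (itself one of the modules slated to move). This is a minimal sketch, not part of the proposal: it assumes the modern containers package, transcribes the list as (package, direct dependencies) pairs, and treats unix/Win32 as a single node. stronglyConnComp then yields an order in which every package follows its dependencies, or reports a cycle.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- Direct dependencies transcribed from the proposal's list.
-- "unix/Win32" is treated as one node for simplicity.
corePackages :: [(String, [String])]
corePackages =
  [ ("base", [])
  , ("unix/Win32", ["base"])
  , ("generics", ["base"])
  , ("concurrent", ["base"])
  , ("parallel", ["base"])
  , ("filepath", ["base"])
  , ("Cabal", ["base"])
  , ("readline", ["base"])
  , ("regex-base", ["base"])
  , ("regex-posix", ["base", "regex-base"])
  , ("regex-compat", ["base", "regex-base", "regex-posix"])
  , ("parsec", ["base"])
  , ("stm", ["base"])
  , ("template-haskell", ["base"])
  , ("pretty", ["base"])
  , ("getopt", ["base"])
  , ("old-locale", ["base"])
  , ("old-time", ["base", "unix/Win32"])
  , ("array", ["base", "generics", "concurrent"])
  , ("fps", ["base", "generics", "array"])
  , ("process", ["base", "unix/Win32", "concurrent"])
  , ("directory", ["base", "unix/Win32", "filepath"])
  , ("time", ["base", "unix/Win32"])
  , ("random", ["base", "unix/Win32", "time"])
  , ("haskell98", ["old-time", "old-locale", "random", "directory", "process"])
  , ("containers", ["base", "array", "generics", "concurrent"])
  ]

-- Dependencies that are mentioned but never declared as packages.
-- Data.Graph ignores edges to unknown keys, so check for them first.
undeclared :: [(String, [String])] -> [String]
undeclared pkgs = [d | (_, ds) <- pkgs, d <- ds, d `notElem` map fst pkgs]

-- stronglyConnComp returns components in reverse topological order,
-- i.e. each package comes after everything it depends on.
-- A cyclic component means the proposed split could not be built.
buildOrder :: [(String, [String])] -> Either [String] [String]
buildOrder pkgs = mapM fromSCC (stronglyConnComp [(p, p, ds) | (p, ds) <- pkgs])
  where
    fromSCC (AcyclicSCC p) = Right p
    fromSCC (CyclicSCC ps) = Left ps

main :: IO ()
main = do
  print (undeclared corePackages)
  either (putStrLn . ("cycle: " ++) . unwords) (mapM_ putStrLn) (buildOrder corePackages)
```

The same function would flag a problem immediately if, say, concurrent were made to depend on array while array depends on concurrent.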

Hi Simon, On Tue, Apr 03, 2007 at 11:21:34AM +0100, Simon Marlow wrote:
Data.ByteString.* --> fps (dep. on base, generics, array)
I think this should be called bytestring; fps will just be a random string to newcomers to Haskell.
Control.Concurrent.*, System.Timeout --> new package concurrent (needed by Data.Unique, where to move it?)
At worst it can go in a "unique" package.
Control.Applicative Data.Foldable, Data.Traversable Data.Map, Data.IntMap, Data.Set, Data.IntSet Data.Sequence, Data.Tree Data.HashTable Data.Graph ---> new package collections? containers? or split further? (dep. on array, generics, concurrent)
I assume you give "containers" as an option because I used it in #710, but I think I prefer "collections" myself. I'm not sure if it should be plural or singular. Logically it should match "array", but somehow "array" and "containers" feels right.
System.Console.GetOpt ---> new package getopt? consoleutils?
It might be nice to leave "getopt" for a future C getopt binding. I haven't checked for any dependencies you've missed or anything, but I think the way to do this sort of thing is to agree on a split in principle and then see if anything breaks when you try it. Overall it looks good, and makes later, more exciting refactoring easier! Thanks Ian

Hello Ian, Tuesday, April 3, 2007, 3:42:31 PM, you wrote:
Data.ByteString.* --> fps (dep. on base, generics, array)
I think this should be called bytestring; fps will just be a random string to newcomers to Haskell.
+1
Control.Applicative Data.Foldable, Data.Traversable Data.Map, Data.IntMap, Data.Set, Data.IntSet Data.Sequence, Data.Tree Data.HashTable Data.Graph ---> new package collections? containers? or split further? (dep. on array, generics, concurrent)
I assume you give "containers" as an option because I used it in #710, but I think I prefer "collections" myself.
Collections is the name of another existing library ;)
System.Console.GetOpt ---> new package getopt? consoleutils?
This may be joined with filesystem-related stuff into one package. How about having Unix and Win32 for low-level OS-specific functions, and then OS_Services on top of them? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Tuesday 03 April 2007 13:42, Ian Lynagh wrote:
On Tue, Apr 03, 2007 at 11:21:34AM +0100, Simon Marlow wrote: [...]
System.Console.GetOpt ---> new package getopt? consoleutils?
It might be nice to leave "getopt" for a future C getopt binding.
Hmmm, System.Console.GetOpt is meant as a functionally 100% complete Haskell-like replacement for getopt(3). Is there anything missing? Anyway, using a package for a single module looks a little bit like overkill to me. But if we do that, "getopt" should be the natural name. Cheers, S.
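To illustrate Sven's point that System.Console.GetOpt already covers the job of getopt(3), here is a minimal sketch using the module's existing API (getOpt, Option, NoArg, ReqArg, usageInfo). The Flag type, the option names and the usage string are hypothetical, made up for the example; it handles both short and long options.

```haskell
import System.Console.GetOpt

-- Hypothetical flags for an imaginary program.
data Flag = Verbose | Output FilePath
  deriving (Show, Eq)

options :: [OptDescr Flag]
options =
  [ Option ['v'] ["verbose"] (NoArg Verbose)        "chatty output"
  , Option ['o'] ["output"]  (ReqArg Output "FILE") "write output to FILE"
  ]

-- Permute lets options and non-options be freely interleaved,
-- as GNU getopt does by default.
parseArgs :: [String] -> Either String ([Flag], [String])
parseArgs argv = case getOpt Permute options argv of
  (flags, nonOpts, []) -> Right (flags, nonOpts)
  (_, _, errs)         -> Left (concat errs ++ usageInfo header options)
  where
    header = "Usage: prog [OPTION...] files..."

main :: IO ()
main = do
  print (parseArgs ["-v", "input.txt", "--output=out.txt"])
  either putStr print (parseArgs ["-o"])  -- missing FILE: prints error and usage
```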

Hello Simon, Tuesday, April 3, 2007, 2:21:34 PM, you wrote:
This is not an attempt to solve all the issues we have in one go, but a practical incremental step towards the goal. I'm not trying to make base compiler-independent for example; that's a worthy goal, but it's not clear (at least to me) how to get there yet.
I'm all for this plan. A long way starts with the first step. About "making base compiler-independent": its *interface* is already compiler-independent. If you mean the implementation, it seems rather obvious to me: split it into a ghc-base package that includes GHC.* and the modules on which GHC.* depends, and a new-base package which contains the rest. Then move any "#ifdef GHC" code from new-base into *hc-base. Then, as time permits, we can start to move GHC-independent code from ghc-base into new-base. Meanwhile, a fake base package may be established that just re-exports ghc-base and new-base, so users will not depend on where we moved each particular module. My Core library was actually an attempt to separate GHC.* into GHC-specific and GHC-independent parts. Of course, it's an entirely separate proposal; we can return to discussing it in the future, probably after your plan has been implemented. -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Hello Simon,
Tuesday, April 3, 2007, 2:21:34 PM, you wrote:
This is not an attempt to solve all the issues we have in one go, but a practical incremental step towards the goal. I'm not trying to make base compiler-independent for example; that's a worthy goal, but it's not clear (at least to me) how to get there yet.
I'm all for this plan. A long way starts with the first step.
About "making base compiler-independent": its *interface* is already compiler-independent. If you mean the implementation, it seems rather obvious to me: split it into a ghc-base package that includes GHC.* and the modules on which GHC.* depends, and a new-base package which contains the rest. Then move any "#ifdef GHC" code from new-base into *hc-base. Then, as time permits, we can start to move GHC-independent code from ghc-base into new-base. Meanwhile, a fake base package may be established that just re-exports ghc-base and new-base, so users will not depend on where we moved each particular module.
Apart from anything else, we don't actually have support for packages that re-expose modules from other packages. This is an important feature and will indeed make package refactoring easier - care to look into implementing it? :-) Cheers, Simon

On Tue, Apr 03, 2007 at 11:21:34AM +0100, Simon Marlow wrote:
This is an attempt to propose a set of changes that we could reasonably make in the GHC 6.8 timeframe, that would significantly reduce the size of base and give us more flexibility to independently develop packages. [...] Control.Applicative Data.Foldable, Data.Traversable Data.Map, Data.IntMap, Data.Set, Data.IntSet Data.Sequence, Data.Tree Data.HashTable Data.Graph ---> new package collections? containers? or split further? (dep. on array, generics, concurrent)
Data.HashTable (and thus Data.Array.*) is used in the implementation of Data.Typeable. It also differs from the others in being a mutable data structure. I imagine that without it this package wouldn't need to depend on array and concurrent. Data.Monoid could possibly go here too. Another possibility is to split the 4 class modules from the concrete data structures.

On Tue, Apr 03, 2007 at 02:36:06PM +0100, Ross Paterson wrote:
On Tue, Apr 03, 2007 at 11:21:34AM +0100, Simon Marlow wrote:
This is an attempt to propose a set of changes that we could reasonably make in the GHC 6.8 timeframe, that would significantly reduce the size of base and give us more flexibility to independently develop packages. [...] Control.Applicative Data.Foldable, Data.Traversable Data.Map, Data.IntMap, Data.Set, Data.IntSet Data.Sequence, Data.Tree Data.HashTable Data.Graph ---> new package collections? containers? or split further? (dep. on array, generics, concurrent)
Data.HashTable (and thus Data.Array.*) is used in the implementation of Data.Typeable. It also differs from the others in being a mutable data structure. I imagine that without it this package wouldn't need to depend on array and concurrent.
I hear that a lot - but as I see it, why does *Typeable* need to be in base? As long as we have a portable Unsafe.Coerce (exists now), and a portable equivalent to GHC.Prim.Any (should be trivial to add to both Hugs and YHC - names anyone?) Typeable/Dynamic can be a portable high-level library. My cursory grepping of the '*.*hs' files in base reveals only one non-deriving non-SYB use of Typeable/Dynamic/TypeRep: DynException. I suppose this highlights a major problem with the class/module system. If package A defines FooLike, and package B defines Foo, one needs to depend on the other in order for instances to exist, assuming you don't want O(n^2) instance packages floating around. Another random example: QuickCheck (Arbitrary) depends on most data types. Stefan
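Stefan's argument can be made concrete with a small sketch. Assuming only Unsafe.Coerce.unsafeCoerce and GHC's Any type (reached through GHC.Exts in current GHC rather than GHC.Prim), a Dynamic-like wrapper needs nothing else except some way to compare type tags. The HasTag class, its string tags and the Dyn type are hypothetical stand-ins for Typeable/TypeRep; nothing here checks that tags are unique, which is exactly the job a real Typeable implementation has to do.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))
import GHC.Exts (Any)
import Unsafe.Coerce (unsafeCoerce)

-- Hypothetical stand-in for Typeable: one string tag per type.
-- Soundness depends entirely on every instance using a distinct tag.
class HasTag a where
  typeTag :: Proxy a -> String

instance HasTag Int  where typeTag _ = "Int"
instance HasTag Bool where typeTag _ = "Bool"

-- A value of unknown type, paired with the tag of its real type.
data Dyn = Dyn String Any

toDyn :: forall a. HasTag a => a -> Dyn
toDyn x = Dyn (typeTag (Proxy :: Proxy a)) (unsafeCoerce x)

-- Coerce back only when the stored tag matches the requested type.
fromDyn :: forall a. HasTag a => Dyn -> Maybe a
fromDyn (Dyn t v)
  | t == typeTag (Proxy :: Proxy a) = Just (unsafeCoerce v)
  | otherwise                       = Nothing

main :: IO ()
main = do
  let d = toDyn (42 :: Int)
  print (fromDyn d :: Maybe Int)
  print (fromDyn d :: Maybe Bool)
```

Note that the class-based tagging runs straight into the instance-placement problem Stefan describes: every type that wants to be stored needs a HasTag instance somewhere.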

Stefan O'Rear wrote:
On Tue, Apr 03, 2007 at 02:36:06PM +0100, Ross Paterson wrote:
On Tue, Apr 03, 2007 at 11:21:34AM +0100, Simon Marlow wrote:
This is an attempt to propose a set of changes that we could reasonably make in the GHC 6.8 timeframe, that would significantly reduce the size of base and give us more flexibility to independently develop packages. [...] Control.Applicative Data.Foldable, Data.Traversable Data.Map, Data.IntMap, Data.Set, Data.IntSet Data.Sequence, Data.Tree Data.HashTable Data.Graph ---> new package collections? containers? or split further? (dep. on array, generics, concurrent)
Data.HashTable (and thus Data.Array.*) is used in the implementation of Data.Typeable. It also differs from the others in being a mutable data structure. I imagine that without it this package wouldn't need to depend on array and concurrent.
I hear that a lot - but as I see it, why does *Typeable* need to be in base? As long as we have a portable Unsafe.Coerce (exists now), and a portable equivalent to GHC.Prim.Any (should be trivial to add to both Hugs and YHC - names anyone?) Typeable/Dynamic can be a portable high-level library.
My cursory grepping of the '*.*hs' files in base reveals only one non-deriving non-SYB use of Typeable/Dynamic/TypeRep: DynException.
Separating Typeable and Dynamic from base is certainly something we should try to do, but I think this belongs in the next stage. The DynException dependency might be removed if we switch to using extensible exceptions along the lines of my Haskell Workshop paper from last year. Cheers, Simon
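For reference, the extensible-exception approach Simon mentions is essentially the design that base's Control.Exception later adopted: each exception type is an instance of the Exception class and travels wrapped in SomeException, so a user-defined exception is just an ordinary type with an Exception instance. A minimal sketch against the current Control.Exception API (the ParseError type and safeParse function are hypothetical):

```haskell
import Control.Exception

-- Hypothetical user-defined exception type.
newtype ParseError = ParseError String
  deriving Show

-- The default toException/fromException methods wrap the value in
-- SomeException and recover it by a type check, so no extra code is needed.
instance Exception ParseError

-- Catch only ParseError; any other exception propagates unchanged.
safeParse :: IO Int -> IO (Either String Int)
safeParse act = (Right <$> act) `catch` \(ParseError msg) -> return (Left msg)

main :: IO ()
main = do
  r1 <- safeParse (throwIO (ParseError "bad input"))
  print r1
  r2 <- safeParse (return 7)
  print r2
  -- Going through the root type SomeException and back again.
  r3 <- try (throwIO (ParseError "oops")) :: IO (Either SomeException ())
  case r3 of
    Left e   -> print (fromException e :: Maybe ParseError)
    Right () -> putStrLn "no exception"
```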

Ross Paterson wrote:
On Tue, Apr 03, 2007 at 11:21:34AM +0100, Simon Marlow wrote:
This is an attempt to propose a set of changes that we could reasonably make in the GHC 6.8 timeframe, that would significantly reduce the size of base and give us more flexibility to independently develop packages. [...] Control.Applicative Data.Foldable, Data.Traversable Data.Map, Data.IntMap, Data.Set, Data.IntSet Data.Sequence, Data.Tree Data.HashTable Data.Graph ---> new package collections? containers? or split further? (dep. on array, generics, concurrent)
Data.HashTable (and thus Data.Array.*) is used in the implementation of Data.Typeable. It also differs from the others in being a mutable data structure. I imagine that without it this package wouldn't need to depend on array and concurrent.
Good point; I propose to leave Data.HashTable where it is for now. It doesn't depend on array, fortunately, because it uses the low-level IOArray, so this won't prevent the array package from being split.
Data.Monoid could possibly go here too. Another possibility is to split the 4 class modules from the concrete data structures.
If the 4 class modules were in a separate package, any suggestions for naming? Cheers, Simon

simonmarhaskell:
This is an attempt to propose a set of changes that we could reasonably make in the GHC 6.8 timeframe, that would significantly reduce the size of base and give us more flexibility to independently develop packages.
This is not an attempt to solve all the issues we have in one go, but a practical incremental step towards the goal. I'm not trying to make base compiler-independent, for example; that's a worthy goal, but it's not clear (at least to me) how to get there yet. What I'm doing here is disentangling the dependencies from the top down. We started on this path before GHC 6.6 (see http://hackage.haskell.org/trac/ghc/ticket/710); this proposal will clarify some of the details of the next stage.
There may be dependencies I've missed, but as far as I'm aware all of the following is possible without any significant rewriting of code. These reorganisations will allow more reuse: for example, more libraries will be able to make direct use of the unix/Win32 packages rather than being limited to the primitives available in the base package, so cleanups will be possible after the reorganisation. There are several things I'm not completely sure about, and some naming issues to resolve, so please take a look and comment if you can.
I'll make one request regarding comments: please don't suggest renaming modules at this point. There are various modules whose names I'm aware are contentious, but I think it would be a distraction to discuss those as part of this proposal; let's do it separately.
Here goes:
System.Time --> new package old-time (dep. on unix/Win32)
System.Locale --> new package old-locale
System.Posix.Signals --> unix (System.Cmd depends on it, but moves to new package process)
System.Directory --> new package directory (dep. on filepath, unix/Win32) System.Directory.Internals goes away, use filepath instead
Data.Array.* --> new package array (maybe; I'm slightly dubious here) (dep. on concurrent for Data.Array.Diff)
Data.Generics.* --> generics (maybe; Data class is defined for everything and is derivable)
Data.ByteString.* --> fps (dep. on base, generics, array)
Yep, but we might call it the 'bytestring' package as Ian suggests. I'm good with this.
Control.Concurrent.*, System.Timeout --> new package concurrent (needed by Data.Unique, where to move it?)
Control.Parallel.* --> new package parallel
System.Process, System.Cmd --> new package process (dep. on concurrent)
Control.Applicative Data.Foldable, Data.Traversable Data.Map, Data.IntMap, Data.Set, Data.IntSet Data.Sequence, Data.Tree Data.HashTable Data.Graph ---> new package collections? containers? or split further? (dep. on array, generics, concurrent)
System.Console.GetOpt ---> new package getopt? consoleutils?
Text.PrettyPrint.* ---> new package pretty
System.Random ---> new package random (modify to use time, not old-time)
Other modules we could move: Text.Printf, Data.Unique, Data.Monoid, System.CPUTime.
Topological sort of core packages with dependencies
---------------------------------------------------
base
unix/Win32 (base)
generics (base)
concurrent (base)
parallel (base)
filepath (base)
Cabal (base)
readline (base)
regex-base (base)
regex-posix (base, regex-base)
regex-compat (base, regex-base, regex-posix)
parsec (base)
stm (base)
template-haskell (base)
pretty (base) (could drop from core-packages)
getopt (base) (could drop from core-packages)
old-locale (base)
old-time (base, unix/Win32)
array (base, generics, concurrent)
fps (base, generics, array)
process (base, unix/Win32, concurrent)
directory (base, unix/Win32, filepath)
time (base, unix/Win32)
random (base, unix/Win32, time)
haskell98 (old-time, old-locale, random, directory, process)
containers (base, array, generics, concurrent)
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://www.haskell.org/mailman/listinfo/libraries

On 4/3/07, Simon Marlow
System.Directory --> new package directory (dep. on filepath, unix/Win32)
Why bother with having hierarchical module names if package names are not similarly qualified? It seems like collisions in the package namespace are just as likely as in the module namespace (especially with single-module packages like many of those you proposed). There are multiple competing collection libraries available, and a name like "collections" or "containers" does not really help to differentiate this particular library from the others, or suggest its provenance. Considering that arrays are collections, and the "collections/containers" package depends on Data.Array such that significant modifications to Data.Array will probably cause modifications to the modules in "collections/containers," why not just merge Data.Array into the collections/containers package? Regards, Brian

On 4/3/07, Simon Marlow
Control.Applicative Data.Foldable, Data.Traversable Data.Map, Data.IntMap, Data.Set, Data.IntSet Data.Sequence, Data.Tree Data.HashTable Data.Graph ---> new package collections? containers? or split further? (dep. on array, generics, concurrent)
Applicative seems more related to Functor, Monad, and Arrow than to the collections. Even its haddock page mentions parsers before Traversable. I think Control.Applicative should stay in base or follow Control.Arrow if that moves somewhere else. -- Namasté, Jeffrey Yasskin

Jeffrey Yasskin wrote:
On 4/3/07, Simon Marlow wrote:
Control.Applicative Data.Foldable, Data.Traversable Data.Map, Data.IntMap, Data.Set, Data.IntSet Data.Sequence, Data.Tree Data.HashTable Data.Graph ---> new package collections? containers? or split further? (dep. on array, generics, concurrent)
Applicative seems more related to Functor, Monad, and Arrow than to the collections. Even its haddock page mentions parsers before Traversable. I think Control.Applicative should stay in base or follow Control.Arrow if that moves somewhere else.
Agreed, and I think Foldable and Traversable too. Also, Monad should one day be a subclass of Applicative, IMO: http://hackage.haskell.org/trac/haskell-prime/ticket/113. The rest are collections or containers. -- Ashley Yakeley
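To make the Functor/Applicative/Monad point concrete, here is a tiny parser sketch. The Parser type and its combinators are hypothetical, made up for illustration rather than taken from any library. pairP uses only Applicative combinators, while counted needs Monad because its second step depends on the first result. In current GHC (since 7.10) Monad is in fact a subclass of Applicative, as Ashley proposes, so the Monad instance below would not compile without the Applicative one.

```haskell
import Control.Monad (replicateM)
import Data.Char (digitToInt, isDigit)

-- Hypothetical minimal parser: consume a prefix, return result and rest.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> fmap (\(x, rest) -> (f x, rest)) (p s)

instance Applicative Parser where
  pure x = Parser $ \s -> Just (x, s)
  Parser pf <*> Parser px = Parser $ \s -> case pf s of
    Nothing        -> Nothing
    Just (f, rest) -> fmap (\(x, rest') -> (f x, rest')) (px rest)

-- Monad builds on Applicative: return defaults to pure.
instance Monad Parser where
  Parser p >>= k = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (x, rest) -> runParser (k x) rest

satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case s of
  (c:cs) | ok c -> Just (c, cs)
  _             -> Nothing

char :: Char -> Parser Char
char c = satisfy (== c)

digit :: Parser Int
digit = digitToInt <$> satisfy isDigit

-- Applicative only: the shape of the parse is fixed in advance ("d,d").
pairP :: Parser (Int, Int)
pairP = (,) <$> digit <* char ',' <*> digit

-- Needs Monad: a digit n followed by exactly n 'x' characters.
counted :: Parser Int
counted = do
  n <- digit
  _ <- replicateM n (char 'x')
  pure n

main :: IO ()
main = do
  print (runParser pairP "3,4!")
  print (runParser counted "2xx!")
```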

On 4/5/07, Ashley Yakeley
Agreed, and I think Foldable and Traversable too.
Anything that's Foldable or Traversable is kind of inherently a collection, so I'd support putting them in the collections package. Or do you know of potential instances that aren't collections?
Also, Monad should one day be a subclass of Applicative, IMO: http://hackage.haskell.org/trac/haskell-prime/ticket/113.
I agree, but not in this change. ;) -- Namasté, Jeffrey Yasskin
participants (10)
- Ashley Yakeley
- Brian Smith
- Bulat Ziganshin
- dons@cse.unsw.edu.au
- Ian Lynagh
- Jeffrey Yasskin
- Ross Paterson
- Simon Marlow
- Stefan O'Rear
- Sven Panne