Revamping the module hierarchy (was: ANNOUNCE: OpenGLRaw 1.0.0.0)

All,
On Fri, Jun 12, 2009 at 11:24 AM, Sven Panne
One point here is debatable: Do we really need the ".Rendering" part in the package name or would simply "Graphics.OpenGL.Raw" be enough? We discussed the structure of the hierarchy when hierarchical packages were in their infancy, many years ago, and it was consensus then to distinguish between "Graphics.Rendering" and "Graphics.UI". I don't have very strong feelings about ".Rendering" and ".UI", and if the consensus nowadays is to remove it, I'll be happy to change this. But let's move the discussion about this to the libraries mailing list.
Perhaps it's time to overhaul the hierarchy. Some top level module namespaces like Network have become very crowded. Network is such a generic name that it conveys very little information today, when most software has a network component. I suggest that parts of it be broken out into new top level modules. As a first step I suggest we create a new Http (and not HTTP with all caps please) module where we can have:
Http.Client Http.Server Http.UrlEncoding Http.Cookies etc.
Cheers, Johan

Am Freitag, 12. Juni 2009 22:46 schrieb Johan Tibell:
Perhaps it's time to overhaul the hierarchy. Some top level module namespaces like Network have become very crowded. Network is such a generic name that it conveys very little information today, when most software has a network component. I suggest that parts of it be broken out into new top level modules.
Very good idea. I'd like to ask for renaming Graphics.UI to just UI. UIs are not always graphical and it’s not so important whether they are.
As a first step I suggest we create a new Http (and not HTTP with all caps please) module where we can have: […]
*Please* name it HTTP, i.e., *with* all caps. HTTP is an acronym, so every letter stands for a single word. Http doesn’t make sense in my opinion. Should Yampa and Grapefruit also go into a namespace Frp instead of FRP? I wouldn’t like this. At least, we should have a common rule for the case of acronyms in identifiers.
Best wishes, Wolfgang

On Fri, Jun 12, 2009 at 10:46:07PM +0200, Johan Tibell wrote:
Perhaps it's time to overhaul the hierarchy. Some top level module namespaces like Network have become very crowded. Network is such a generic name that it conveys very little information today, when most software has a network component. I suggest that parts of it be broken out into new top level modules. As a first step I suggest we create a new Http (and not HTTP with all caps please) module where we can have:
Http.Client Http.Server Http.UrlEncoding Http.Cookies etc.
I don't follow the logic. If Network is crowded, doesn't that mean we should be aiming to subdivide it, e.g. moving Network.Http.* to Network.Protocol.Http.* (FSVO "Protocol"; could be "Tcp", or something else entirely)?
If we move everything up to the root then the root will be even more crowded than Network is.
Thanks Ian

Hi,
I agree with Johan that the name hierarchy should be changed. The
current approach has a number of drawbacks. In no specific order:
* Trying to use a single hierarchy to classify modules is inaccurate
because many modules could logically belong in multiple locations. We
have many examples that demonstrate this in the current hierarchy:
Text is not Data; the HTTP protocol is under Network, but XML is under
Text even though both are text based protocols; URLs are under Network
(and so are neither Data nor Text), file operations are under
System.IO but Network operations are in their own name space. This is
not because the authors of the packages were not careful in selecting
the names. The problem is that for many modules there isn't a single
name that describes its content.
* The current naming convention makes it harder to understand
programs (independent of overly long import names like
Network.Protocol.Http.Cookies, which could be just as well described
as Protocol.Network.Http.Cookies). The real problem with readability
is that looking at the imports of a module does not give any
indication of what package the modules come from, which makes it hard
to understand the dependencies of the module and, more pragmatically,
makes it hard to look up documentation for the module contents.
* The current naming convention does not scale because each package
may introduce modules that are placed all over the name hierarchy.
For example, the utf-8 library redefines some IO operations so it has
modules under System.IO, it provides some ByteString support so it
also has modules under Data.ByteString, and finally it also deals with
text, so it has modules under Text.Codec. This is a problem because
it is hard for package writers to avoid name collisions, without
knowing the modules in all available packages.
I think that a better way to organize our programs is to prefix the
modules in a package with the package name. This will avoid the name
collision issue (or at least, greatly simplify it, because packages
that are uploaded to hackage need to have different names). It would
also make the dependencies of a module quite obvious. It would also
make our import lists much simpler. For example, we would write
"import HaXml" instead of "import Text.XML.HaXML", or "import
Parsec.Char" instead of "import Text.ParsingCombinators.Parsec.Char".
If classifying modules according to their purpose is necessary (and I
am not sure that it is, if we can do it at the package level), then we
could think of a more suitable mechanism to achieve that goal than the
hierarchical names.
-Iavor
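To make the collision argument above concrete, here is a minimal Haskell sketch. It is not taken from any existing tool: the toy package index, the package names "alpha" and "beta", and the functions `collisions` and `prefixWithPackage` are all hypothetical, and the sketch assumes package names are already usable as module name components once capitalised (real names with hyphens would need extra sanitising).

```haskell
import qualified Data.Map.Strict as Map
import Data.Char (toUpper)

-- Hypothetical toy index: package name -> names of the modules it exposes.
type PackageIndex = Map.Map String [String]

-- Module names exposed by more than one package, with the providing packages.
collisions :: PackageIndex -> Map.Map String [String]
collisions idx =
  Map.filter ((> 1) . length) $
    Map.fromListWith (flip (++))
      [ (m, [p]) | (p, ms) <- Map.toList idx, m <- ms ]

-- Prefix every module with its capitalised package name (assumption: the
-- capitalised package name is a valid module name component).
prefixWithPackage :: PackageIndex -> PackageIndex
prefixWithPackage = Map.mapWithKey (\p -> map (\m -> capitalise p ++ "." ++ m))
  where
    capitalise []       = []
    capitalise (c : cs) = toUpper c : cs

-- Two made-up packages that both expose Data.Fun.
sampleIndex :: PackageIndex
sampleIndex = Map.fromList
  [ ("alpha", ["Data.Fun", "Data.Alpha"])
  , ("beta",  ["Data.Fun"])
  ]
```

Under these assumptions, `collisions sampleIndex` reports Data.Fun as provided by both packages, while `collisions (prefixWithPackage sampleIndex)` is empty, because package names are already unique.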
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Iavor Diatchki wrote:
I think that a better way to organize our programs is to prefix the modules in a package with the package name. This will avoid the name collision issue (or at least, greatly simplify it, because packages that are uploaded to hackage need to have different names). It would also make the dependencies of a module quite obvious. It would also make our import lists much simpler. For example, we would write "import HaXml" instead of "import Text.XML.HaXML", or "import Parsec.Char" instead of "import Text.ParsingCombinators.Parsec.Char". If classifying modules according to their purpose is necessary (and I am not sure that it is, if we can do it at the package level), then we could think of a more suitable mechanism to achieve that goal than the hierarchical names.
I disagree. One of the nice things about the current arrangement is that the package namespace is orthogonal to the module namespace. These two concepts really are orthogonal, so it's good to keep them that way. When they get conflated into one, you end up with Java's import mechanism which is a complete wreck. When you keep them orthogonal you can have some really nice package managers like Monticello for Squeak.
I agree with Maurico that what we really need is to have the tools to be able to rearrange the tree at will. The Haskell language has no business dealing with the provenance of where modules come from--- and forcing modules to be named after their packages would make it do so. Currently, ghc-pkg (or whatever) handles the provenance of making sure that packages are visible to have their modules be loaded. As it stands, this provenance mechanism automatically roots all packages at the same place, but there's no reason it needs to. We just have to come up with the right DSL for scripting ghc-pkg (or equivalently, the right CLI) to be able to play around with the module namespace in a more intelligent way.
For instance, let's assume we have:
> ghc-pkg describe libfoo-0.0.0
... exposed-modules: Data.Foo Control.Bar Control.Bar.Baz ...
Now, if we say:
ghc-pkg expose libfoo-0.0.0
Then any Haskell programs can now load the modules mentioned above, by the names mentioned above. If instead we said something like:
ghc-pkg expose libfoo-0.0.0 at Zot
Then Haskell programs would be able to load the modules by the names Zot.Data.Foo, Zot.Control.Bar, and Zot.Control.Bar.Baz instead. And if we wanted to rebase subtrees then we could say something like:
ghc-pkg expose libfoo-0.0.0:Control.Bar as Quux
Which would make the modules Quux and Quux.Baz available for loading, and would effectively hide libfoo-0.0.0:Data.Foo from being loadable.
To implement this we need to update not only ghc-pkg, but also the Cabal format. Rather than just specifying which dependent packages must be exposed, we also need to specify *where* the package expects them to be exposed in the module namespace. Assuming this is implemented sanely, then all of the renaming for changing the root and for rebasing subtrees can be boiled out and undone during the linking phase (that is, when GHC is "linking" things to follow imports etc; not when ld is actually linking things). An import declaration is a reference to an actual compiled module, the name is just a proxy to know where to find it, the name doesn't have any meaning in itself.
Since every package gets their own local view of the module namespace, every package can choose their own names for things. Moreover, since every package must specify their local view, if one wants to have some crazy jumbled view then the burden is on them to specify how to do it. Since every package exposes a view of its exposed module namespace, this serves as the default view. Since it takes work for people to rearrange things, there will still be a force to give things good names in the first place. Only we would no longer be stuck with bad decisions.
-- Live well, ~wren
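The three "expose" variants above can be modelled as plain functions on module names. The following is only a sketch of the proposed semantics, not an implementation of ghc-pkg: the `View` type and the functions `exposeDefault`, `exposeAt`, and `exposeSubtreeAs` are hypothetical names, and a view is assumed to be a list of (name seen by source code, name inside the package) pairs.

```haskell
import Data.List (intercalate, stripPrefix)
import Data.Maybe (mapMaybe)

-- A module name as its dot-separated components, e.g. ["Control", "Bar"].
type ModuleName = [String]

-- Hypothetical view: (name used in import declarations, name inside the package).
type View = [(ModuleName, ModuleName)]

-- "ghc-pkg expose libfoo-0.0.0": every module keeps its own name.
exposeDefault :: [ModuleName] -> View
exposeDefault ms = [ (m, m) | m <- ms ]

-- "ghc-pkg expose libfoo-0.0.0 at Zot": mount the whole package under a new root.
exposeAt :: ModuleName -> [ModuleName] -> View
exposeAt root ms = [ (root ++ m, m) | m <- ms ]

-- "ghc-pkg expose libfoo-0.0.0:Control.Bar as Quux": rebase one subtree.
-- Modules outside the subtree get no entry, so they are effectively hidden.
exposeSubtreeAs :: ModuleName -> ModuleName -> [ModuleName] -> View
exposeSubtreeAs subtree new =
  mapMaybe (\m -> fmap (\rest -> (new ++ rest, m)) (stripPrefix subtree m))

-- Render a module name in the usual dotted form.
render :: ModuleName -> String
render = intercalate "."

-- The exposed modules of the libfoo example.
libfoo :: [ModuleName]
libfoo = [["Data", "Foo"], ["Control", "Bar"], ["Control", "Bar", "Baz"]]
```

Matching on components rather than on raw strings means that rebasing Control.Bar does not accidentally capture an unrelated module such as Control.Barn.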

On Fri, Jun 19, 2009 at 6:06 AM, wren ng thornton < wren@community.haskell.org> wrote:
I agree with Maurico that what we really need is to have the tools to be able to rearrange the tree at will. The Haskell language has no business dealing with the provenance of where modules come from--- and forcing modules to be named after their packages would make it do so. Currently, ghc-pkg (or whatever) handles the provenance of making sure that packages are visible to have their modules be loaded. As it stands, this provenance mechanism automatically roots all packages at the same place, but there's no reason it needs to. We just have to come up with the right DSL for scripting ghc-pkg (or equivalently, the right CLI) to be able to play around with the module namespace in a more intelligent way.
For instance, let's assume we have:
ghc-pkg describe libfoo-0.0.0
... exposed-modules: Data.Foo Control.Bar Control.Bar.Baz ...
Now, if we say:
ghc-pkg expose libfoo-0.0.0
Then any Haskell programs can now load the modules mentioned above, by the names mentioned above. If instead we said something like:
ghc-pkg expose libfoo-0.0.0 at Zot
Then Haskell programs would be able to load the modules by the names Zot.Data.Foo, Zot.Control.Bar, and Zot.Control.Bar.Baz instead. And if we wanted to rebase subtrees then we could say something like:
ghc-pkg expose libfoo-0.0.0:Control.Bar as Quux
Which would make the modules Quux and Quux.Baz available for loading, and would effectively hide libfoo-0.0.0:Data.Foo from being loadable.
To implement this we need to update not only ghc-pkg, but also the Cabal format. Rather than just specifying which dependent packages must be exposed, we also need to specify *where* the package expects them to be exposed in the module namespace. Assuming this is implemented sanely, then all of the renaming for changing the root and for rebasing subtrees can be boiled out and undone during the linking phase (that is, when GHC is "linking" things to follow imports etc; not when ld is actually linking things). An import declaration is a reference to an actual compiled module, the name is just a proxy to know where to find it, the name doesn't have any meaning in itself.
Since every package gets their own local view of the module namespace, every package can choose their own names for things. Moreover, since every package must specify their local view, if one wants to have some crazy jumbled view then the burden is on them to specify how to do it. Since every package exposes a view of its exposed module namespace, this serves as the default view. Since it takes work for people to rearrange things, there will still be a force to give things good names in the first place. Only we would no longer be stuck with bad decisions.
+1
I really like this proposal.
I agree that I much prefer the current orthogonality of modules provided to package names. It lets you refactor packages into several smaller chunks, and this would not even be possible under the other namespacing schemes I've seen bandied about without breaking other code.
The biggest problem that I have with the current scheme is the inability to work with packages with conflicting namespaces (i.e. to support both the mtl and one of its competitors that overlap it). This quite elegantly works around that restriction.
It is not perfect because there is still the corner case that you still can't work with conflicting instance declarations for types from the Prelude, but it's a damn sight better than anything else I've seen put forward.
-Edward Kmett

On 19/06/2009 14:44, Edward Kmett wrote:
+1
I really like this proposal.
I agree that I much prefer the current orthogonality of modules provided to package names. It lets you refactor packages into several smaller chunks, and this would not even be possible under the other namespacing schemes I've seen bandied about without breaking other code.
The biggest problem that I have with the current scheme is the inability to work with packages with conflicting namespaces (i.e. to support both the mtl and one of its competitors that overlap it). This quite elegantly works around that restriction.
There's a little-known extension in GHC called PackageImports that lets you do this:
import "monads-tf" Control.Monad.State
We use this to implement the base3-compat overlay. I'm not claiming this is something we want to advertise widely or start using to resolve conflicts, just pointing out its existence.
wren's proposal above actually requires a good deal of effort to implement. It would decouple the compile-time namespace of module names from the actual module names used in the compiled package, and that is a deep change. However, having made that change, lots of things become possible.
I should point out that there have been many proposals of this kind in the past (search for "grafting" and "mounting" in the mailing-list archives). To my mind the reason we haven't done anything like this so far is because there hasn't been a single proposal that stands out as being the right thing, and with good power-to-weight ratio. In the past it has been hard to predict what we actually *need* in the way of module namespace manipulation when we start scaling up to thousands of packages, but that is now changing, so it might well be time to think about this again.
Cheers, Simon
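For reference, a package-qualified import needs the PackageImports language pragma. The sketch below uses the "base" package purely because it is always installed; the binding `sortedExample` is a made-up name for illustration. Choosing between two packages that both export Control.Monad.State would look the same, with the other package name in the string literal.

```haskell
{-# LANGUAGE PackageImports #-}

-- The string literal names the package the module must be taken from.
-- "base" is an arbitrary choice here so the example has no extra dependencies.
import "base" Data.List (sort)

-- Uses the sort imported from the explicitly named package.
sortedExample :: [Int]
sortedExample = sort [3, 1, 2]
```

This disambiguates at the import site only; it does not rename modules, so it is much weaker than the grafting proposals discussed in this thread.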

wren ng thornton wrote:
One of the nice things about the current arrangement is that the package namespace is orthogonal to the module namespace. These two concepts really are orthogonal, so it's good to keep them that way. When they get conflated into one, you end up with Java's import mechanism which is a complete wreck.
Right.
I agree with Maurico that what we really need is to have the tools to be able to rearrange the tree at will. The Haskell language has no business dealing with the provenance of where modules come from--- and forcing modules to be named after their packages would make it do so. Currently, ghc-pkg (or whatever) handles the provenance of making sure that packages are visible to have their modules be loaded. As it stands, this provenance mechanism automatically roots all packages at the same place, but there's no reason it needs to. We just have to come up with the right DSL for scripting ghc-pkg (or equivalently, the right CLI) to be able to play around with the module namespace in a more intelligent way.
For instance, let's assume we have:
> ghc-pkg describe libfoo-0.0.0
... exposed-modules: Data.Foo Control.Bar Control.Bar.Baz ...
Now, if we say:
ghc-pkg expose libfoo-0.0.0
Then any Haskell programs can now load the modules mentioned above, by the names mentioned above. If instead we said something like:
ghc-pkg expose libfoo-0.0.0 at Zot
Then Haskell programs would be able to load the modules by the names Zot.Data.Foo, Zot.Control.Bar, and Zot.Control.Bar.Baz instead. And if we wanted to rebase subtrees then we could say something like:
ghc-pkg expose libfoo-0.0.0:Control.Bar as Quux
Which would make the modules Quux and Quux.Baz available for loading, and would effectively hide libfoo-0.0.0:Data.Foo from being loadable.
If I want to import a module I have to decide on /one/ module name. Since I cannot know at which point in the hierarchy users might have exposed modules from other packages, I must choose the default 'root' point. So, this will not help library authors who want to e.g. import the 'same' module from either mtl or transformers.
IMO it makes much more sense to let client packages decide from where in the module hierarchy they want to import modules from another package, rather than forcing users to decide this globally per installation. Thus, grafting should not be done when exposing packages, but rather when actually using them. Your examples above would become
ghc -package libfoo-0.0.0@Zot ...
resp.
ghc -package libfoo-0.0.0:Control.Bar@Quux
This would also better play with the way cabal does things: cabal currently ignores hidden/exposed status of packages; instead it hides everything and then explicitly 'imports' exact versions using the -package option. With a few tweaks to the cabal file syntax, we could easily declare package 'mount points' (even for subtrees) when declaring the dependent packages and this would be transformed to the ghc command line syntax above.
Cheers Ben
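As a rough illustration of the argument syntax suggested above, here is a small Haskell sketch that splits such an argument into its parts. The `pkg:Subtree@Mount` form is only the syntax proposed in this message, not something ghc accepts, and the `Mount` record and `parseMount` function are hypothetical names. The sketch does no validation of package or module names.

```haskell
-- Hypothetical record for one "-package" argument in the proposed syntax.
data Mount = Mount
  { mountPackage :: String        -- e.g. "libfoo-0.0.0"
  , mountSubtree :: Maybe String  -- e.g. Just "Control.Bar"; Nothing = whole package
  , mountPoint   :: Maybe String  -- e.g. Just "Quux"; Nothing = default root
  } deriving (Eq, Show)

-- Split "pkg[:Subtree][@Mount]" at the first '@' and then at the first ':'.
parseMount :: String -> Mount
parseMount arg =
  let (spec, at)   = break (== '@') arg
      (pkg, colon) = break (== ':') spec
      nonEmpty s   = if null s then Nothing else Just s
  in Mount pkg (nonEmpty (drop 1 colon)) (nonEmpty (drop 1 at))
```

A cabal-level 'mount point' declaration could then be translated into one such argument per dependency, which is the transformation described above.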

Ben Franksen wrote:
If I want to import a module I have to decide on /one/ module name. Since I cannot know at which point in the hierarchy users might have exposed modules from other packages, I must choose the default 'root' point. So, this will not help library authors who want to e.g. import the 'same' module from either mtl or transformers.
No, that misses a big point in the proposal. Yes, every compiled module must have only one "name", but that name does not need to be the same "name" that is used in Haskell code. This is what it means to separate provenance from reference. To make this more concrete, consider the installed package:
libfoo.cabal:
    ...
    Build-depends: base (>= 3.0 && < 4.0) at Base
    exposed-modules: Data.Foo
    ...
Data/Foo.hs:
    {-# LANGUAGE NoImplicitPrelude #-}
    module Data.Foo where
    import Base.Prelude
    ...
And consider the client package we are compiling:
libbar.cabal:
    ...
    Build-depends: base (>= 3.0 && < 4.0) at Elsewhere, libfoo at Foo
    exposed-modules: Control.Bar
    ...
Control/Bar.hs:
    {-# LANGUAGE NoImplicitPrelude #-}
    module Control.Bar where
    import Elsewhere.Prelude
    import Foo.Data.Foo
    ...
Still with me? Now, when we compiled base-3.5.0 we compiled the base-3.5.0:Prelude module. Once we've compiled it we need to give it some globally unique name so that we know to refer to exactly some byte-offset into some file located on some sector of some disk. What this name actually looks like is irrelevant. We could call the compiled module "base-3.5.0:Prelude" or we could call it "0xDEADBEEF". If we wanted to avoid name-/versionspace clashing up at the package layer, then we may prefer something like the latter; but for this discussion I'll stick with the former for simplicity.
So before we compile libbar, we have the following compiled modules available:
    base-3.5.0:Prelude
    libfoo-0.0.0:Data.Foo
The linking/reference process can be considered like a dialogue between the source code and the compiler (or between the compiler and the package-manager, if you prefer). The dialogue for compiling libbar-42:Control.Bar will look something like this:
    Code: set LANGUAGE NoImplicitPrelude
    GHC: okay.
    Code: call me libbar-42:Control.Bar
    GHC: righto, libbar-42:Control.Bar
    Code: I need something called Elsewhere.Prelude
    GHC: okay, just a sec.
    GHC: hey pkg!
    PKG: j0, wassup dawg
    GHC: I need something called Elsewhere.Prelude
    PKG: I have that at 0xDEADBEEF
    GHC: what?
    PKG: Oh, I mean I have that at base-3.5.0:Prelude
    GHC: thanks.
    /GHC memorizes Elsewhere.Prelude = base-3.5.0:Prelude
    GHC: hey libbar-42:Control.Bar, you still there?
    Code: yeah
    GHC: I found Elsewhere.Prelude
    Code: thanks
    ...
    Code: I need to get the type of Elsewhere.Prelude.curry
    GHC: okay, just a sec.
    GHC: hey pkg, what's the type of base-3.5.0:Prelude.curry ?
    PKG: base-3.5.0:Prelude.curry :: ((a, b) -> c) -> a -> b -> c
    GHC: hey libbar-42:Control.Bar, Elsewhere.Prelude.curry :: ((a, b) -> c) -> a -> b -> c
    ...
    /Code leaves #haskell
    /GHC forgets module mappings
    /GHC waits for Code to join #haskell
Naturally GHC needs to be in on the joke and needs to be aware of both "names" for the same compiled module. But this is no different than what we already have. The module names used in Haskell code do not refer to the version of the module they need, and again they shouldn't have to. When the code asks for Prelude, it's up to GHC and PKG to determine which version of the Prelude should be linked to the code. The only thing that changes in this proposal is that PKG can have a more sophisticated way of mapping Haskell module names into compiled module object files.
IMO it makes much more sense to let client packages decide from where in the module hierarchy they want to import modules from another package, rather than forcing users to decide this globally per installation.
Right. For each package (or compilation unit), the user/client constructs a map from the compiled module object files to the Haskell module names. The namespace that each package sees is only a fabrication, because the Haskell module names are rewritten into compiled module object names in the Core code GHC produces. So every package can make up their own independent mapping.
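The per-package mapping described here can be sketched as a lookup table from the names a package writes in its imports to the globally unique names of compiled modules. This is only a model of the idea, assuming the libfoo and libbar setup from the example; the type names and the `resolve` function are hypothetical and do not correspond to anything in GHC or ghc-pkg.

```haskell
import qualified Data.Map.Strict as Map

type LocalName  = String  -- name written in an import declaration
type GlobalName = String  -- package-qualified name of the compiled module

-- Hypothetical per-package view of the module namespace.
type View = Map.Map LocalName GlobalName

-- libfoo mounts base at Base; libbar mounts base at Elsewhere and libfoo at Foo.
libfooView, libbarView :: View
libfooView = Map.fromList
  [ ("Base.Prelude", "base-3.5.0:Prelude") ]
libbarView = Map.fromList
  [ ("Elsewhere.Prelude", "base-3.5.0:Prelude")
  , ("Foo.Data.Foo",      "libfoo-0.0.0:Data.Foo")
  ]

-- What PKG answers when GHC asks for a name on behalf of some package.
resolve :: View -> LocalName -> Maybe GlobalName
resolve view name = Map.lookup name view
```

Both `resolve libfooView "Base.Prelude"` and `resolve libbarView "Elsewhere.Prelude"` yield the same compiled module, while `resolve libbarView "Base.Prelude"` yields nothing, since libbar never mounted base under that name.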
Thus, grafting should not be done when exposing packages, but rather when actually using them. Your examples above would become
ghc -package libfoo-0.0.0@Zot ...
resp.
ghc -package libfoo-0.0.0:Control.Bar@Quux
This would also better play with the way cabal does things: cabal currently ignores hidden/exposed status of packages; instead it hides everything and then explicitly 'imports' exact versions using the -package option. With a few tweaks to the cabal file syntax, we could easily declare package 'mount points' (even for subtrees) when declaring the dependent packages and this would be transformed to the ghc command line syntax above.
Six of one... :) As far as the proposal goes, the only important bit is that the names that Code uses are different than the names GHC/PKG use. Whether the namespace mapping is done by ghc-pkg, ghc, ghci, or whatever doesn't really matter since they're all on the same side of the fence. At that point it's just delegation of responsibility.
The reason I was singling out ghc-pkg as the PKG is because (so far as I know) that's its current purpose. When Cabal runs, it needs to sanitize the namespace mapping. It does this by first hiding all packages, and then exposing only the ones the *.cabal file indicates are necessary (apparently via flags to ghc rather than calls to ghc-pkg). Right now, all packages are exposed at the same root in the module namespace; the extension is just to say that packages (and subtrees of packages) can be exposed wherever we want. After Cabal is done, it restores whatever mapping was in place before it started sanitizing things.
From what (little) I know of how Cabal works under the covers, it makes sense to me that the "exposure" step is the right place to do grafting. If the map from Haskell module names to the compiled modules in exposed packages is already separate from the exposure process, then of course grafting should be done wherever that mapping is kept. If the real purpose of ghc-pkg is to give a system-default module namespace for ghc/ghci when commandline flags are not set, then sure ghc/ghci will need new flags. Of course ghc-pkg will also need new flags since it too is constructing a module namespace.
-- Live well, ~wren

I disagree. One of the nice things about the current arrangement is that the package namespace is orthogonal to the module namespace. These two concepts really are orthogonal, so it's good to keep them that way. When they get conflated into one, you end up with Java's import mechanism which is a complete wreck. When you keep them orthogonal you can have some really nice package managers like Monticello for Squeak.
Care to elaborate on that a little? I'm not familiar with either Java imports or Monticello, but I have a little experience with Python, in which each package lives in its own namespace. So it seems reasonable to me to put each package in its own branch instead of merging the various trees exported by all the packages into one big tree. It sounds like what you're talking about would do that by requiring that each package is at a single root like Elsewhere.
Python doesn't allow you to rename packages so you wind up with either version numbers embedded in the package name, or upgrades that upgrade everyone whether they want it or not. Is this the wreck you're talking about?
I've always felt a little uncomfortable that a package may scatter modules around the hierarchy. I suppose it could help haddock browsing and discovery, but in practice the haddocks all get installed in their own directories, unless there's some haddock merger I don't know about...

Amen to Iavor's proposal!
Hierarchical decomposition leads to arbitrary and thus unguessable
decisions, because many such decompositions are possible. This problem
nearly always happens, as Clay Shirky illustrates at
http://www.shirky.com/writings/ontology_overrated.html . Iavor has given
some examples. Data vs Control provides some more. Another, as Wolfgang
hinted at, is UI vs Graphics. These two notions overlap, with neither being
more specific than the other.
Module hierarchy tries to give ontology and collision-avoidance. Ontology
is a failure as we've seen (and inevitably so, as Clay Shirky
demonstrates). Collision-avoidance has failed also, as Iavor pointed out,
since packages can easily have module name collisions (e.g., I had a
Data.Fun at one point). However, we already prohibit collisions of package
names, so we can get module uniqueness by using the package name as the
top-level portion of every module in a package. Beyond that requirement,
package implementors can use whatever organization style they like.
- Conal
On Fri, Jun 19, 2009 at 12:08 AM, Iavor Diatchki
Hi, I agree with Johan that the name hierarchy should be changed. The current approach has a number of drawbacks. In no specific order:
* Trying to use a single hierarchy to classify modules is inaccurate because many module could logically belong in multiple locations. We have many examples that demonstrate this in the current hierarchy: Text is not Data; the HTTP protocol is under Network, but XML is under Text even though both are text based protocols; URLs are under Network (and so are neither Data nor Text), file operations are under System.IO but Network operations are in their own name space. This is not because the authors of the packages were not careful in selecting the names. The problem is that for many module there isn't a single name that describes its content.
* The current naming convention makes it harder to understand programs (independent of overly long import names like Network.Protocol.Http.Cookies, which could be just as well described as Protocol.Network.Http.Cookies). The real problem with readability is that looking at the imports of a module does not give any indication of what package the modules come from, which makes it hard to understand the dependencies of the module and, more pragmatically, makes it hard to lookup documentation for the module contents.
* The current naming convention does not scale because each package may introduce modules that are placed all over the name hierarchy. For example, the utf-8 library redefines some IO operations so it has modules under System.IO, it provides some ByteString support so it also has modules under Data.ByteString, and finally it also deals with text, so it has modules under Text.Codec. This is a problem because it is hard for package writers to avoid name collisions, without knowing the modules in all available packages.
I think that a better way to organize our programs is to prefix the modules in a package with the package name. This will avoid the name collision issue (or at least, greatly simplify it, because packages that are uploaded to hackage need to have different names). It would also make the dependencies of a module quite obvious. It would also make our import lists much simpler. For example, we would write "import HaXml" instead of "import Text.XML.HaXml", or "import Parsec.Char" instead of "import Text.ParsingCombinators.Parsec.Char". If classifying modules according to their purpose is necessary (and I am not sure that it is, if we can do it at the package level), then we could think of a more suitable mechanism to achieve that goal than the hierarchical names.
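To make the point about imports not revealing their package a bit more concrete, here is a minimal sketch using GHC's PackageImports extension, which lets an import name the package it comes from. This is not the naming convention proposed above, only a related mechanism that GHC already has. The helper wordCounts is made up for the example, and the code assumes the containers package is installed alongside GHC.

```haskell
{-# LANGUAGE PackageImports #-}
-- Sketch only: PackageImports is a GHC extension, not the
-- package-name-as-prefix convention discussed in the thread.
module Main (main) where

-- Each import states which package provides the module.
import "base" Data.List (sort)
import qualified "containers" Data.Map as Map

-- Hypothetical helper: count how often each word occurs.
wordCounts :: [String] -> [(String, Int)]
wordCounts ws = Map.toList (Map.fromListWith (+) [(w, 1) | w <- ws])

main :: IO ()
main = mapM_ print (wordCounts (sort ["http", "xml", "http"]))
```

With this, a reader of the import list can see the dependencies directly, but the module names themselves stay in the shared hierarchy, so it does nothing for the collision problem.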
-Iavor
On Tue, Jun 16, 2009 at 7:45 AM, Ian Lynagh
wrote: On Fri, Jun 12, 2009 at 10:46:07PM +0200, Johan Tibell wrote:
Perhaps it's time to overhaul the hierarchy. Some top level module namespaces like Network have become very crowded. Network is a very generic name that it conveys very little information today when most software has a network component. I suggest that parts of it be broken out into new top level modules. As a first step I suggest we create a new Http (and not HTTP with all caps please) module where we can have:
Http.Client Http.Server Http.UrlEncoding Http.Cookies etc.
I don't follow the logic. If Network is crowded, doesn't that mean we should be aiming to subdivide it, e.g. moving Network.Http.* to Network.Protocol.Http.* (FSVO "Protocol"; could be "Tcp", or something else entirely)?
If we move everything up to the root then the root will be even more crowded than Network is.
Thanks Ian
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Conal Elliott wrote:
Amen to Iavor's proposal!
Hierarchical decomposition leads to arbitrary and thus unguessable decisions, because many such decompositions are possible. This problem nearly always happens, as Clay Shirky illustrates at http://www.shirky.com/writings/ontology_overrated.html . Iavor has given some examples. Data vs Control provides some more. Another, as Wolfgang hinted at, is UI vs Graphics. These two notions overlap, with neither being more specific than the other.
Module hierarchy tries to give ontology and collision-avoidance. Ontology is a failure as we've seen (and inevitably so, as Clay Shirky demonstrates). Collision-avoidance has failed also, as Iavor pointed out, since packages can easily have module name collisions (e.g., I had a Data.Fun at one point). However, we already prohibit collisions of package names, so we can get module uniqueness by using the package name as the top-level portion of every module in a package. Beyond that requirement, package implementors can use whatever organization style they like.
I would like to amend this with the proposal I've been floating around. In particular, even though the package name/version should be the root of a global naming scheme, it should be considered orthogonal to how modules are named from within Haskell code. The grafting mechanism I've proposed offers one way of taking advantage of that orthogonality.
Both the grafting proposal and the proposal of making the package name be the root will, ultimately, fall prey to the same problems of ontology/hierarchy (until we find a better way of naming modules within Haskell). However, the grafting proposal will delay the inevitable for longer, since it allows every compilation-unit to define their own private hierarchy which needs only suffice for their purposes (i.e. constructing an initial hierarchy for others to manipulate).
To make things more concrete, it's the provenance issue. We don't want to encode package versions in the within-Haskell module names, for obvious reasons. Similarly, we don't want to encode the package names. Over time packages have a tendency to grow and split into multiple packages, and we don't want code that was valid at minus-epsilon from the change to break at plus-epsilon when no actual code has changed. And there are similar issues with merging packages or migrating modules from one package to another.
--
Live well, ~wren

On 22/06/2009 00:21, Conal Elliott wrote:
Amen to Iavor's proposal!
Hierarchical decomposition leads to arbitrary and thus unguessable decisions, because many such decompositions are possible. This problem nearly always happens, as Clay Shirky illustrates at http://www.shirky.com/writings/ontology_overrated.html . Iavor has given some examples. Data vs Control provides some more. Another, as Wolfgang hinted at, is UI vs Graphics. These two notions overlap, with neither being more specific than the other.
Module hierarchy tries to give ontology and collision-avoidance. Ontology is a failure as we've seen (and inevitably so, as Clay Shirky demonstrates). Collision-avoidance has failed also, as Iavor pointed out, since packages can easily have module name collisions (e.g., I had a Data.Fun at one point). However, we already prohibit collisions of package names, so we can get module uniqueness by using the package name as the top-level portion of every module in a package. Beyond that requirement, package implementors can use whatever organization style they like.
On the other hand, having module names be independent of package names means that
- we can have multiple packages that implement the same API, making it easy to compile code against different implementations of an API
- we can reorganise the contents of a package without requiring any changes to source code, only .cabal files
A package conflates lots of things - perhaps too many things. But one thing a package is useful for is to behave like an interface; a package is an API. I've been thinking that we should make it easier to construct packages that are nothing but views on other packages, i.e. a pure interface. Right now it's possible to do this (this is how we implement base3-compat), but it's not convenient.
A view has the advantage that it can hide parts of the packages it depends on, and can hence change less frequently. So by depending on a view you get a more robust dependency. We're considering doing something like this for the base package in GHC: base-internals would be what we currently call the base package, and "base" would be a view on that that hides the GHC.* modules, and hence doesn't change its API as often. (There are various options here; I think Ian is going to outline the proposals soon.)
Cheers, Simon

Simon Marlow wrote:
A package conflates lots of things - perhaps too many things. But one thing a package is useful for is to behave like an interface; a package is an API.
I would say: A package _has_ an API.
Just like an OCaml module has a module type, and different OCaml modules can have the same module type.
The difference is that the OCaml module type can be defined and referred to independently from the OCaml module, while the Haskell package API is not a similarly independent entity.
The Haskell module system still needs to catch up here, and perhaps there is a chance to achieve something even better than in OCaml by doing it at the level of packages?
Wolfram

kahl:
Simon Marlow wrote:
A package conflates lots of things - perhaps too many things. But one thing a package is useful for is to behave like an interface; a package is an API.
I would say: A package _has_ an API.
Just like an OCaml module has a module type, and different OCaml modules can have the same module type.
The difference is that the OCaml module type can be defined and referred to independently from the OCaml module, while the Haskell package API is not a similarly independent entity.
The Haskell module system still needs to catch up here, and perhaps there is a chance to achieve something even better than in OCaml by doing it at the level of packages?
Type checking of package interfaces would make Cabal's versioning constraint solver (how it picks what packages work) less ad hoc.... Great research problem, lots of upside if you solve it. Time to get the language research community interested in packages!! -- Don

Hello,
On Wed, Jun 24, 2009 at 3:47 AM, Simon Marlow
On 22/06/2009 00:21, Conal Elliott wrote:
Amen to Iavor's proposal! ...
On the other hand, having module names be independent of package names means that
Just to be clear, I was just proposing a _convention_ that we should use when it's applicable (which I think is in many cases) and not a policy of any sort. In particular, if two different packages are providing the exact same functionality, then it may make sense for them to provide the same modules.
- we can have multiple packages that implement the same API, making it easy to compile code against different implementations of an API
- we can reorganise the contents of a package without requiring any changes to source code, only .cabal files
I agree that both of these are desirable in some situations (and they don't conflict with what I was proposing). In my experience, being able to replace one library with another without having to change the code is the exception rather than the norm, mostly because the whole point of using one library over another is that they are different somehow (licensing issues provide an exception to this rule, but I don't think that's the norm). For example, I might change my library to depend on the uu-parsing combinators rather than parsec. Both provide similar functionality, but I still need to change the code more than simply adjusting the imports.
A package conflates lots of things - perhaps too many things. But one thing a package is useful for is to behave like an interface; a package is an API. I've been thinking that we should make it easier to construct packages that are nothing but views on other packages, i.e. a pure interface. Right now it's possible to do this (this is how we implement base3-compat), but it's not convenient.
A view has the advantage that it can hide parts of the packages it depends on, and can hence change less frequently. So by depending on a view you get a more robust dependency. We're considering doing something like this for the base package in GHC: base-internals would be what we currently call the base package, and "base" would be a view on that that hides the GHC.* modules, and hence doesn't change its API as often. (there are various options here, I think Ian is going to outline the proposals soon).
I think of a package as a convenient way to distribute a collection of modules that do something. As such, a package is likely to _have_ an API but I would not say that it _is_ an API. I think that the idea of "interface" packages is also useful and interesting, but I don't think that all packages need to be like that.
For example, if we were to identify some kind of "standard" API for parser combinator libraries, then we could have packages that provided implementations of this API for different parsing libraries (e.g., parsec-common-api, uu-common-api, readp-common-api etc). These packages would mostly do renaming and small fixes to make the implementations match the interface. Then, it would be up to a programmer to decide if they want to write their code against the common API (to gain portability) or to use a specific library (because it provides some nifty feature).
As Wolfram wrote, at the moment we have no (machine checked) way to provide an interface without an actual implementation but we should keep this distinction in mind when we design libraries.
-Iavor
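The "common API with several implementations" idea can be approximated inside a single program today with a type class, which is only a rough stand-in for an interface at the package level. In the sketch below everything is hypothetical: DigitParser is an invented interface, and SpanImpl and FoldImpl are toy implementations, not bindings to parsec, uu-parsing or ReadP.

```haskell
module Main (main) where

import Data.Char (isDigit)

-- Hypothetical "common API": the one operation client code relies on.
-- It splits the input into a non-empty run of leading digits and the rest.
class DigitParser p where
  parseDigits :: p -> String -> Maybe (String, String)

-- Two stand-in implementations (assumptions, not real library bindings).
data SpanImpl = SpanImpl
data FoldImpl = FoldImpl

instance DigitParser SpanImpl where
  parseDigits _ s = case span isDigit s of
    ([], _)    -> Nothing
    (ds, rest) -> Just (ds, rest)

instance DigitParser FoldImpl where
  parseDigits _ s = go [] s
    where
      -- Accumulate leading digits in reverse, then stop at the first non-digit.
      go acc (c:cs) | isDigit c = go (c : acc) cs
      go []  _  = Nothing
      go acc cs = Just (reverse acc, cs)

-- Client code written against the interface only.
readNumber :: DigitParser p => p -> String -> Maybe Int
readNumber impl s = fmap (read . fst) (parseDigits impl s)

main :: IO ()
main = do
  print (readNumber SpanImpl "42abc")
  print (readNumber FoldImpl "42abc")
```

Here readNumber plays the role of code written against the common API: swapping SpanImpl for FoldImpl needs no change to it. What the type class cannot express is the package-level part of the idea, namely choosing the implementation in the .cabal file rather than in the source.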

On Tue, Jun 16, 2009 at 4:45 PM, Ian Lynagh
On Fri, Jun 12, 2009 at 10:46:07PM +0200, Johan Tibell wrote:
Perhaps it's time to overhaul the hierarchy. Some top level module namespaces like Network have become very crowded. Network is a very generic name that it conveys very little information today when most software has a network component. I suggest that parts of it be broken out into new top level modules. As a first step I suggest we create a new Http (and not HTTP with all caps please) module where we can have:
Http.Client Http.Server Http.UrlEncoding Http.Cookies etc.
I don't follow the logic. If Network is crowded, doesn't that mean we should be aiming to subdivide it, e.g. moving Network.Http.* to Network.Protocol.Http.* (FSVO "Protocol"; could be "Tcp", or something else entirely)?
Or Network.Protocol.Tcp.Server.Http? Or perhaps Protocol.Network.Tcp.Http.Server? ;)
The argument I was trying to make is that for some sub-module hierarchies (e.g. HTTP) the word "network" communicates very little information and is thus superfluous.
Also the deeper we make the hierarchy the more difficult it is to navigate it. Where would you expect to find an HTTP server module?
Network.Protocol
Network.Web
Network.Asynchronous
Here are three possible "answers":
Network.Protocol.Http.Server -- How is a server a protocol?
Network.Web.Server -- HTTP is a protocol that doesn't necessarily imply "web".
Network.Asynchronous -- This is an implementation detail.
Or even worse, an HTTP server might not live in any one module, making it difficult to find all the pieces you need for your current *task*.
Data.Http.Server.State
Control.Server -- runServer
Network.Protocol.Http -- sendResponse
I'm not convinced programmers think in these hierarchical terms at all when programming. More likely the thinking is task-based than anything else (although I need to find some evidence to support this view).
Also consider how many tokens you need to read when scanning e.g. an import list to find the relevant parts:
Network.X.Y.Server
Network.X.Z.Uri
etc
Furthermore, if two module imports always co-occur, I think the modules probably would have made more sense as one module, unless 1) that would result in one huge module or 2) that is not possible because of name clashes.
If we move everything up to the root then the root will be even more crowded than Network is.
Yes, a bit.
P.S. The module reorganization effort surrounding Python 3.0 might be of interest for people in this thread.
-- Johan

On Fri, Jun 19, 2009 at 02:44:25PM +0200, Johan Tibell wrote:
The argument I was trying to make is for some sub-module hierarchies (e.g. HTTP) the word "network" communicates very little information and is thus superfluous.
I agree that by the time you reach "HTTP", "Network" is superfluous. But the purpose of "Network" is to help the people looking for "OpenGL": they know that they can ignore the "Network" hierarchy, and in doing so they cut out a significant chunk of the search space in one go. They don't want to have to look through all of "HTTP", "NNTP", "FTP", ...
Also the deeper we make the hierarchy the more difficult it is to navigate it. Where would you expect to find a HTTP server module?
Network.Protocol Network.Web Network.Asynchronous
If only one of those exists then there isn't a problem. The hierarchy is to help people looking through a tree of available modules, e.g. the haddock contents page. I don't think that hierarchies are perfect, but I do think that human brains work well with them; probably better than with other systems which on paper ought to be better. Thanks Ian
participants (12)
- Ben Franksen
- Conal Elliott
- Don Stewart
- Edward Kmett
- Evan Laforge
- Ian Lynagh
- Iavor Diatchki
- Johan Tibell
- kahl@cas.mcmaster.ca
- Simon Marlow
- Wolfgang Jeltsch
- wren ng thornton