Proposal: Reserved module namespace for packages on Hackage

In the interests of reducing module name collisions, I suggest reserving part of the module name space for individual packages on Hackage. Specifically, I'm suggesting that a new top-level module name, "Lib", be added to the module naming conventions, and that the children of "Lib" be reserved for the Hackage package with the same name. That is, "Lib.Foo" and "Lib.Foo.*" would be reserved for the package "Foo" on Hackage.
This would not require packages to *use* this namespace. However, packages that do use it would have a greatly reduced chance of conflicting with other packages.
Implementation costs are minor. At most, we might want some code in Hackage to prevent packages from using module names reserved for other packages. At the least, all we need to do is add "Lib" to the list of allowable top-level module names. Developers who object to giving the provenance of a module in its name are free to take their chances with the rest of the module hierarchy.
Mapping package names to module names is mostly straightforward. According to the Cabal documentation, a package name consists of one or more alphanumeric words separated by hyphens, where each word contains at least one letter. Since hyphens aren't allowed in module names, they would get mapped to underscores, which are not allowed in package names. Thus, "Lib.Foo_Bar" would be reserved for package "Foo-Bar".
It's less obvious what to do with packages whose names start with lower-case letters or digits. I see three possible solutions:
(a) Do not reserve module names for these packages.
(b) Map these package names to module names in a way that avoids conflicts, e.g., prefixing the package name with "P'", which cannot occur in a package name. That is, package "foo" would get "Lib.P'foo".
(c) Change the rules for package names on Hackage by disallowing package names which start with a digit or which differ from an existing package only in the case of the first letter, and reserve module names based on capitalized package names. That is, package "foo" would get "Lib.Foo", and Hackage would not accept a new package "Foo" if there was a preexisting "foo", and vice versa.
My preference is for (c). In fact, I might go further and forbid any package whose name differs only in case from an existing package in Hackage.
Further thoughts:
(1) I chose "Lib" because it's short and, so far as I know, unused. "Hackage" might be a better choice, since the scheme depends on Hackage to prevent name collisions.
(2) It was surprisingly difficult to find out the rules for valid package naming. None of the tutorials I found discussed choosing a valid name. The GHC documentation mentions that package names must have a specific form, but I couldn't find any description of it.
(3) I did not find a definition of "alphanumeric" in the Cabal documentation. Does this include non-ASCII characters?
(4) We could also reserve additional module names corresponding to specific versions of packages, e.g., "Foo-1.0" might get "Lib.Foo_1_0". This does not create ambiguity, because "Foo-1-0" is not a valid package name.
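The mapping rules in the proposal are mechanical enough to sketch in Haskell. This is a minimal illustration, not Hackage code: every function name is hypothetical, and it assumes "alphanumeric" means ASCII letters and digits, which further thought (3) leaves open. It covers the hyphen-to-underscore rule, options (b) and (c), and the versioned names from further thought (4).

```haskell
-- Sketch of the package-name -> reserved-module-name mapping.
-- ASSUMPTION: "alphanumeric" means ASCII letters and digits only.
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, toUpper)

-- Split on a separator: "Foo-Bar" -> ["Foo", "Bar"].
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (w, [])       -> [w]
  (w, _ : rest) -> w : splitOn sep rest

-- One or more alphanumeric words separated by hyphens, where each
-- word contains at least one letter.
validPackageName :: String -> Bool
validPackageName = all validWord . splitOn '-'
  where
    isLetter c = isAsciiLower c || isAsciiUpper c
    validWord w = not (null w)
      && all (\c -> isLetter c || isDigit c) w
      && any isLetter w

-- Hyphens cannot appear in module names and underscores cannot appear
-- in package names, so this substitution cannot create collisions.
hyphenToUnderscore :: Char -> Char
hyphenToUnderscore c = if c == '-' then '_' else c

-- Option (b): prefix "P'" unless the name starts with an upper-case
-- letter; "'" never occurs in a package name.
reservedModuleB :: String -> Maybe String
reservedModuleB name@(c : _)
  | validPackageName name = Just (prefix ++ map hyphenToUnderscore name)
  where prefix = if isAsciiUpper c then "Lib." else "Lib.P'"
reservedModuleB _ = Nothing

-- Option (c): capitalise the first letter; names starting with a digit
-- would no longer be accepted, so they get no reserved module.
reservedModuleC :: String -> Maybe String
reservedModuleC name@(c : rest)
  | validPackageName name && not (isDigit c) =
      Just ("Lib." ++ map hyphenToUnderscore (toUpper c : rest))
reservedModuleC _ = Nothing

-- Option (c) also means refusing a new package whose capitalised
-- module name is already taken, e.g. "Foo" when "foo" exists.
refusedUnderC :: [String] -> String -> Bool
refusedUnderC existing new = case reservedModuleC new of
  Nothing -> True
  Just m  -> any ((== Just m) . reservedModuleC) existing

-- Further thought (4): "Foo-1.0" -> "Lib.Foo_1_0", capitalised as in (c).
versionedModule :: String -> String -> Maybe String
versionedModule name version = fmap addVersion (reservedModuleC name)
  where
    addVersion m = m ++ "_" ++ map dotToUnderscore version
    dotToUnderscore ch = if ch == '.' then '_' else ch

main :: IO ()
main = do
  print (reservedModuleB "Foo-Bar")    -- Just "Lib.Foo_Bar"
  print (reservedModuleB "foo")        -- Just "Lib.P'foo"
  print (reservedModuleC "foo")        -- Just "Lib.Foo"
  print (refusedUnderC ["foo"] "Foo")  -- True
  print (versionedModule "Foo" "1.0")  -- Just "Lib.Foo_1_0"
```

Note that `validPackageName "Foo-1-0"` is `False` (the words "1" and "0" contain no letter), which is why the versioned names in (4) cannot collide with a real package's reserved name.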

David Menendez wrote:
Implementation costs are minor.
There is a serious cost: Sometimes another package is *supposed to* provide the same interface, including the same module names (e.g. forks or reimplementations, such as SOE). If Hackage rejected them, we would have a serious problem once people started depending on any package using a Lib. name. But it's not hard to pretty much avoid conflicts; you don't even need the Lib. prefix, you can just use the package name as your top-level module name. (right? or does hackage arbitrarily reject some module names?)
-Isaac

On Mon, Aug 18, 2008 at 8:03 PM, Isaac Dupree wrote:
David Menendez wrote:
Implementation costs are minor.
There is a serious cost: Sometimes another package is *supposed to* provide the same interface, including the same module names (e.g. forks or reimplementations, such as SOE). If Hackage rejected them, we would have a serious problem once people started depending on any package using a Lib. name.
Would we? How many packages out there are drop-in replacements? Even Data.List.Stream, which is a drop-in replacement for Data.List, uses a different module name. The packages I've seen that abstract over other packages tend to use preprocessor commands to get the right modules. I can see your point about forks. That's one case where it might be better to use the same module names as a different package. But I'm leery of relying on two modules with the same name having the same interface. The ideal solution would be something like the package mounting proposal (http://hackage.haskell.org/trac/ghc/wiki/PackageMounting), but that has a major implementation cost. This is more of a stop-gap measure that could be implemented today.
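The preprocessor approach mentioned above can be sketched as follows. This is a minimal illustration assuming a hypothetical CPP flag `USE_STREAM_LIST` that a build would define when Data.List.Stream is installed; without the flag, the standard Data.List is used. The rest of the module refers only to the qualified alias, so it does not care which module supplied it.

```haskell
{-# LANGUAGE CPP #-}
-- USE_STREAM_LIST is a hypothetical flag, e.g. passed as -DUSE_STREAM_LIST
-- by the build when Data.List.Stream is installed.
#ifdef USE_STREAM_LIST
import qualified Data.List.Stream as L
#else
import qualified Data.List as L
#endif

-- Everything below uses the alias L, so it compiles against either module
-- as long as both export the functions used here.
sortedWords :: String -> [String]
sortedWords = L.sort . words

main :: IO ()
main = print (sortedWords "pear apple fig")  -- ["apple","fig","pear"]
```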
But it's not hard to pretty much avoid conflicts; you don't even need the Lib. prefix, you can just use the package name as your top-level module name. (right? or does hackage arbitrarily reject some module names?)
As I understand it, Hackage complains if you use a top-level name that
isn't on the approved list. Putting the package name at the top-level
is also a possibility, but putting it one level down is more
future-proof.
Really, all my proposal needs is to add "Lib" to the list of
acceptable top-level names, and to have some document on the web
explain what it's for.
--
Dave Menendez

I tend to think this is a really bad idea. Although things get messy and there are plenty of corner cases, it seems to me the current system, haphazard as it is, is closer to the "right way." If, e.g., I want a Maybe transformer, I want to import it from Control.Monad.MaybeT, not from Lib.MaybeT. That way I can sort my imports sanely and see all my Control things in one place, no matter their provenance, and all my data structures in another, be they from collections or bloom filters from Hackage, etc.
The other problem is that either everything eventually goes under lib, which creates the same problem again, or there is an implicit set of exceptions for things which, although not part of the official libraries (which we're trying to reduce, remember), are obviously too "standard" for lib (e.g., HTTP, and such).
The problem here is that maybe this doesn't scale, since it requires Hackage contributors to think about the package namespace as a whole, some vigilance in that regard, the need to mark packages as deprecated properly, etc. But on the other hand, arbitrary namespacing leads to fragmentation, with everyone reimplementing things under their own hierarchy, and encouraging use of standard(ish) namespaces also contributes to a mindset where people will pare down packages into lots of little reusable conceptual units that only do one thing well.
The problem -- duplication of functionality and fragmentation -- is a real one, but dealing with it by throwing namespacing to the wind won't solve the underlying issues, which I think need to be addressed through the Haskell community guiding the direction of various efforts, and not through an artificial measure that makes fragmentation less immediately painful while doing nothing to mitigate the long-term consequences.
--S
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://www.haskell.org/mailman/listinfo/libraries

On Mon, Aug 18, 2008 at 8:07 PM, Sterling Clover wrote:
I tend to think this is a really bad idea. Although things get messy and there are plenty of corner cases, it seems to me the current system, haphazard as it is, is closer to the "right way." If, e.g., I want a Maybe transformer, I want to import it from Control.Monad.MaybeT, not from Lib.MaybeT. That way I can sort my imports sanely and see all my Control things in one place, no matter their provenance, all my data structures in another, be they from collections or bloom filters from hackage, etc.
Unless you're mechanically sorting your module imports, I don't see how the Lib names would prevent that. As far as Haskell is concerned, module names are entirely arbitrary.
The other problem is that either everything eventually goes under lib, which creates the same problem again, or there is an implicit set of exceptions for things which, although not part of the official libraries (which we're trying to reduce, remember) are obviously too "standard" for lib (e.g., HTTP, and such).
How does putting everything under Lib create the same problem again?
Hackage already forbids multiple packages from having the same
name, so the reserved names for each package would be disjoint.
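The disjointness argument above can be checked mechanically. A minimal sketch with hypothetical function names, assuming package names that already start with an upper-case letter (so only the hyphen-to-underscore rule applies): a module falls in a package's reserved space only if it equals Lib.P or starts with Lib.P followed by a dot, so Lib.Foobar is not claimed by package Foo.

```haskell
import Data.List (isPrefixOf)

-- Reserved prefix for a package, using the hyphen-to-underscore rule.
-- ASSUMPTION: the package name already starts with an upper-case letter.
reservedPrefix :: String -> String
reservedPrefix pkg = "Lib." ++ map (\c -> if c == '-' then '_' else c) pkg

-- "Lib.Foo" reserves "Lib.Foo" and "Lib.Foo.*": a component-wise prefix,
-- so "Lib.Foobar" does not belong to package "Foo".
inReservedSpace :: String -> String -> Bool
inReservedSpace prefix modName =
  modName == prefix || (prefix ++ ".") `isPrefixOf` modName

-- All packages claiming a module. Distinct package names give distinct
-- single-component prefixes, so this list never has more than one element.
owners :: [String] -> String -> [String]
owners pkgs modName =
  [pkg | pkg <- pkgs, inReservedSpace (reservedPrefix pkg) modName]

main :: IO ()
main = do
  let pkgs = ["Foo", "Foo-Bar", "Foobar"]
  print (owners pkgs "Lib.Foo_Bar.Parser")  -- ["Foo-Bar"]
  print (owners pkgs "Lib.Foobar")          -- ["Foobar"]
  print (owners pkgs "Data.List")           -- []
```

A plain string-prefix test would wrongly give "Lib.Foobar" to Foo; comparing whole components is what keeps the reserved spaces disjoint.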
--
Dave Menendez

On Mon, 2008-08-18 at 19:32 -0400, David Menendez wrote:
In the interests of reducing module name collisions, I suggest reserving part of the module name space for individual packages on Hackage. Specifically, I'm suggesting that a new top-level module name, "Lib", be added to the module naming conventions, and that the children of "Lib" be reserved for the Hackage package with the same name. That is, "Lib.Foo" and "Lib.Foo.*" would be reserved for the package "Foo" on Hackage.
Note that this is entirely contrary to the existing (and well established) convention of naming according to the purpose / content of the module rather than the name of the implementation. What I mean is, it's a significant change.
I'll throw in my opinion too. :-) I don't think it's necessary. The existing recommendations on naming mean we already don't get too many clashes, e.g. we get Database.HDBC and Database.HSQL. Even when names do clash, they're typically implementations of similar things, and how many packages need both at once? It's more common to pick one implementation of some functionality.
It would certainly be interesting to make a service on Hackage to work out which packages have clashing names, so that maintainers can work out with each other how to resolve things. For example, suppose we have two packages implementing Text.PrettyPrint; then we'd ask both to use Text.PrettyPrint.ImplName. If we allowed overlap in the modules exported by the packages in use, then both could still export a Text.PrettyPrint module that just re-exports Text.PrettyPrint.ImplName. That way one can pick either and no existing code breaks.
So far, in practice, overlap seems to be a pretty minor problem that could easily be resolved in most instances with just a little communication. It's not obvious that we need something much more heavyweight. If we really do need more, then package-qualified imports are probably a better approach than a big change in module naming conventions.
Duncan
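The clash-finding service suggested above could start from an index of the modules each package exposes. A minimal sketch, assuming such an index is available as a plain list; the package names are hypothetical stand-ins for two Text.PrettyPrint implementations, and the code uses Data.Map from the containers package.

```haskell
import qualified Data.Map.Strict as Map

-- Package name paired with the modules it exposes.
type Index = [(String, [String])]

-- Module names exposed by more than one package, with the packages
-- involved, in the order they appear in the index.
clashes :: Index -> Map.Map String [String]
clashes index = Map.filter ((> 1) . length) byModule
  where
    -- fromListWith calls f new old; flip (++) keeps index order.
    byModule = Map.fromListWith (flip (++))
      [ (m, [pkg]) | (pkg, mods) <- index, m <- mods ]

main :: IO ()
main = do
  let index =
        [ ("pp-one", ["Text.PrettyPrint", "Text.PrettyPrint.PpOne"])
        , ("pp-two", ["Text.PrettyPrint", "Text.PrettyPrint.PpTwo"])
        , ("other",  ["Data.Other"])
        ]
  print (Map.toList (clashes index))
  -- [("Text.PrettyPrint",["pp-one","pp-two"])]
```

With the implementation-specific modules named as Duncan describes, only the shared Text.PrettyPrint re-export shows up, and that overlap is the deliberate one.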

On Mon, Aug 18, 2008 at 9:24 PM, Duncan Coutts wrote:
On Mon, 2008-08-18 at 19:32 -0400, David Menendez wrote:
In the interests of reducing module name collisions, I suggest reserving part of the module name space for individual packages on Hackage. Specifically, I'm suggesting that a new top-level module name, "Lib", be added to the module naming conventions, and that the children of "Lib" be reserved for the Hackage package with the same name. That is, "Lib.Foo" and "Lib.Foo.*" would be reserved for the package "Foo" on Hackage.
Note that this is entirely contrary to the existing (and well established) convention of naming according to the purpose / content of the module rather than the name of the implementation.
What I mean is, it's a significant change.
Is it?
Look at the XML category at Hackage.
formlets - no common prefix
generic-xml - all modules prefixed with Xml
HaXml - every module is prefixed with Text.XML.HaXml
hexpat - both modules are prefixed with Text.XML.Expat
HXQ - one module, prefixed with Text.XML.HXQ
hxt - 95 of 113 modules are prefixed with Text.XML.HXT
libxml - all modules prefixed with Text.XML.LibXML
tagsoup - 7 of 8 modules prefixed with Text.HTML.TagSoup
xml - all modules prefixed with Text.XML.Light
Selecting things semi-randomly from the parser category, I see:
attoparsec - all modules prefixed with Data.ParserCombinators.Attoparsec
binary - all modules prefixed with Data.Binary
binary-strict - all modules prefixed with Data.Binary.Strict
bytestringparser - all modules prefixed with Data.ParserCombinators.Attoparsec
PArrows - all modules prefixed with Text.ParserCombinators.PArrow
Parsec - all modules prefixed with Text.ParserCombinators.Parsec
parsely - all modules prefixed with Text.ParserCombinators.Parsely
polyparse - no common prefix
uulib - all modules prefixed with UU
To me, it looks like a common pattern is to give most or all of the
modules in a package a common prefix consisting of a general
classification and the package name (or a close variant). All I'm
suggesting is to give library authors the option to drop the
classification part. Trying to create a collaborative, hierarchical
classification system is a sucker's game. That's why Hackage itself
uses tags.
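The survey above amounts to computing, for each package, the longest component-wise prefix its modules share. A minimal sketch with hypothetical function names; the example list is binary-like (Data.Binary plus two submodules), and `countUnder` gives figures like "95 of 113".

```haskell
import Data.List (intercalate, isPrefixOf)

-- "Text.XML.HaXml.Parse" -> ["Text", "XML", "HaXml", "Parse"]
components :: String -> [String]
components s = case break (== '.') s of
  (c, [])       -> [c]
  (c, _ : rest) -> c : components rest

-- Longest common prefix of two lists.
commonPrefix :: Eq a => [a] -> [a] -> [a]
commonPrefix (x : xs) (y : ys) | x == y = x : commonPrefix xs ys
commonPrefix _ _ = []

-- Longest prefix shared by every module (compared component by
-- component), or Nothing when there is no common prefix.
sharedPrefix :: [String] -> Maybe String
sharedPrefix [] = Nothing
sharedPrefix mods = case foldr1 commonPrefix (map components mods) of
  [] -> Nothing
  cs -> Just (intercalate "." cs)

-- Number of modules at or under a prefix, for counts like "95 of 113".
countUnder :: String -> [String] -> Int
countUnder p = length . filter (\m -> m == p || (p ++ ".") `isPrefixOf` m)

main :: IO ()
main = do
  let binaryLike = ["Data.Binary", "Data.Binary.Get", "Data.Binary.Put"]
  print (sharedPrefix binaryLike)                -- Just "Data.Binary"
  print (sharedPrefix ["Text.Foo", "Data.Bar"])  -- Nothing
  print (countUnder "Data.Binary" binaryLike)    -- 3
```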
--
Dave Menendez

Hello,
I also don't think that we need to prefix everything with Lib.
However, I also do not like the current style of naming library
packages, where a single package can sprinkle modules all over the
hierarchy because:
- It makes it hard to figure out where modules come from (e.g., when
I see an import in the source code, it is hard to tell what library it
came from),
- The reverse problem also holds---when you look at the docs, it is
hard to tell which modules are provided by a given package,
- It discourages diversity (which some people may say is a good
thing :-). What I mean is that there is a kind of "land rush" to
stake out the good names in the hierarchy (I know that multiple
packages can provide the same module, but it is still a pain,
especially if you want _some_ modules from two conflicting packages).
- I don't think the system scales that well. For example, if I were
to create a package that draws graphs, should I put it under
Data.Graph and hope that no one uses both it and the graph modules?
And does that mean that to pick names for my modules I have to know
all the modules in all libraries out there?
- There are much better ways to classify modules by their purposes
than the single hierarchy imposed by the module name space (think
labels, tags, categories, keywords, all the usual ways people use on
the internet to classify things).
I think that it is a much better idea to use the package name as the
top-level module name space, as we have already put some effort in
ensuring that these are more or less unique.
-Iavor

On 2008 Aug 19, at 12:20, Iavor Diatchki wrote:
I think that it is a much better idea to use the package name as the top-level module name space, as we have already put some effort in ensuring that these are more or less unique.
May I suggest the Alexandrian solution? Module aliases: "alias Foo-package.Data.HashSet as Data.HashSet".
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
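The alias syntax above is a suggestion, not existing Haskell. The nearest mechanism in GHC is the PackageImports extension, which implements the package-qualified imports Duncan mentions: an import can name the package a module must come from. A minimal sketch using two packages that ship with GHC; a real use would name the packages whose modules actually clash.

```haskell
{-# LANGUAGE PackageImports #-}
-- Each import names the package that must supply the module, so another
-- installed package exposing a module of the same name is not used here.
import "base" Data.List (sort)
import qualified "containers" Data.Map as Map

main :: IO ()
main = do
  print (sort [3, 1, 2 :: Int])                                  -- [1,2,3]
  print (Map.toList (Map.fromList [(2 :: Int, "b"), (1, "a")]))
  -- [(1,"a"),(2,"b")]
```

Unlike the proposed alias, this renames nothing; it only disambiguates at each import site.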

On Tue, 19 Aug 2008, Brandon S. Allbery KF8NH wrote:
On 2008 Aug 19, at 12:20, Iavor Diatchki wrote:
I think that it is a much better idea to use the package name as the top-level module name space, as we have already put some effort in ensuring that these are more or less unique.
May I suggest the Alexandrian solution? Module aliases. "alias Foo-package.Data.HashSet as Data.HashSet".
Or what about using the Lib top level for new libraries written by only a few authors and used by only a few users? When it becomes clear that many people need one, or that there are multiple packages for the same purpose, they can start a joint effort to create "the real thing" in the existing module hierarchy. This way 'Lib' would be the sandbox, and 'Data' and friends would be for the "standards".
participants (7)
- Brandon S. Allbery KF8NH
- David Menendez
- Duncan Coutts
- Henning Thielemann
- Iavor Diatchki
- Isaac Dupree
- Sterling Clover