What's in a name?

Andrew Coppin

13 Aug 2008

8:58 p.m.

The naming of cats is a difficult matter...

Ahem. So as you may have noticed, we seem to have a profusion of packages all called "binary" or something dangerously similar. There are also several "MD5" packages. I could point out a few others. So what I'm wondering is... Do we have a formal convention for the naming of packages and/or the naming of the modules they contain? How are name collisions supposed to be avoided? (E.g., Java uses domain names for this. If I write a package named Foo, I put all the classes in orphi.me.uk.Foo.*)

Thomas M. DuBuisson

13 Aug

10:13 p.m.

...

Do we have a formal convention for the naming of packages and/or the naming of the modules they contain?

There is a recommended set of categories and in general I believe library authors try and follow the previously established names.

...

How are name collisions supposed to be avoided?

In the case of pureMD5 I looked at the other modules and decided to name mine something with a proper hierarchy that doesn't collide with 'Crypto'. Hence the extra "Pure" part of the module name.

I believe that an informal process, such as what I did, is much better than formalizing every aspect of Haskell/Hackage libraries. The cost in terms of processes / bureaucracy is just too much to formalize everything.

Suggestion: Have Hackage warn when a library is uploaded that has module conflicts with other libraries.

Thomas

Henning Thielemann

14 Aug

5:41 a.m.

On Wed, 13 Aug 2008, Andrew Coppin wrote:

...

The naming of cats is a difficult matter...

Ahem. So as you may have noticed, we seem to have a profusion of packages all called "binary" or something dangerously similar. There are also several "MD5" packages. I could point out a few others. So what I'm wondering is... Do we have a formal convention for the naming of packages and/or the naming of the modules they contain?

There is a page on the Wiki which lists several common top-level module names. Unfortunately www.haskell.org seems to be down currently.

Henning Thielemann

6:58 p.m.

On Thu, 14 Aug 2008, Henning Thielemann wrote:

...

On Wed, 13 Aug 2008, Andrew Coppin wrote:

...
The naming of cats is a difficult matter...

Ahem. So as you may have noticed, we seem to have a profusion of packages all called "binary" or something dangerously similar. There are also several "MD5" packages. I could point out a few others. So what I'm wondering is... Do we have a formal convention for the naming of packages and/or the naming of the modules they contain?

There is a page on the Wiki which lists several common top-level module names. Unfortunately www.haskell.org seems to be down currently.

http://www.haskell.org/haskellwiki/Hierarchical_module_names

Andrew Coppin

15 Aug

5:37 p.m.

Henning Thielemann wrote:

...

On Thu, 14 Aug 2008, Henning Thielemann wrote:

...
On Wed, 13 Aug 2008, Andrew Coppin wrote:

...
The naming of cats is a difficult matter...

Ahem. So as you may have noticed, we seem to have a profusion of packages all called "binary" or something dangerously similar. There are also several "MD5" packages. I could point out a few others. So what I'm wondering is... Do we have a formal convention for the naming of packages and/or the naming of the modules they contain?

There is a page on the Wiki which lists several common top-level module names. Unfortunately www.haskell.org seems to be down currently.

http://www.haskell.org/haskellwiki/Hierarchical_module_names

Right. So if for some reason two people both developed a hashtable implementation (say), we would end up with two modules both called Data.Hashtable, but (hopefully) the packages themselves would be called james-hashtable and chris-hashtable (or something). Now both packages can be installed at once, but when I say "import Data.Hashtable", GHC has no way to know which one I mean. That doesn't sound too clever to me...

Felipe Lessa

6:21 p.m.

On Fri, Aug 15, 2008 at 2:37 PM, Andrew Coppin wrote:

...

Now both packages can be installed at once, but when I say "import Data.Hashtable", GHC has no way to know which one I mean. That doesn't sound too clever to me...

GHC can hide packages or, put it another way, can show only the packages you want. That's what Cabal does when compiling. For example, try to remove some package from the dependencies and watch GHC complain.

-- Felipe.
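The package hiding described above can be tried by hand. What follows is only a minimal sketch: the file name `Main.hs` and the function `sortedNames` are made up for illustration, and the flags in the comments (`-hide-all-packages`, `-package`) are the ones documented in the GHC user guide. It assumes a plain `ghc` on the path and is not a description of how Cabal invokes the compiler internally.

```haskell
-- Main.hs (hypothetical file name)
--
-- Compile with only the 'base' package visible:
--   ghc -hide-all-packages -package base Main.hs
--
-- Leaving out "-package base" should make GHC reject the imports below,
-- because the modules then belong to a hidden package.
module Main where

import Data.List (sort)

-- Illustrative helper: sorts a list of package names.
sortedNames :: [String] -> [String]
sortedNames = sort

main :: IO ()
main = mapM_ putStrLn (sortedNames ["binary", "MD5", "hashtable"])
```

With two installed packages that both provided Data.Hashtable, exposing only one of them in this way would presumably settle which module an import refers to, though only for the whole compilation unit.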

Sean Leather

9:26 p.m.

...

Andrew Coppin wrote:

...
Now both packages can be installed at once, but when I say "import Data.Hashtable", GHC has no way to know which one I mean. That doesn't sound too clever to me...

I agree, Andrew. The hierarchical module approach depends on a global resource for allocating names (or at least everybody agreeing on the scheme of choice). By trying to make all module names equal descriptive categories, it doesn't scale well. There are too many possibilities for overlap or different categorizations for the same thing.

Felipe Lessa wrote:

...

GHC can hide packages or, put it another way, can show only the packages you want. That's what Cabal does when compiling. For example, try to remove some package from the dependencies and watch GHC complain.

That doesn't work if you want to use two packages that have modules sharing the same hierarchical name, and this is a definite possibility given my statements above. Of course, having the ability to import modules from specific packages [1] would fix this, but only as long as the package names are also unique.

Personally, I like the Java package naming scheme recommendation. It scales better, because each package name uses the organization or URI to uniquely identify a subset.

Sean

[1] http://thread.gmane.org/gmane.comp.lang.haskell.cvs.ghc/29319 - But notice the "not really intended for general use" bit.
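For readers who want to see what importing a module from a specific package looks like in source code: later GHC releases provide this as the `PackageImports` language extension. The sketch below uses the real `base` package as a stand-in, since the james-hashtable and chris-hashtable packages discussed in this thread are hypothetical; the function `shout` is likewise invented for the example.

```haskell
{-# LANGUAGE PackageImports #-}
module Main where

-- The string literal names the package the module must come from.
-- "base" stands in for a hypothetical package such as james-hashtable.
import "base" Data.Char (toUpper)

-- Illustrative helper: upper-cases a string.
shout :: String -> String
shout = map toUpper

main :: IO ()
main = putStrLn (shout "data.hashtable")
```

As noted above, this only disambiguates as long as the package names themselves are unique.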

wren ng thornton

11:24 p.m.

Sean Leather wrote:

...

That doesn't work if you want to use two packages that have modules sharing the same hierarchical name, and this is a definite possibility given my statements above. Of course, having the ability to import modules from specific packages [1] would fix this, but only as long as the package names are also unique.

Personally, I like the Java package naming scheme recommendation. It scales better, because each package name uses the organization or URI to uniquely identify a subset.

Personally, I have major qualms with the Java package naming scheme. In particular, using domain names sets the barrier to entry much too high for casual developers (e.g. most of the Haskell user base). Yes, DNs are cheap and plentiful, but this basically requires a lifetime lease of the DN in question and the migration path is covered in brambles. The alternative is simply to lie and make up a DN, in which case this degenerates into the exact same resource quandary as doing nothing (but with high overhead in guilt or registration paperwork). The way CPAN is set up is much more egalitarian, though mired in a bit much administrivia for casual developers.

The orthogonality of package names to module names is something I consider very much a feature, and not a bug. The only other packaging system I've seen to offer this is Monticello for Squeak/Smalltalk, and I've missed it ever since. Making packages orthogonal allows developers to create drop-in replacement packages that offer the same module services as another package, without needing to alter any code that uses the old package (save relinking/recompiling). This is the same advantage as allowing different modules to offer the same functions (e.g. having Data.ByteString as a drop-in for the [ ]-portions of the Prelude), but lifted up to the next tier.

The question then is two-fold. First is the question of how to minimize the problems of ambiguity and how to resolve conflicts when they arise. Second is the question of whether this is really the job of Haskell, the language itself, or whether it is more appropriately dealt with by the build tools, e.g. Cabal. I'll deal a bit more with the latter question.

(( For readers who don't want to slog through the rest of this post, the conclusion is that I feel an agile packaging system is an imperative, as discussed above. The trick is finding a way to be agile without creating a maintenance and conflict nightmare. But given the imperative: baby, bathwater, etc. ))

I do like your (Sean Leather's) patch for being able to specify package names in source code, though I'd think something like Core's "package:module.module.module" syntax would be better if it gets adopted into Haskell'. I do however think that specifying the package should be optional, with conflicts to be resolved by command-line flags or via Cabal. Without this we lose the ability to have drop-in replacement packages, which in turn greatly complicates migration paths. The community is still young, but forks do happen and we would do best to allow for forwards compatibility whenever possible. This approach also gives the same sort of split control as the various {-# FOO #-} pragmas give.

As an ad-hoc GHC solution, adding a new PACKAGE pragma would be better than just using a string there. In theory we can already do this with OPTIONS_GHC, though that pragma seems not to respect the -package option. Of course, the new pragma should be position restricted to make it obvious which imports it applies to, rather than assuming to apply to the whole file (i.e. by putting it where you put the strings).

One issue with this and Java's scheme of just concatenating package names onto module names is that they offer no provisions for specifying version restrictions. For a PACKAGE pragma we could design it to deal with this too, since the modules themselves don't have versions. Of course this starts getting into hairy issues which Cabal was designed to resolve, so porting it back to the compiler seems misguided.

Perhaps a simpler option, for a Haskell' world, would be to give modules versions and give the import syntax some way of specifying the version to use. Sticking with something like the current packaging system, packages would just specify the module versions they provide, and those versions need not be related to the version of the package itself. This has the benefit of being able to release and maintain legacy packages, once the world has forked or moved on to a new major version.

As an addendum to this, it could be helpful if "package" names (i.e. alphanumeric sequences) were a part of the module version specification. This way a package hfoo-legacy could continue to provide the hfoo-1.24 versions of modules, and it would be the package that forked off rather than forcing the new hfoo package to rename itself to break ties from the legacy code.

Another ability that the package/module system lacks right now is a good way for annotating deprecations. Java has this, but again they do it wrong. Whenever something is specified as deprecated it needs to provide a migration path to non-deprecated code. Simply saying "you fail" is an insufficient error message.

This proposal doesn't solve the resource allocation issue. That issue will always be around so long as we assume nodes in the dependency graph have unique names. And that assumption is a very useful expedient so we're unlikely to abandon it any time soon (though maybe we should). But I think giving modules explicit name-version annotations is a better path forward than adding more bureaucracy to the module hierarchy.

I think the suggested best practices for naming modules should be refined since they're starting to get out of date with all the code on Hackage. In particular there's a lot of conflict about (1) where to put new interesting Num data types (Data.Number.*, Data.*, Numeric.*, ...?); (2) where to put testing and diagnostic tools (Debug.*, Test.*); and (3) where to put modules for the core operation of application projects. But beyond providing better guidance, I don't think we should have a central body issuing leases for the module namespace. Especially because we already have a packaging system which is orthogonal to the module system.

One of the reasons I love Haskell so much is because it is so extremely agile. I've been an active open-source developer for many years, and of all the languages I've used Haskell has by far the easiest system for communal public release of code. Perl's community is also very nice though it's gotten to be large enough that they do really need the bureaucracy they have. All the same it means less of my Perl code has made it into the wild than I would have liked. As for C and Java, the only stuff of mine that's managed to eke out into the public are whole projects, never any of the many small building blocks it takes to make something run and to make people able to bang out a program in a few hours because all the dirty work is already done and available in a large public repository.

-- Live well, ~wren

David Menendez

16 Aug

1:29 a.m.

On Fri, Aug 15, 2008 at 7:24 PM, wren ng thornton wrote:

...

(( For readers who don't want to slog through the rest of this post, the conclusion is that I feel an agile packaging system is an imperative, as discussed above. The trick is finding a way to be agile without creating a maintenance and conflict nightmare. But given the imperative: baby, bathwater, etc. ))

Have you seen the PackageMounting proposal?

http://hackage.haskell.org/trac/ghc/wiki/PackageMounting

Essentially, each package would get its own hierarchy, which would then be attached to the larger module hierarchy at compile-time according to compiler options, or Cabal data, or methods as yet unseen. If, for some reason, you need to import two versions of the same package, or two packages that have a module name collision, you can change the default mounting point for one or both.

-- Dave Menendez http://www.eyrie.org/~zednenem/

wren ng thornton

4:27 a.m.

David Menendez wrote:

...

On Fri, Aug 15, 2008 at 7:24 PM, wren ng thornton wrote:

...
(( For readers who don't want to slog through the rest of this post, the conclusion is that I feel an agile packaging system is an imperative, as discussed above. The trick is finding a way to be agile without creating a maintenance and conflict nightmare. But given the imperative: baby, bathwater, etc. ))

Have you seen the PackageMounting proposal?

http://hackage.haskell.org/trac/ghc/wiki/PackageMounting

Essentially, each package would get its own hierarchy, which would then be attached to the larger module hierarchy at compile-time according to compiler options, or Cabal data, or methods as yet unseen. If, for some reason, you need to import two versions of the same package, or two packages that have a module name collision, you can change the default mounting point for one or both.

I hadn't seen it. It looks pretty nice. It reminds me of the recent discussion about hierarchical imports for the Gtk library.[1]

A feature that's typically nice in grafting systems is to allow re-grafting (and pruning), so that after a package is mounted subpackages can be moved around (or removed). These features add complication to building the right tree before linking, but they add a great deal of power and flexibility. A typical use case is when your project is using some other large project and you want to mark certain parts of that project as deprecated, unsafe, overridden, etc. so that your own code doesn't accidentally use them. Defensive programming and all that. Of course, used improperly, it also lets you create byzantine structures that channel mind-bending energies that keep systems administrators tossing and turning at night.

[1] http://www.haskell.org/pipermail/haskell-cafe/2008-June/thread.html#44133

-- Live well, ~wren

Robert Greayer

2:18 a.m.

--- On Sat, 8/16/08, wren ng thornton wrote:

...

Personally, I have major qualms with the Java package naming scheme. In particular, using domain names sets the barrier to entry much too high for casual developers (e.g. most of the Haskell user base). Yes, DNs are cheap and plentiful, but this basically requires a lifetime lease of the DN in question and the migration path is covered in brambles. The alternative is simply to lie and make up a DN, in which case this degenerates into the exact same resource quandary as doing nothing (but with high overhead in guilt or registration paperwork).

This does sound in theory like a real problem; the actual practice has worked out much differently for Java: durable domains willing to host development of small libraries for the Java space are plentiful. In other words, the barrier to entry has turned out to be quite low.

Nevertheless, Hackage of course provides an even cheaper alternative to DN-based naming, since package names registered on Hackage are guaranteed unique (across the Hackage-using community). The ubiquitous convention for Haskell could easily be that if you want your library to interoperate without conflict, register it on Hackage (otherwise you take your chances, just as in Java if you ignore the DN-based convention). Having the ability to use package names to avoid module-name conflicts (i.e. an agile packaging system, in your words) would still be needed.

The need to *recompile* to avoid conflicts is a problem though, if Haskell aspires to attract commercial package vendors. I don't know how it could be avoided though.

rcg

Henning Thielemann

5:25 a.m.

On Fri, 15 Aug 2008, Andrew Coppin wrote:

...

Henning Thielemann wrote:

...
http://www.haskell.org/haskellwiki/Hierarchical_module_names

Right. So if for some reason two people both developed a hashtable implementation (say), we would end up with two modules both called Data.Hashtable, but (hopefully) the packages themselves would be called james-hashtable and chris-hashtable (or something).

Now both packages can be installed at once, but when I say "import Data.Hashtable", GHC has no way to know which one I mean. That doesn't sound too clever to me...

Although it is possible to hide packages by GHC options, we should not do this, because several libraries might use different Hash tables and it would not be possible to write a program which uses many of these libraries. It's better to add a distinguishing part to the module name, like Data.HashTable.Coppin or so.
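The effect of giving each implementation its own module name can be sketched with modules that already exist. In the example below, `Data.List` and `Data.List.NonEmpty` from `base` are only stand-ins for the hypothetical Data.HashTable.Coppin and a competing implementation: both export a function called `sort`, and because the module names differ, one program can import both and pick between them with a qualifier. The names `sortPlain` and `sortNonEmpty` are invented for the example.

```haskell
module Main where

-- Two modules with overlapping exports but distinct names;
-- stand-ins for two competing hashtable implementations.
import qualified Data.List          as L
import qualified Data.List.NonEmpty as NE

-- Uses the list implementation of 'sort'.
sortPlain :: [Int] -> [Int]
sortPlain = L.sort

-- Uses the non-empty-list implementation of 'sort'.
sortNonEmpty :: NE.NonEmpty Int -> NE.NonEmpty Int
sortNonEmpty = NE.sort

main :: IO ()
main = do
  print (sortPlain [3, 1, 2])
  print (NE.toList (sortNonEmpty (3 NE.:| [1, 2])))
```

Libraries that depend on different implementations can then be combined in a single program without any package hiding.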

Andrew Coppin

9 a.m.

Henning Thielemann wrote:

...

Although it is possible to hide packages by GHC options, we should not do this, because several libraries might use different Hash tables and it would not be possible to write a program which uses many of these libraries. It's better to add a distinguishing part to the module name, like Data.HashTable.Coppin or so.

This is more the sort of thing I had in mind, yes.

As others have pointed out, not everybody has a domain name, so Java's technique of inserting a domain name perhaps isn't the best one. However, if we all agreed that, say, packages should be named "coppin-hashtable" or something, then there wouldn't be much danger of ambiguous package names. (As I already pointed out, there are at least 3 packages called "binary", which is just confusing.)

It's rather less clear what to do with something like, say, ByteString, which is the product of a large number of collaborators. Then again, it's a big enough package that nobody is likely to come up with a similar one. (Unless it's a fork I suppose - in which case a prefix or suffix for the person who forked it might be appropriate?)

What to do at the module level is less obvious. Having several packages provide different implementations of the same thing is arguably useful. (E.g., I know Gtk2hs provides an SOE module. What about wxHaskell? If the interface is standard enough, a given application might not actually care which implementation it gets.)

I'm open to suggestions here... I don't claim to have all the answers. I'd just like to see some debate happening. ;-)

Robert Greayer

3:20 p.m.

--- On Sat, 8/16/08, Andrew Coppin wrote:

...

...
Although it is possible to hide packages by GHC options, we should not do this, because several libraries might use different Hash tables and it would not be possible to write a program which uses many of these libraries. It's better to add a distinguishing part to the module name, like Data.HashTable.Coppin or so.

This is more the sort of thing I had in mind, yes.

This seems to be a common approach, but it runs counter to the objective of separating 'provenance' from module naming. 'Coppin' is (part of, sans version) the provenance of the hashtable implementation, so I'm not sure how this sort of scheme is better than just shoving the unique prefix at the front of the module, e.g.

Coppin.Data.Hashtable

Though embedding the provenance down in the hierarchy is a common pattern, I think it can be pretty messy. For example, the Parsec package exposes many modules, including "Text.Parsec.String" and "Text.ParserCombinators.Parsec.Token" -- the provenance appears at different levels in the hierarchy. If you're going to shove the package name in there, it seems simpler to me to just shove it at the front: Parsec.Text.ParserCombinators.Token. The package mounting scheme might solve this (though it seems to me that it requires that source for packages be kept around. I may be wrong).

...

(As I already pointed out, there are at least 3 packages called "binary", which is just confusing.)

On Hackage? I only see one with the exact name "binary".

Andrew Coppin

5:22 p.m.

Robert Greayer wrote:

...

This seems to be a common approach, but it runs counter to the objective of separating 'provenance' from module naming. 'Coppin' is (part of, sans version) the provenance of the hashtable implementation, so I'm not sure how this sort of scheme is better than just shoving the unique prefix at the front of the module, e.g.

Coppin.Data.Hashtable

Though embedding the provenance down in the hierarchy is a common pattern, I think it can be pretty messy. For example, the Parsec package exposes many modules, including "Text.Parsec.String" and "Text.ParserCombinators.Parsec.Token" -- the provenance appears at different levels in the hierarchy. If you're going to shove the package name in there, it seems simpler to me to just shove it at the front: Parsec.Text.ParserCombinators.Token. The package mounting scheme might solve this (though it seems to me that it requires that source for packages be kept around. I may be wrong).

Yeah, as I said, it's not immediately obvious exactly what the best solution is. Maybe we just need to get everybody to come up with more inventive names than just "hashtable" or "binary". (E.g., We have several parsers already, but they all have distinctive names that are unlikely to clash. Maybe we just need to do that for everything? IDK.)

...

...
(As I already pointed out, there are at least 3 packages called "binary", which is just confusing.)

On Hackage? I only see one with the exact name "binary".

OK, that's interesting. Apparently something has changed. Last time I looked, there was "binary", "old-binary", "new-binary", "alt-binary" and so forth. (It seems there is now a "binary-strict", but it's pretty obvious how that relates to the normal "binary" package.) Obviously, having this profusion of nearly identical names is just confusing.

Brandon S. Allbery KF8NH

5:28 p.m.

On 2008 Aug 16, at 13:22, Andrew Coppin wrote:

...

Yeah, as I said, it's not immediately obvious exactly what the best solution is. Maybe we just need to get everybody to come up with more inventive names than just "hashtable" or "binary". (E.g., We have several parsers already, but they all have distinctive names that are unlikely to clash. Maybe we just need to do that for everything? IDK.)

The names should really be more descriptive. What makes hashtable A different/distinct from hashtable B? What's so special about new-binary?

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

Brandon S. Allbery KF8NH

4:54 p.m.

On 2008 Aug 16, at 5:00, Andrew Coppin wrote:

...

What to do at the module level is less obvious. Having several packages provide different implementations of the same thing is arguably useful. (E.g., I know Gtk2hs provides an SOE module. What about wxHaskell? If the interface is standard enough, a given application might not actually care which implementation it gets.) I'm open to suggestions here...

The standard way to deal with this is virtual packages. But this would require significant changes to Cabal, not only to track multiple names for a single package but also to not complain about collisions.

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

David Menendez

15 Aug

9:53 p.m.

Whoops. Forgot to hit "reply all".

On Wed, Aug 13, 2008 at 4:58 PM, Andrew Coppin wrote:

...

The naming of cats is a difficult matter...

Ahem. So as you may have noticed, we seem to have a profusion of packages all called "binary" or something dangerously similar. There are also several "MD5" packages. I could point out a few others. So what I'm wondering is... Do we have a formal convention for the naming of packages and/or the naming of the modules they contain? How are name collisions supposed to be avoided? (E.g., Java uses domain names for this. If I write a package named Foo, I put all the classes in orphi.me.uk.Foo.*)

So far as I know, there are no rules about naming packages, except that Hackage won't allow two packages with the same name. There is a loose convention about fitting modules into the larger hierarchy.

Personally, I think trying to fit modules from different packages into the same hierarchy is a mistake, in that you either get the same module name used twice (meaning that both packages can't be used in the same program), or need to insert the package name into the name. Thus, "Test.HUnit", "Test.QuickCheck", "Text.ParserCombinators.Parsec", "Text.Pretty.HughesPJ", and so forth.

We'd be better off just using the package name as the first element of the module names. Or, if that's objectionable, using something like "Package.Parsec" or "Lib.Parsec" (or "Hackage.Parsec", since Hackage enforces the uniqueness of package names).

This is arguably one area where Java does better than Haskell. The resulting module names are long, but they don't require coordination and they don't raise tricky questions. (Quick, what's the difference between Data.* and Control.*? Is QuickCheck under Debug.* or Test.*?)

-- Dave Menendez http://www.eyrie.org/~zednenem/
