Correspondence between libraries and modules

Hi there, I've only dabbled in Haskell, so please excuse my ignorance: why isn't there a 1-to-1 mapping between libraries and modules? As I understand it, a library can provide any number of unrelated modules, and conversely, a single module could be provided by more than one library. I can see how this affords library authors more flexibility, but at a cost: there is no longer a single, unified view of the library universe. (The alternative would be for every module to be its own, hermetic library.) So I'm very interested in the rationale behind that aspect of the library system. Thanks! Alvaro

On Sun, Apr 22, 2012 at 13:15, Alvaro Gutierrez
As I understand it, a library can provide any number of unrelated modules, and conversely, a single module could be provided by more than one library. I can see how this affords library authors more flexibility, but at a cost: there is no longer a single, unified view of the library universe. (The alternative would be for every module to be its own, hermetic library.) So I'm very interested in the rationale behind that aspect of the library system.
One reason: modules serve multiple purposes; one of these is namespacing, and in the case of interfaces to foreign libraries that may force a division that would otherwise not exist. More generally, making libraries and modules one-to-one means that either modules exist solely for libraries, or libraries must be artificially split. Perhaps this indicates that modules have too many other functions, but in that case you should propose an alternative system to replace them.

As to multiple libraries providing the same module: the Haskell ecosystem is still evolving and it's not always appropriate to give a particular implementation sole ownership of a general module name. Type families vs. functional dependencies are an example of this (theoretically type families were considered superior but to date they haven't lived up to it and recently some cases were shown that fundeps can solve but type families can't; parallel monad libraries based on both still exist). New container implementations have existed as standalone packages, some of which later merge with standard packages while others are discarded. Your proposal to reject this reflects a static library ecosystem that does not exist. (It could be enforced dictatorially, but there is no Guido van Rossum of Haskell and a mistake in an evolving system is difficult to fix after the fact even with a dictator; we're already living with some difficult to fix issues not related to modules.)

--
brandon s allbery allbery.b@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms

Thanks for your response.
On Sun, Apr 22, 2012 at 4:45 PM, Brandon Allbery
One reason: modules serve multiple purposes; one of these is namespacing, and in the case of interfaces to foreign libraries that may force a division that would otherwise not exist.
Interesting. Could you elaborate on what the other purposes are, and perhaps point to an instance of the foreign library case?

More generally, making libraries and modules one-to-one means that either modules exist solely for libraries, or libraries must be artificially split. Perhaps this indicates that modules have too many other functions, but in that case you should propose an alternative system to replace them.

Oh, I don't intend to replace it -- at most I want to understand why the system is set up the way it is, what the pros/cons are, and so on. I've come across a lot of design discussions for various Haskell features, but not this one; are there any?

As to multiple libraries providing the same module: the Haskell ecosystem is still evolving and it's not always appropriate to give a particular implementation sole ownership of a general module name. Type families vs. functional dependencies are an example of this (theoretically type families were considered superior but to date they haven't lived up to it and recently some cases were shown that fundeps can solve but type families can't; parallel monad libraries based on both still exist). New container implementations have existed as standalone packages, some of which later merge with standard packages while others are discarded.
I see. I didn't imagine there was as much variability with respect to module names and implementations as you suggest. I'm confused as to how type families vs. fundeps play a role here -- as far as I can tell both are compiler extensions that do not provide modules. I'm interested to see examples where two or more well-known yet unrelated modules clash under the same name; I can't imagine them coexisting in public very long -- wouldn't the confusion among users (e.g. when looking for documentation) be enough to either reconcile the modules or change one of the names?
Your proposal to reject this reflects a static library ecosystem that does not exist. (It could be enforced dictatorially, but there is no Guido van Rossum of Haskell and a mistake in an evolving system is difficult to fix after the fact even with a dictator; we're already living with some difficult to fix issues not related to modules.)
Right, assuming there could only be one implementation of a module, this is one of the main drawbacks; on the flip side, it is a "feature" in that there is no confusion as to what Foo.Bar.Qux means. As it is, any import requires out-of-band information in order to be resolved (both cognitively and by the compiler), in the form of the library it comes from. (There's also versioning information, but that could be equally specified per-library or per-module.) On the other hand, enforcing a single implementation is orthogonal to having a 1-to-1 module/library mapping. That is, you could allow multiple implementations either way. Alvaro

On 4/22/12 6:30 PM, Alvaro Gutierrez wrote:
On Sun, Apr 22, 2012 at 4:45 PM, Brandon Allbery
wrote:
One reason: modules serve multiple purposes; one of these is namespacing, and in the case of interfaces to foreign libraries that may force a division that would otherwise not exist.
Interesting. Could you elaborate on what the other purposes are, and perhaps point to an instance of the foreign library case?
The main purpose of namespacing (IMO) is to separate concerns and make it easier to figure out how a project fits together. The primary goal of modules is to resolve namespacing issues.

Consider one of my own libraries (chosen randomly via Safari's url autocompletion): http://hackage.haskell.org/package/bytestring-lexing

When I inherited this package there were the Data.ByteString.Lex.Double and Data.ByteString.Lex.Lazy.Double modules, which were separated because they provide the same API but for strict vs lazy ByteStrings. Both of those modules are concerned with lexing floating point numbers. I inherited the package because I wanted to publicize some code I had for lexing integers in various formats. Since that's quite a different task than lexing floating point numbers, I put it in its own module: Data.ByteString.Lex.Integral.

When dealing with FFI code, because of the impedance mismatch between Haskell and imperative languages like C, it's clear that there's going to be some massaging of the API beyond simply declaring FFI calls. As such, clearly we'd like to have separate modules for doing the low-level binding vs presenting a high-level API. Moreover, depending on what you're interfacing with, you may be forced to have multiple low-level modules. For example, if you use Google protocol buffers via the hprotoc package, then it will generate a separate module for each buffer type. That's fine, but usually it's not something you want to foist on your users.

On the other hand, the main purpose of packages or libraries is as a unit of distribution, code reuse, and separate compilation. Even with the Haskell culture of making small libraries, most worthwhile units of distribution/reuse/compilation tend to be larger than a single namespace/concern. Thus, it makes sense to have more than one module per package, because otherwise we'd need some higher level mechanism in order to manage the collections of package-modules which should be considered a single unit (i.e., clients will almost always want the whole bunch of them).

However, centralization is prone to bottlenecks and systemic failure. As such, while it would be nice to ensure that a given module is provided by only one package, there is no mechanism in place to enforce this (except at compile time for the code that links the conflicting modules together). With few exceptions, it's considered bad form to knowingly use the same module name as is being used by another package. In part, it's bad form because egos are involved; but it's also bad form because there's poor technical support for resolving namespace collisions for module names. In GHC you can use -XPackageImports, which is workable but conflates issues of code with issues of provenance, which the Haskell Report intentionally keeps separate. However, until better technical support is implemented (not just for GHC, but also jhc, UHC,...) it's best to follow social practice.
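[For concreteness, a minimal sketch of -XPackageImports disambiguating a module name that two installed packages both export. The example is illustrative, though monads-fd and monads-tf did both export Control.Monad.State:]

    {-# LANGUAGE PackageImports #-}
    module Main where

    -- Name the providing package explicitly; the bare module name
    -- Control.Monad.State is ambiguous with both packages installed.
    import "monads-fd" Control.Monad.State (State, evalState, get, put)

    tick :: State Int Int
    tick = do
      n <- get
      put (n + 1)
      return n

    main :: IO ()
    main = print (evalState tick 0)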
I'm confused as to how type families vs. fundeps play a role here -- as far as I can tell both are compiler extensions that do not provide modules.
Both TFs (or rather associated types) and fundeps aim to solve the same problem. Namely: when using multi-parameter type classes, it is often desirable to declare that one parameter is wholly defined by other parameters, either for semantic reasons or (more often) to help type inference. Since they both aim to solve the same problem, this raises a new problem: for some given type class, do I implement it with TF/ATs or with fundeps?

Some people figured to solve the new issue by implementing it both ways in separate packages, but reusing the same module names. (Witness for example mtl-2 aka monads-fd, vs monads-tf.) In practice, that didn't work out so well. Part of the reason for failure is that although fundeps and TF/ATs are formally equivalent in theory, in practice the implementation of TF/ATs has(had?) been missing some necessary machinery, and consequently the TF/AT versions were not as powerful as the original fundep versions. Though the butterfly dependency issues certainly didn't help.
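[For readers who haven't used either extension, a rough sketch of the same "one parameter determines the other" relationship written both ways; the Container classes here are invented for illustration:]

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
                 TypeFamilies, FlexibleInstances #-}

    -- Fundep style: "c -> e" says the container type determines
    -- the element type.
    class ContainerFD c e | c -> e where
      emptyFD  :: c
      insertFD :: e -> c -> c

    instance ContainerFD [a] a where
      emptyFD  = []
      insertFD = (:)

    -- Associated-type style: the same dependency, expressed as a
    -- type function attached to the class.
    class ContainerTF c where
      type Elem c
      emptyTF  :: c
      insertTF :: Elem c -> c -> c

    instance ContainerTF [a] where
      type Elem [a] = a
      emptyTF  = []
      insertTF = (:)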
I'm interested to see examples where two or more well-known yet unrelated modules clash under the same name; I can't imagine them coexisting in public very long -- wouldn't the confusion among users (e.g. when looking for documentation) be enough to either reconcile the modules or change one of the names?
That's not much of a problem in practice. There are lots of different books with a Chapter 1, but rarely is there any confusion about which one is meant. The same is true of module names in packages. -- Live well, ~wren

On 04/23/2012 12:03 AM, wren ng thornton wrote:
However, until better technical support is implemented (not just for GHC, but also jhc, UHC,...) it's best to follow social practice.
Wren, I am new to Haskell and not aware of all of the conventions. Is there a place where I can find information on these social practices? Are they documented some place?
However, centralization is prone to bottlenecks and systemic failure. As such, while it would be nice to ensure that a given module is provided by only one package, there is no mechanism in place to enforce this (except at compile time for the code that links the conflicting modules together).
From someone new to the community, it seems that yes, centralization has its issues, but it also seems that practices could be put in place that minimize the bottlenecks and systemic failures. Unless I greatly misunderstand the challenges, there seem to be a lot of ways to approach this problem and none of them are new. We all use systems that are composed of many modules neatly combined into complete systems. Linux distributions do this well. So does Java. Maybe we should borrow from their experiences and think about how we put packages together and what mechanisms we need to resolve inter-package dependencies. Am I missing something that makes this problem harder than other systems and languages? Is anyone currently working on the packaging and distribution issues? If not, does anyone else want to work on it?

On Mon, Apr 23, 2012 at 11:39, Gregg Lebovitz
Am I missing something that makes this problem harder than other systems and languages? Is anyone currently working on the packaging and distribution issues? If not, does anyone else want to work on it?
The other dirty little secret that is carefully being avoided here is the battle between the folks for whom Haskell is a language research platform and those who use it to get work done. It's not entirely inaccurate to say the former group would regard a fragmented module namespace as a good thing, specifically because it discourages people from considering it to be stable....

--
brandon s allbery allbery.b@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms

On Mon, Apr 23, 2012 at 17:16, Gregg Lebovitz
On 4/23/2012 3:39 PM, Brandon Allbery wrote:
The other dirty little secret that is carefully being avoided here is the battle between the folks for whom Haskell is a language research platform and those who use it to get work done. It's not entirely inaccurate to say the former group would regard a fragmented module namespace as a good thing, specifically because it discourages people from considering it to be stable....
Brandon, I find that a little hard to believe. If the issues are similar to other systems and languages, then I think it is more likely that no one has volunteered to work on it. You volunteering to help?
Yes, you do find it hard to believe; so hard that you went straight past it and tried to point to the "easy" technical solution to the problem you decided to see in place of the real one, which doesn't have a technical solution.

--
brandon s allbery allbery.b@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms

On 4/24/12 9:59 AM, Gregg Lebovitz wrote:
The question of how to support rapid innovation and stable deployment is not an us versus them problem. It is one of staging releases. The Linux kernel is a really good example. The Linux development team innovates faster than the community can absorb it. The same was true of the GNU team. Distributions addressed the gap by staging releases.
In that case, what you are interested in is not Hackage (the too-fast torrent of development) but rather the Haskell Platform (a policed set of stable/core libraries with staged releases). I forget who the best person to contact is these days if you want to get involved with helping the HP, but I'm sure someone on the list will say shortly :) -- Live well, ~wren

On 4/24/2012 11:49 PM, wren ng thornton wrote:
On 4/24/12 9:59 AM, Gregg Lebovitz wrote:
The question of how to support rapid innovation and stable deployment is not an us versus them problem. It is one of staging releases. The Linux kernel is a really good example. The Linux development team innovates faster than the community can absorb it. The same was true of the GNU team. Distributions addressed the gap by staging releases.
In that case, what you are interested in is not Hackage (the too-fast torrent of development) but rather the Haskell Platform (a policed set of stable/core libraries with staged releases).
No, that was not what I was thinking, because a stable policed set of core libraries is at the opposite end of the spectrum from how you describe Hackage. What I am suggesting is a way of creating an upstream that feeds increasingly stable code into an ever increasing set of stable and useful components. Using the current open system model, the core compiler team for gcc releases the compiler along with its core runtime libraries such as libgcc and libstdc++. The GNU folks release more useful libraries, and then projects like GNOME build on the other components. Right now we have Hackage, which moves too fast, and the Haskell core, which rightfully moves more slowly. Maybe the answer is to add a rating system to Hackage and mark packages as experimental, unsupported, and supported, or use a 5-star rating system like an app store. Later on, when we have appropriate testing tools, we can include a rating from the automated tests.

On Tue, Apr 24, 2012 at 7:29 PM, Gregg Lebovitz
On 4/23/2012 10:17 PM, Brandon Allbery wrote:
On Mon, Apr 23, 2012 at 17:16, Gregg Lebovitz
wrote:
On 4/23/2012 3:39 PM, Brandon Allbery wrote:
The other dirty little secret that is carefully being avoided here is the battle between the folks for whom Haskell is a language research platform and those who use it to get work done. It's not entirely inaccurate to say the former group would regard a fragmented module namespace as a good thing, specifically because it discourages people from considering it to be stable....
Brandon, I find that a little hard to believe. If the issues are similar to other systems and languages, then I think it is more likely that no one has volunteered to work on it. You volunteering to help?
Does haskell/hackage have something like debian's lintian?
Debian has a detailed policy document that keeps evolving:
http://www.debian.org/doc/debian-policy/
Lintian tries hard to automate policy compliance (as much as possible):
http://lintian.debian.org/manual/index.html
E.g. how packages should use the file system:
http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/
Even 'boring' legal stuff like license-checking is somewhat automated:
http://dep.debian.net/deps/dep5/
And most important are the dos and don'ts for package dependencies, which make possible nice pictures like these:
http://collab-maint.alioth.debian.org/debtree/
Of course, as Wren pointed out, the Linux communities have enough manpower to police their distributions, which Haskell perhaps does not. My question is really: would not something like a haskell-lintian make such sanity checking easier and more useful for everyone?
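[To make the haskell-lintian idea concrete, a toy sketch of what a first cut could look like, using only the base library; the checks and warning texts are invented for illustration and are not actual Hackage or Debian policy:]

    -- haskell-lintian sketch: warn about missing fields in a .cabal file.
    import Data.Char (isSpace, toLower)
    import Data.List (isPrefixOf)
    import System.Environment (getArgs)

    -- Does any line of the file start with the given (lower-case) field name?
    hasField :: String -> [String] -> Bool
    hasField field = any (isPrefixOf field . map toLower . dropWhile isSpace)

    checks :: [String] -> [String]
    checks ls =
         [ "W: missing license field"    | not (hasField "license:" ls) ]
      ++ [ "W: missing maintainer field" | not (hasField "maintainer:" ls) ]
      ++ [ "W: missing synopsis field"   | not (hasField "synopsis:" ls) ]

    main :: IO ()
    main = do
      [cabalFile] <- getArgs
      contents <- readFile cabalFile
      mapM_ putStrLn (checks (lines contents))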

On Wed, May 23, 2012 at 9:44 PM, Gregg Lebovitz
Rustom,
I am drafting a document that captures some of the social norms from the comments on this list, mostly from Brandon and Wren. I have captured the discussion about module namespace and am sorting out the comments on the relationship between libraries and packages.
My initial question to the list was to try an identify where Haskell is different from other open source distributions. From what I can tell, the issues are very similar. The module name space seems to have characteristics very similar to the include file hierarchy of linux distributions.
If you have some spare cycles and would like to contribute, I think everyone would appreciate your help and effort
Gregg
Hi Gregg. One of the common complaints one gets from a first year programming student (and it's now about 3 decades I've been dealing with these!) is: "The compiler/interpreter etc HAS a BUG!!!" So... while I am an old geezer with programming and functional programming -- doing, teaching, playing, implementing, or just plain thinking -- I am too much of a noob to ghc to risk falling into the "1st year student" trap above. Yes, perhaps not a typical noob... Some things are easier for me than for the typical noob -- all the 'classical' good stuff like pattern-matching, lambda-calculus, type-inferencing, polymorphism etc. And this is helpful for understanding the 'modern good stuff', monads and onwards. But then I get hit -- finding my way round hackage, installing with cabal etc -- even though I'm an old-time unix hacker and sysadmin. So I guess it's best to assume (as of now) that I don't know the ropes rather than that something is wrong/broken with them.

O well... If the noob trap is one error, playing it safe is probably another, so here goes with me saying things that I (probably) know nothing about:
1. cabal was a beautiful system 10 years ago. Now it's being forcibly scaled up 2 (3?) orders of magnitude and is creaking at the seams.
2. There's too much conflicting advice out there on the web for a noob:
- use system install (e.g. apt-get) or use cabal
- cabal in user area or system area etc
- the problem is exponentiated by the absence of cabal uninstall

Rustom:
O well... If the noob trap is one error playing it safe is probably another so here goes with me saying things that I (probably) know nothing about: 1. cabal was a beautiful system 10 years ago. Now its being forcibly scaled up 2 (3?) orders of magnitude and is creaking at the seams
The problem is, Cabal is not a package management system. The name gives it away: it is the Common Architecture for *Building* Applications and Libraries. Cabal is to Haskell how GNU autotools + make is to C: a thin wrapper that checks for dependencies and invokes the compiler. All that boring not-making-your-package-break-everything-else stuff belongs to the distribution maintainer, not Hackage and Cabal.
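[For readers who haven't seen one, a minimal .cabal file looks roughly like this; the package name and contents are invented, and many optional fields are omitted:]

    name:          myapp
    version:       0.1
    license:       BSD3
    build-type:    Simple
    cabal-version: >= 1.8

    executable myapp
      main-is:       Main.hs
      build-depends: base >= 4 && < 5

[Cabal's role stops at describing and building the package from this description; as the point above notes, keeping the whole installed system consistent is the distribution's job.]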
2. There's too much conflicting suggestions out there on the web for a noob - use system install (eg apt-get) or use cabal
Use apt-get. Your distribution packages are usually new enough, have been tested thoroughly, and most importantly, do not conflict with each other.
- cabal in user area or system area etc
Installing with --user is usually the best, since they won't clobber system packages and if^H^Hwhen they do go wrong, you can simply rm -r ~/.ghc. For actual coding, it's better to use a sandboxing tool such as [cabal-dev][] instead. [cabal-dev]: http://hackage.haskell.org/package/cabal-dev
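[For concreteness, the alternatives under discussion look like this at the command line; "somepackage" and "myproject" are placeholders, and the commands reflect cabal-install and cabal-dev as of this writing, so treat the details as approximate:]

    # per-user install: packages go to ~/.cabal and register in ~/.ghc
    cabal install --user somepackage

    # sandboxed build with cabal-dev: dependencies land in ./cabal-dev
    cd myproject && cabal-dev install

    # the closest thing to "cabal uninstall": unregister by hand
    ghc-pkg unregister somepackage-0.1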
- the problem is exponentiated by the absence of cabal uninstall
See above. By the way, someone else wrote a whole article about it: https://ivanmiljenovic.wordpress.com/2010/03/15/repeat-after-me-cabal-is-not... Hope that clears it up for you. Chris

On 4/23/12 11:39 AM, Gregg Lebovitz wrote:
On 04/23/2012 12:03 AM, wren ng thornton wrote:
However, until better technical support is implemented (not just for GHC, but also jhc, UHC,...) it's best to follow social practice.
Wren, I am new to Haskell and not aware of all of the conventions. Is there a place where I can find information on these social practices? Are they documented some place?
Not that I know of, though they're fairly standard for any open-source programming community. E.g., when it comes to module names: familiarize yourself with what's out there; try to fit in with the patterns you see[1]; don't intentionally clash, steal namespaces[2], or squat on valuable territory[3]; be reasonable and conscientious when interacting with people.

[1] e.g., the use of Data.* for data structures which are predominantly/universally treated as such, vs the use of Control.* for things which are often thought of as control structures (monads, etc). The use of Foo.Bar.Strict and Foo.Bar.Lazy when you provide both strict and lazy versions of some whole API, usually with Foo.Bar re-exporting whichever one seems the sensible default. The use of Foo.Bar.Class to resolve circular import issues when defining a class and a bunch of datatypes with instances. Etc.

[2] I mean things like: if some package is providing a bunch of Foo.Bar.* modules, and it's the only one doing so, then you should try to get in touch with the maintainer before you start publishing your own Foo.Bar.* modules--- in order to collaborate, to send patches up-stream, or just to let them know what's going on.

[3] I committed an unintentional breach of this myself a while back. When I was hacking up the exact-combinatorics package for my own use, I put things in Math.Combinatorics.* since that's a reasonable place and wasn't in use; but I didn't think of that fact when I decided to publish the code. When it was pointed out, I promptly moved everything to Math.Combinatorics.Exact.* since that project is only interested in exact combinatorics and I have no intention of codifying all of combinatoric theory; hence using Math.Combinatorics.* would be squatting on very valuable names.
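[A tiny sketch of the Strict/Lazy convention from [1], using the invented Foo.Bar name: the default module is often nothing more than a re-export of whichever variant makes the more sensible default:]

    -- Foo/Bar.hs: the default interface simply re-exports the lazy
    -- variant; Foo.Bar.Strict remains available for explicit import.
    module Foo.Bar (module Foo.Bar.Lazy) where

    import Foo.Bar.Lazy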
However, centralization is prone to bottlenecks and systemic failure. As such, while it would be nice to ensure that a given module is provided by only one package, there is no mechanism in place to enforce this (except at compile time for the code that links the conflicting modules together).
From someone new to the community, it seems that yes centralization has its issues, but it also seems that practices could be put in place that minimize the bottlenecks and systemic failures.
Unless I greatly misunderstand the challenges, there seem to be lot of ways to approach this problem and none of them are new. We all use systems that are composed of many modules neatly combined into complete systems. Linux distributions do this well. So does Java. Maybe should borough from their experiences and think about how we put packages together and what mechanisms we need to resolve inter-package dependencies.
Java attempts to resolve the issue by imposing universal authority (use reverse urls for the first part of your package name). Many Java developers flagrantly ignore that claim to authority. Sun/Oracle has no interest in actually policing these violations, and there's no central repository for leveraging social pressure to do it. Moreover, open-source developers who do not have a commercial/institutional affiliation are specifically placed in a tough spot, and are elided from public discourse because of that fact, which is extremely problematic on too many levels to get into here. Furthermore, many developers ---especially among open-source and academic authors--- have an inherent distrust for ambient authority like this.

To pick another similar namespacing issue, consider the problem of Google Code. In Google Code there's a single namespace for projects, and the Google team spends a lot of effort on maintaining that namespace and resolving conflicts. (I know folks who've worked in the lab next door to that team. So, yes, they do spend a lot of work on it.) Whereas if you consider BitBucket or GitHub, each user is given a separate project namespace, and therefore the only thing that has to be maintained is the user namespace--- which has to be done anyways in order to deal with logins. The model of Google Code, SourceForge, and Java all assume that projects and repositories are scarce resources. Back in the day that may have been true (or may not), but today it is clearly false. Repos are cheap and everyone has a dozen side projects.

If you look at the case of Perl and CPAN, there's the same old story: universal authority. Contrary to Java, CPAN does very much actively police (or rather, vet) the namespace. However, this extreme level of policing requires a great deal of work and serves to drive away a great many developers from publishing their code on CPAN.

I'm not as familiar with the innards of how various Linux distros manage things, but they're also tasked with the additional burden of needing to pull in stuff from places like CPAN, Hackage, etc. Because of that, their namespace situation seems quite different from that of Hackage or CPAN on their own. I do know that Debian at least (and presumably the others as well) devote a great deal of manpower to all this.

So we have (1) the Java model where there are rules that no one follows; (2) the Google Code, CPAN, and Linux distro model of devoting a great deal of community resources to maintaining the rules; and (3) the BitBucket, GitHub, Hackage model of having few institutionalized rules and leaving it to social factors. The first option buys us nothing over the last, excepting a false sense of security and the ability to alienate private open-source developers. The second option does arguably give us something, but it's extremely expensive. I don't know if you've been involved in the administrative side of that, but if not then it is far more expensive than you realize.

I've worked with CPAN, and many of the folks on this list do packaging for Debian, Arch, and other Linux distros, so we're familiar with what it means to ask for a universal authority. The Perl and Linux distro communities are *huge* and so they can actually afford the cost of setting up this authority, but even they run into limitations of scale.
Considering how much difficulty we've had getting someone to officially take over Hackage so that we can finally get to using hackage2, it's fair to say that Haskell has nowhere near a large enough community to sustain the kind of work it would take to police the namespace. There is no technical solution to this problem, at least not any used by the communities you cite. The only solutions on offer require a great deal of human effort, which is always a social/political/economic matter. The only technical avenues I see are ways of making the problem less problematic, such as GitHub and BitBucket distinguishing the user namespace from each user's project namespace, such as the -XPackageImports extension (which is essentially the same as GitHub/BitBucket), or such as various ideas about using tree-grafting to rearrange the module namespace on a per-project basis thereby allowing clients to resolve the conflicts rather than requiring a global solution. I'm quite interested in that last one, though I don't have any time for it in the foreseeable future. -- Live well, ~wren

On Wed, 25 Apr 2012 05:44:28 +0200, wren ng thornton
On 4/23/12 11:39 AM, Gregg Lebovitz wrote:
On 04/23/2012 12:03 AM, wren ng thornton wrote:
However, until better technical support is implemented (not just for GHC, but also jhc, UHC,...) it's best to follow social practice.
Wren, I am new to Haskell and not aware of all of the conventions. Is there a place where I can find information on these social practices? Are they documented some place?
Not that I know of, though they're fairly standard for any open-source programming community. E.g., when it comes to module names: familiarize yourself with what's out there; try to fit in with the patterns you see[1]; don't intentionally clash, steal namespaces[2], or squat on valuable territory[3]; be reasonable and conscientious when interacting with people.
The following page gives you some idea of the module names:
http://www.haskell.org/haskellwiki/Hierarchical_module_names
An overview of pages about programming style:
http://www.haskell.org/haskellwiki/Category:Style
Regards, Henk-Jan van Tuyl
--
http://Van.Tuyl.eu/
http://members.chello.nl/hjgtuyl/tourdemonad.html

On 4/24/2012 11:44 PM, wren ng thornton wrote:
To pick another similar namespacing issue, consider the problem of Google Code. In Google Code there's a single namespace for projects, and the Google team spends a lot of effort on maintaining that namespace and resolving conflicts. (I know folks who've worked in the lab next door to that team. So, yes, they do spend a lot of work on it.) Whereas if you consider BitBucket or GitHub, each user is given a separate project namespace, and therefore the only thing that has to be maintained is the user namespace--- which has to be done anyways in order to deal with logins. The model of Google Code, SourceForge, and Java all assume that projects and repositories are scarce resources. Back in the day that may have been true (or may not), but today it is clearly false. Repos are cheap and everyone has a dozen side projects.
Actually, I like the idea of combining an assigned user name with the repo name as the namespace. We already have login names for haskell.org, why not use those? I agree that it is not an end-all, but it would be a start. My top level namespace would be Org.Haskell.Glebovitz. It is democratic and it identifies the code by the repo and the user that created it. If someone else decided to use their github id, then their modules would be org.github.username or org.github.project. Of course people can choose to ignore the namespace common practice, but they can do that anyway.
If you look at the case of Perl and CPAN, there's the same old story: universal authority. Contrary to Java, CPAN does very much actively police (or rather, vett) the namespace. However, this extreme level of policing requires a great deal of work and serves to drive away a great many developers from publishing their code on CPAN.
I'm not as familiar with the innards of how various Linux distros manage things, but they're also tasked with the additional burden of needing to pull in stuff from places like CPAN, Hackage, etc. Because of that, their namespace situation seems quite different from that of Hackage or CPAN on their own. I do know that Debian at least (and presumably the others as well) devote a great deal of manpower to all this.
Yes, but that goes back to my comments about upstream and downstream. Hackage can try to solve the problem for itself, but eventually someone is going to put together a distribution, whether it be Ubuntu or Microsoft, and they will have to sort out the name collisions for their packages and modules. If we have a good naming scheme to start with, it will make the downstream problem a bit easier. Even so, they will probably change it anyway. I know that Ubuntu and Fedora take different approaches to packaging. When I try to use a package like Qt on these different platforms, I have to figure out which package contains which library.
So we have (1) the Java model where there are rules that noone follows; (2) the Google Code, CPAN, and Linux distro model of devoting a great deal of community resources to maintaining the rules; and (3) the BitBucket, GitHub, Hackage model of having few institutionalized rules and leaving it to social factors. The first option buys us nothing over the last, excepting a false sense of security and the ability to alienate private open-source developers.
I think my combo of formalized namespace and social rules would work best here. The problem is that we do have module collisions because the namespace is too simple. Right now it is not an issue because the community is not huge. Eventually it will be a problem if Haskell popularity grows.
There is no technical solution to this problem, at least not any used by the communities you cite. The only solutions on offer require a great deal of human effort, which is always a social/political/economic matter. The only technical avenues I see are ways of making the problem less problematic, such as GitHub and BitBucket distinguishing the user namespace from each user's project namespace, such as the -XPackageImports extension (which is essentially the same as GitHub/BitBucket), or such as various ideas about using tree-grafting to rearrange the module namespace on a per-project basis thereby allowing clients to resolve the conflicts rather than requiring a global solution. I'm quite interested in that last one, though I don't have any time for it in the foreseeable future.
There probably is a technical solution, but no one is going to discover it and build it anytime soon. I think we all agree that a centralized global solution is out. No one would want to manage it. I do think the repo.username namespace has potential. The problem is that informal social convention works if the community is small. Once it starts to grow, it has to be codified to some degree.

Thanks for the write-up -- it's been very helpful!
On Mon, Apr 23, 2012 at 12:03 AM, wren ng thornton
Consider one of my own libraries (chosen randomly via Safari's url autocompletion):
http://hackage.haskell.org/package/bytestring-lexing

When I inherited this package there were the Data.ByteString.Lex.Double and Data.ByteString.Lex.Lazy.Double modules, which were separated because they provide the same API but for strict vs lazy ByteStrings. Both of those modules are concerned with lexing floating point numbers. I inherited the package because I wanted to publicize some code I had for lexing integers in various formats. Since that's quite a different task than lexing floating point numbers, I put it in its own module: Data.ByteString.Lex.Integral.
I see. The first thing that comes to mind is the notion of module granularity, which of course is subjective, so whether a single module or multiple ones should handle e.g. doubles and integrals is a good question; are there guidelines as to how those choices are made? At any rate, why do these modules, with sufficiently-different functionality, live in the same library -- is it that they share some common bits of implementation, or to ease the management of source code?

When dealing with FFI code, because of the impedance mismatch between Haskell and imperative languages like C, it's clear that there's going to be some massaging of the API beyond simply declaring FFI calls. As such, clearly we'd like to have separate modules for doing the low-level binding vs presenting a high-level API. Moreover, depending on what you're interfacing with, you may be forced to have multiple low-level modules.
Ah, that's a good use case. Is the lower-level module usually made "public" as well, or is it only an implementation detail?
On the other hand, the main purpose of packages or libraries is as a unit of distribution, code reuse, and separate compilation. Even with the Haskell culture of making small libraries, most worthwhile units of distribution/reuse/compilation tend to be larger than a single namespace/concern. Thus, it makes sense to have more than one module per package, because otherwise we'd need some higher level mechanism in order to manage the collections of package-modules which should be considered a single unit (i.e., clients will almost always want the whole bunch of them).
This is the part that I'm trying to get a better sense of. I can see how in some cases, it makes sense for more than one module to form a unit, because they are tightly coupled semantically or implementation-wise -- so clients will indeed want the whole bunch. On the other hand, several libraries provide modules that are all over the place, in a way that doesn't form a "unit" of any kind (e.g. MissingH), and it's not clear that you would want any Network stuff when all you need is String utilities.

However, centralization is prone to bottlenecks and systemic failure. As such, while it would be nice to ensure that a given module is provided by only one package, there is no mechanism in place to enforce this (except at compile time for the code that links the conflicting modules together). With few exceptions, it's considered bad form to knowingly use the same module name as is being used by another package. In part, it's bad form because egos are involved; but it's also bad form because there's poor technical support for resolving namespace collisions for module names. In GHC you can use -XPackageImports, which is workable but conflates issues of code with issues of provenance, which the Haskell Report intentionally keeps separate. However, until better technical support is implemented (not just for GHC, but also jhc, UHC,...) it's best to follow social practice.
But the way you describe it, it seems that despite centralization having those disadvantages, it is more or less the way the system works, socially (egos, bad form, etc.) and technically (because of the lack of compiler support) -- except that it is ad-hoc instead of mechanically enforced. In other words, I don't see what the advantages of allowing ambiguity currently are.

Some people figured to solve the new issue by implementing it both ways in separate packages, but reusing the same module names. (Witness for example mtl-2 aka monads-fd, vs monads-tf.) In practice, that didn't work out so well. Part of the reason for failure is that although fundeps and TF/ATs are formally equivalent in theory, in practice the implementation of TF/ATs has(had?) been missing some necessary machinery, and consequently the TF/AT versions were not as powerful as the original fundep versions. Though the butterfly dependency issues certainly didn't help.
Ah, interesting. So, perhaps I misunderstand, but this seems like an argument in favor of having uniquely-named modules (e.g. Foo.FD and Foo.TF) instead of overlapping ones, right? Alvaro

On 4/23/12 3:06 PM, Alvaro Gutierrez wrote:
I see. The first thing that comes to mind is the notion of module granularity, which of course is subjective, so whether a single module or multiple ones should handle e.g. doubles and integrals is a good question; are there guidelines as to how those choices are made?
I'm not sure if there are any guidelines per se; that's more of a general software engineering problem. If you browse around on Hackage you'll get a fairly good idea what the norms are though. Everyone seems to have settled on a common range of scope--- with notable exceptions like the containers library with far too many functions per module, and some of Ed Kmett's work on category theory which tends towards very few declarations per module.
At any rate, why do these modules, with sufficiently-different functionality, live in the same library -- is it that they share some common bits of implementation, or to ease the management of source code?
I contacted Don Stewart (the former maintainer) to see whether he thought I should release the integral stuff on its own, or integrate it into bytestring-lexing. We agreed that it made more sense to try to build up a core library for lexing various common data types, rather than having a bunch of little libraries. He'd just never had time to get around to developing bytestring-lexing further; so I took over. Eventually I plan to add rendering functions for floating point, and to split up the parsers for different floating point formats[1], so that it more closely resembles the integral stuff. But that won't be until this fall or later, unless someone requests it sooner.

[1] Having an omni-parser can be helpful when you want to be liberal about your input. But when you're writing parsers for a specified format, usually they're not that liberal, so we need to offer restricted lexers in order to enable code reuse.
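[As a small usage sketch of the integral lexers being discussed; readDecimal's signature -- returning the parsed value and the unconsumed input -- is recalled from bytestring-lexing around this time, so treat the details as approximate:]

    import qualified Data.ByteString.Char8 as B8
    import Data.ByteString.Lex.Integral (readDecimal)

    -- Accept a string that is entirely a decimal number, e.g. "8080".
    parsePort :: B8.ByteString -> Maybe Int
    parsePort bs = case readDecimal bs of
      Just (n, rest) | B8.null rest -> Just n
      _ -> Nothing

    main :: IO ()
    main = mapM_ (print . parsePort . B8.pack) ["8080", "80x", ""]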
When dealing with FFI code, because of the impedance mismatch between Haskell and imperative languages like C, it's clear that there's going to be some massaging of the API beyond simply declaring FFI calls. As such, clearly we'd like to have separate modules for doing the low-level binding vs presenting a high-level API. Moreover, depending on what you're interfacing with, you may be forced to have multiple low-level modules.
Ah, that's a good use case. Is the lower-level module usually made "public" as well, or is it only an implementation detail?
Depends on the project. For ByteStrings, most of that is hidden away as implementation details. For binding to C libraries, I think the current advice is to offer the low-level interface so that if there's something the high-level interface can't handle well, people have some easy recourse.
On the other hand, the main purpose of packages or libraries is as a unit of distribution, code reuse, and separate compilation. Even with the Haskell culture of making small libraries, most worthwhile units of distribution/reuse/compilation tend to be larger than a single namespace/concern. Thus, it makes sense to have more than one module per package, because otherwise we'd need some higher level mechanism in order to manage the collections of package-modules which should be considered a single unit (i.e., clients will almost always want the whole bunch of them).
This is the part that I'm trying to get a better sense of. I can see how in some cases, it makes sense for more than one module to form a unit, because they are tightly coupled semantically or implementation-wise -- so clients will indeed want the whole bunch. On the other hand, several libraries provide modules that are all over the place, in a way that doesn't form a "unit" of any kind (e.g. MissingH), and it's not clear that you would want any Network stuff when all you need is String utilities.
Yeah, MissingH and similar libraries are just grab-bags full of stuff. Usually grab-bag libraries think of themselves as place-holders, with the intention of breaking things out once there's something of a large enough size to warrant being its own package. (Whether the breaking out actually happens is another matter.) But to get the general sense of things, you should ignore them.

Instead, consider one of the parsing libraries like uu-parsinglib, attoparsec, parsec, frisby. There are lots of pieces to a parsing framework, but it makes sense to distribute them together. Or, consider one of the base libraries for iteratees, enumerators, pipes, conduits, etc. Like parsing, these offer a whole framework. You won't usually need 100% of it, but everyone needs a different 80%.

Or to mention some more of my own packages, consider stm-chans, unification-fd, or unix-bytestring. In unification-fd, the stuff outside of Control.Unification.* could be moved elsewhere, but the stuff within there makes sense to be split up yet distributed together. For stm-chans, because of the similarity in interfaces, use cases, etc., it'd be peculiar to want to separate them into different packages. In unix-bytestring I separated off the Iovec stuff (FFI implementation details) from the main API, but clearly they must go together.
But the way you describe it, it seems that despite centralization having those disadvantages, it is more or less the way the system works, socially (egos, bad form, etc.) and technically (because of the lack of compiler support)
There's a difference between centralization and communalization. With centralization there's a central authority who makes all the rules and (usually) enforces them. This is the benevolent dictator model common in open-source. The problem is: what do you do if the dictator goes missing (gets hit by a bus, is too busy this semester, etc)?

With communalization, there's no central authority that writes/enforces the laws; instead, the community as a whole will come to agree on the norms. This is the way societies often operate (i.e., societies as cultures, rather than as governments). In virtue of the social interaction, things come to be a particular way, but there isn't necessarily any person or committee that decided it should be that way. Moreover, in order to disrupt the norms it's not enough to dispose of a dictator; you need some wide-scale way of disrupting the network of social interaction. The problem here is that it can be very hard to steer a community. If you've identified a problem, it's not clear how to get it fixed (whereas a dictator could just issue a fiat).

In practice, every organization has a bit of both models; it's just a question of how much of each, and in what contexts. The Haskell community is more centralized when it comes to things like the Haskell Report and the Haskell Platform, because you really need it there. Whereas Hackage and the Cafe are more of your standard social community.
except that it is ad-hoc instead of mechanically enforced. In other words, I don't see what the advantages of allowing ambiguity currently are.
If you mechanically enforce things then you will find clashes. That's not the problem: clashes exist, you find them, whatever. The problem is: now that you've found it, how are you going to resolve it? You can't just make Hackage refuse packages which would cause a module name conflict. If you try then you'll get angry developers who just leave or who badmouth Haskell (or both), which does no good for anyone. You have to have an escape hatch, some way for people to raise legitimate issues such as "the conflictor hasn't been maintained in five years and has no users", or "I wrote the old package and this new package is meant to supersede it", etc. But now you need to have a group of people who work on resolving those issues and making those case-by-case decisions about how conflicts should be resolved.

Allowing clashes saves you from needing that group of people. If you allow clashes, there are no developer complaints to be resolved. A lot of resources are tied up in making those central authority groups, and by not having such a central authority we free up those resources to be used elsewhere.

In cases like Perl's CPAN and Linux distros, they have enough resources that they can afford the overhead cost to create and maintain such groups. In addition, they're large enough that the resources for that group doesn't necessarily diminish the resources for other things. E.g., some members of the Linux developer community are no good at programming, but they're great at social organization. If you have a central authority group, they can contribute to that and thereby provide resources; vs, if there's no such group, they're unlikely to offer programming time or other resources instead. Whereas for small communities: overhead costs are higher proportionally, and small communities aren't able to gather as many resources to cover them. In addition, the person who could offer social organization is probably already offering other resources which she wouldn't be able to offer if she moved over to helping the central authority; so you're closer to a zero-sum game of needing to decide how to allocate your scarce resources.
Ah, interesting. So, perhaps I misunderstand, but this seems like an argument in favor of having uniquely-named modules (e.g. Foo.FD and Foo.TF) instead of overlapping ones, right?
Yeah, probably. I mean, ideally I'd like to see GHC retooled so that both fundeps and type families actually compile down to the same code, and one is just sugar for the other (or both are sugar for some third thing). Then we'd get rid of the real problem of there being multiple incompatible ways of doing the same thing. Until then, it's probably better to just pick one approach for each project, rather than trying to maintain parallel forks for each approach. But if you're going to maintain parallel forks, then it's probably best to not do the module punning thing. -- Live well, ~wren

Alvaro Gutierrez wrote:
I've only dabbled in Haskell, so please excuse my ignorance: why isn't there a 1-to-1 mapping between libraries and modules?
As I understand it, a library can provide any number of unrelated modules, and conversely, a single module could be provided by more than one library. I can see how this affords library authors more flexibility, but at a cost: there is no longer a single, unified view of the library universe. (The alternative would be for every module to be its own, hermetic library.) So I'm very interested in the rationale behind that aspect of the library system.
I am probably repeating arguments brought forward by others, but I really like that the Haskell module name space is ordered along functionality rather than authorship. If I ever manage to complete an implementation of the EPICS pvData project in Haskell, it will certainly inherit the Java module naming convention and thus will contain modules named Org.Epics.PvData.XXX, *but* if I need to add utility functions to the API that are generic list processing functions they will certainly live in the Data.List.* name space, and if I need to add type level stuff (which is likely) it will be published under Data.Type.* etc. This strikes me as promoting re-use: it makes it far easier and more likely to factor out these things into a separate general purpose library, or maybe even integrate them into a widely known standard library. It also gives you a much better idea of what the thing you export does than if it comes from, say, Org.Epics.PvData.Util. Finally, it gives the package author an incentive to actually do the refactoring that makes it obvious where the function belongs, functionally. Cheers Ben
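[To illustrate the point with an invented example: a generic helper written for the pvData work would live under Data.List.* rather than the project namespace, which makes it an obvious candidate for extraction into a general-purpose library later. Module and function names here are hypothetical:]

    -- Nothing EPICS-specific here, so it belongs under Data.List.*,
    -- not Org.Epics.PvData.Util.
    module Data.List.Grouping (groupOn) where

    import Data.Function (on)
    import Data.List (groupBy)

    -- Group adjacent elements that share the same key.
    groupOn :: Eq b => (a -> b) -> [a] -> [[a]]
    groupOn f = groupBy ((==) `on` f)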
participants (8)
- Alvaro Gutierrez
- Ben Franksen
- Brandon Allbery
- Chris Wong
- Gregg Lebovitz
- Henk-Jan van Tuyl
- Rustom Mody
- wren ng thornton