RE: Stop untracked dependencies!

On 01 April 2005 07:31, S. Alexander Jacobson wrote:
I want to allow the user to know exactly which implementations they are using when they import a module-id while at the same time giving them maximal choice of implementations to use. Everything else here is just details.
I'll comment on the overall goal then: I'm in complete agreement that requiring dependencies to be specified is a good thing. For some reason you want to do this with language extensions or compiler extensions rather than using Cabal - but this is one of the things that we expected Cabal to address, which is why you're getting some pushback. I'm interested in why you're avoiding Cabal: if you could elaborate on that (perhaps in a separate thread) we might be able to make some improvements.
You've identified a few problems with the current system:
(a) some packages are exposed by default, which leads to people being able to write code without knowing (or needing to know) which packages they're using, resulting in untracked dependencies.
I think this is a compromise, and I think it's a good one. Being able to just import any of the hierarchical modules I have installed on my system without having to give a -package flag is really useful (don't touch that keyboard! I haven't finished yet :-). If I decide that I want to give the code to someone else, then the dependencies become important. This is where Cabal comes in: if you're not already using Cabal as a plain build system, then you probably want to use it to ship the code to someone else anyway. And at that point you're forced to enumerate the dependencies. Sure, we currently have a problem that the dependencies in Cabal are unchecked, but that's just a bug. (I'm coming around to Isaac's earlier suggestion of -fhide-all-packages, actually).
(b) dependencies aren't specified per-module. I can't express that I expect module M to come from package P.
I see this as documentation, but I agree it's useful documentation, and it would be even more useful if it were automatically checked. I don't see any fundamental problem with adding this to Cabal.
* Oh, we made local changes too!? What were they?
Does Cabal/Hackage notify the user of differences between official packages and local versions? If not, this is another untracked dependency.
If you modify a package, you're required to give it a new version number and/or a different package name. No, this isn't checked, but it could be (by calculating a hash of the external interface of the package, for example).
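The interface-hash check mentioned above could look roughly like the following. This is only a sketch under stated assumptions: the `Interface` model (module names paired with export signatures), the choice of FNV-1a, and every function name are hypothetical and are not part of Cabal or GHC.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl', sort)
import Data.Word (Word64)

-- Hypothetical model of a package's external interface: each exposed
-- module paired with the signatures of the names it exports.
type Interface = [(String, [String])]

-- 64-bit FNV-1a over the characters of a string. Chosen only because it
-- needs nothing beyond base; a real tool would want a cryptographic hash.
fnv1a :: String -> Word64
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Sort modules and exports first so that declaration order does not
-- change the fingerprint, then hash the rendered text.
interfaceHash :: Interface -> Word64
interfaceHash iface = fnv1a (unlines (concatMap render (sort normalised)))
  where
    normalised     = [ (m, sort es) | (m, es) <- iface ]
    render (m, es) = ("module " ++ m) : map ("  " ++) es

-- Compare the fingerprint recorded for a released package with the one
-- computed from the locally installed copy.
fingerprintsMatch :: Interface -> Interface -> Bool
fingerprintsMatch released local = interfaceHash released == interfaceHash local
```

A mismatch would indicate that the installed copy differs from the released interface and should therefore carry a new version number or package name.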
* But what happens if I don't have root/admin on the target machine? [...] * Oh, this package conflicts with something already installed? [etc.]
Are some packages allowed to modify the haskell implementation itself? More risk of untracked dependencies.
Nope, they aren't (fortunately). This is one reason we removed the extra_ghc_options field that used to be in the package config.
Cheers, Simon

If we are in agreement about the overall goal, then I'll move on to the specific issues that need addressing:
* We Need Free Packaging
Anyone should be able to create a package at any time for use in their projects without having to negotiate with any particular centralized authority or community. HTTP URLs have this property. The current package namespace does not. Malcolm says "think wiki." I say: think world wide web.
* Free Packaging Means We Need To Handle Collisions in Module Namespace
Anyone should be able to use any combination of packages they want in any of their programs at any time without worrying about whether those packages export the same module name.
Simon says:
We don't allow programs to contain two modules with the same name, for good reasons.
I'm not asking for that. I am totally ok with each module name mapping to one and only one implementation per program. I just want to be able to choose that implementation even when my program uses two packages that both happen to export the same module name.
* We Need to Associate Module Names with Specific Packages
In associating module names with particular packages, the design choices come down to:
a. using a "Modules" file that maps module names to package ids/URLs
b. allowing package ids/URLs in import statements
c. allowing package *names* in import statements and using an external "Packages" file to map package names to package ids/URLs.
You could extend Cabal to implement (a). I just think (a) is really annoying because it forces you to edit an external packages file every time you do an import. I prefer (c) because it does not have this problem, and import statements remain very readable while instantly communicating the dependency being created.
* Default Exposed Packages Are Untenable
Although I agree that fussing with a -package command line is annoying, the need to associate module names with package ids makes the existing default exposed package compromise untenable. But all is not lost! If you choose (c) above, nothing stops you from having a default global "Packages" file. Then your marginal work is just to supply a package name in your import statements, e.g.:
  import HaXML HaXML.XML.Parse
And this doesn't seem so bad. And the big bonus here is that the default case of using Cabal gets much simpler, because it can extract the package names from your code and then automatically include the relevant subset of the global packages file in your code. No more need to produce a "build-depends" line manually and no more risk of getting it wrong!
You get an even bigger bonus if the package format has a standard content-type, because then Cabal can offer the packaging user the option of including the dependent packages in the code itself, so the user doesn't have to worry about the recipient not having access to the Internet, etc.
* In Any Case, Default Exposed Packages Are Also a Poor Compromise.
I don't want to have to worry about accumulating untracked dependencies when I am doing quickie work with GHCi. I want my dependencies checked every step of the way and should not have to round trip through a Cabal packaging step just to verify them. Adding -fhide-all-packages without doing (c) above does not solve this problem. It just means that I keep having to restart GHCi with a different command line every time I change a dependency OR I can keep GHCi open and lose track of which packages I have added interactively or just lose track of which packages are still necessary. In any case, (c) with a default packages file is nearly as simple and MUCH cleaner.
* Conclusion: Freedom is good.
At the top level, the choices come down to these tradeoffs:
1. accepting some form of bureaucratic centralized control of package and module name space in exchange for the putative convenience of having default exposed packages and increasing the risk of untracked dependencies.
-or-
2. reducing the risk of untracked dependencies and gaining freedom from centralized control in exchange for accepting some responsibility for specifying which packages you want to use.
I weigh the risk of untracked dependencies highly and think the marginal cost to the user of supplying package names in import statements is minimal, so I strongly prefer (2) and (c).
-Alex-
______________________________________________________________
S. Alexander Jacobson tel:917-770-6565 http://alexjacobson.com
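Option (c) with a default global "Packages" file, as described in the message above, can be sketched as follows. This is a rough illustration only: the `import <package> <module>` form is the syntax proposed in this thread (no compiler accepts it), the "Packages" file is modelled as an association list, the parsing is deliberately naive, and all function names are hypothetical.

```haskell
import Data.Char (isUpper)
import Data.List (nub)
import Data.Maybe (mapMaybe)

type PackageName = String
type PackageUrl  = String

-- Recognise the proposed form "import <package> <module>". Ordinary
-- imports ("import M (x)", "import qualified M as N", ...) yield Nothing
-- because their second word is "qualified" or their third word does not
-- start with an upper-case letter. Pragmas and comments are not handled.
packageOfImport :: String -> Maybe PackageName
packageOfImport line = case words line of
  ("import" : pkg : (c : _) : _)
    | pkg /= "qualified", isUpper c -> Just pkg
  _ -> Nothing

-- The subset of the global "Packages" file that a source text refers to;
-- this is what a build tool could ship instead of a hand-written
-- "build-depends" line.
usedPackages :: [(PackageName, PackageUrl)] -> String -> [(PackageName, PackageUrl)]
usedPackages global src =
  [ (p, url) | p <- mentioned, Just url <- [lookup p global] ]
  where
    mentioned = nub (mapMaybe packageOfImport (lines src))

-- Package names used in imports but absent from the "Packages" file.
missingPackages :: [(PackageName, PackageUrl)] -> String -> [PackageName]
missingPackages global src =
  [ p | p <- nub (mapMaybe packageOfImport (lines src)), lookup p global == Nothing ]
```

Given a global file mapping `HaXML` to some URL, a source file containing `import HaXML HaXML.XML.Parse` yields exactly that one entry from `usedPackages`, and any package name without an entry shows up in `missingPackages`.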

On Friday 01 April 2005 21:31, S. Alexander Jacobson wrote:
* We Need Free Packaging
Anyone should be able to create a package at any time for use in their projects without having to negotiate with any particular centralized authority or community.
I completely agree with this.
HTTP URLs have this property. The current package namespace does not. Malcolm says "think wiki." I say: think world wide web.
I may be wrong, but AFAIK, the "current package namespace" is whatever you have installed on your system (plus what you installed as 'user'). There is no central control as long as you don't distribute your code to the internet. You can give your internal packages the same name as some 'officially registered' (via Hackage) package. A conflict will arise only when you try to install a different package with the same name as yours.
* Free Packaging Means We Need To Handle Collisions in Module Namespace
Anyone should be able to use any combination of packages they want in any of their programs at any time without worrying about whether those packages export the same module name.
Simon says:
We don't allow programs to contain two modules with the same name, for good reasons.
I'm not asking for that. I am totally ok with each module name mapping to one and only one implementation per program. I just want to be able to choose that implementation even when my program uses two packages that both happen to export the same module name.
I agree that this would be /very/ nice to have. But this can be solved in Cabal with a configuration file that gives the package name for (only) those modules that (1) are imported and (2) appear in more than one used package. For each such module, the file states from which package the module should be taken. Cabal should complain as long as there are unresolved module name conflicts.
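The conflicts-file idea in the paragraph above can be sketched like this. It is a minimal illustration, not something Cabal implements: the conflicts file and the list of used packages are modelled as association lists, and the names `providers`, `resolve` and `unresolved` are hypothetical.

```haskell
type ModuleName  = String
type PackageName = String

-- Packages (among those the project uses) that export a given module.
providers :: [(PackageName, [ModuleName])] -> ModuleName -> [PackageName]
providers pkgs m = [ p | (p, ms) <- pkgs, m `elem` ms ]

-- Resolve one imported module. The conflicts file is consulted only when
-- more than one used package exports the module.
resolve :: [(PackageName, [ModuleName])]   -- used packages and their modules
        -> [(ModuleName, PackageName)]     -- contents of the conflicts file
        -> ModuleName
        -> Either String PackageName
resolve pkgs choices m = case providers pkgs m of
  []  -> Left ("no used package exports " ++ m)
  [p] -> Right p
  ps  -> case lookup m choices of
    Just p | p `elem` ps -> Right p
    _ -> Left (m ++ " is exported by " ++ unwords ps
                 ++ "; name one of them in the conflicts file")

-- Complaints for every imported module that cannot be resolved.
unresolved :: [(PackageName, [ModuleName])]
           -> [(ModuleName, PackageName)]
           -> [ModuleName]
           -> [String]
unresolved pkgs choices imports =
  [ msg | Left msg <- map (resolve pkgs choices) imports ]
```

With `pkgA` exporting `Data.Foo` and `Text.Bar` and `pkgB` exporting `Text.Bar`, only `Text.Bar` is reported by `unresolved`, and the report disappears once the conflicts file names a package for it.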
* We Need to Associate Module Names with Specific Packages
In associating module names with particular packages, the design choices come down to:
a. using a "Modules" file that maps module names to package ids/URLs
b. allowing package ids/URLs in import statements
c. allowing package *names* in import statements and using an external "Packages" file to map package names to package ids/URLs.
You could extend Cabal to implement (a). I just think (a) is really annoying because it forces you to edit an external packages file every time you do an import.
See above: this is only necessary in case of an actual conflict and thus it is not too annoying, I think.
* In Any Case, Default Exposed Packages Are Also a Poor Compromise.
I don't want to have to worry about accumulating untracked dependencies when I am doing quickie work with GHCi. I want my dependencies checked every step of the way and should not have to round trip through a Cabal packaging step just to verify them.
Well, a compromise is not poor just because it is a compromise. What you are demanding here is IMHO a bit extreme. Why do you think dependency tracking is essential for "doing quickie work with GHCi"? I couldn't care less about such things whenever I start hacking /anything/, using interpreter or compiler or whatever. Later on, when it becomes a larger project, I will use some sort of build system anyway, and then there is still plenty of time to care about what implementations (packages) exactly I am using.
Ben

"S. Alexander Jacobson"
* We Need Free Packaging
Anyone should be able to create a package at any time for use in their projects without having to negotiate with any particular centralized authority or community.
Agree. And this is the default current case.
HTTP URLs have this property. The current package namespace does not. Malcolm says "think wiki." I say: think world wide web.
Wiki might have been a poor example, but I did also mention DNS as a model for package naming. URLs are *not* completely arbitrary. You first require a valid domain name, which gets allocated by a single (but distributed) authority, using a lax first-come-first-served policy (at least approximately). And then you can only create a valid URL if you have permission from the domain owner to write to the webserver.
You want all package authors to distribute their works through individual webservers, where they have their own self-appointed authority. I don't have a problem with that, but I would also like to have one or more trusted webservers, which collect together many packages, making them all available easily, with search facilities, and some reasonable dependency/update policy. The "trusted" authority would be, broadly speaking, the (or /a/) community, which would probably strive for a certain coherency and even perhaps "style" across packages.
Think of the difference between building a Linux system by collecting the latest versions of every system tool and application yourself from the individual project websites, resolving all cross dependencies manually, or just downloading a complete RedHat, Debian, SuSE, whatever, distribution, where somebody has already done the dependency analysis for you. Debian is a particularly good example, since you might collect only a minimal system to start with, but be secure in the knowledge that when you want to add more packages, they will be compatible, and the community has resolved all of the dependencies for you transparently.
Some people will want to do the former. More power to them. But the vast majority will prefer the latter, simply for convenience.
* Free Packaging Means We Need To Handle Collisions in Module Namespace
I am totally ok with each module name mapping to one and only one implementation per program. I just want to be able to choose that implementation even when my program uses two packages that both happen to export the same module name.
Yes, clashes of module names are absolutely the most central issue in the whole library universe.
In the beginning, the Haskell module namespace was flat - big opportunity for overlaps - bad. Then we introduced hierarchy - a much larger namespace, more easily searchable - but still the potential for overlaps. So then we introduced "packages" - to gather modules together, so that we could permit the same hierarchical module name in different packages, provided the package name was different. Originally, we said that package names must be unique, but now you want package names to overlap as well. So now, to avoid ambiguity, the proposal is that the *storage location* of the package should be unique. It seems to me that we are just pushing the overlap-resolution problem around the plate - each time a new namespace is introduced, the potential for overlaps rears its ugly head again, and yet another new mechanism is proposed, which eventually fails to solve the problem too.
You are against a central authority for allocating package names, yet you seem happy to accept a central authority for allocating the domain name part of a URL. So I conclude that you are not against central authority itself, but rather you would prefer to use a "standard" well-established authority (the DNS) rather than a new, yet-to-be-implemented authority (Hackage). Would that be a fair guess?
* We Need to Associate Module Names with Specific Packages
In associating module names with particular packages, the design choices come down to:
a. using a "Modules" file that maps module names to package ids/URLs
b. allowing package ids/URLs in import statements
c. allowing package *names* in import statements and using an external "Packages" file to map package names to package ids/URLs.
There is another choice:
d. Have one or more external servers resolve module names to specific packages. This requires a single file specifying a list of servers to contact, in preference order. If a server cannot resolve the name, the request is delegated further down the list.
For a project that uses lots of wild packages with odd version numbers and weird dependencies, you might need to set up your own dedicated server just for that project. But most projects would just use two servers: (a) a local one for locally-written packages, and (b) a central community one for publicly-contributed packages.
Regards, Malcolm
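The delegation scheme of option (d) can be sketched as follows. This is only an illustration under stated assumptions: each "server" is modelled as a pure lookup function rather than a network service, and `Server`, `tableServer` and `resolveVia` are hypothetical names, not an existing API.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)

type ModuleName = String
type PackageId  = String

-- A resolver is modelled as a label plus a pure lookup; a real server
-- would be contacted over the network instead.
data Server = Server
  { serverName   :: String
  , serverLookup :: ModuleName -> Maybe PackageId
  }

-- Build a resolver from a fixed table of module-to-package entries.
tableServer :: String -> [(ModuleName, PackageId)] -> Server
tableServer name table = Server name (`lookup` table)

-- Ask each server in preference order; the first one that knows the
-- module wins, and the answer records which server supplied it.
resolveVia :: [Server] -> ModuleName -> Maybe (String, PackageId)
resolveVia servers m = listToMaybe (mapMaybe ask servers)
  where
    ask s = fmap (\pkg -> (serverName s, pkg)) (serverLookup s m)
```

Listing a local server before a community server means that a locally written package shadows a community package exporting the same module name, while modules unknown locally fall through to the community server.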
participants (4)
- Benjamin Franksen
- Malcolm Wallace
- S. Alexander Jacobson
- Simon Marlow