Re: Stop untracked dependencies!

"S. Alexander Jacobson"
A major problem in every major language I have ever used is module systems that make it all too easy for untracked dependencies to creep silently into code. The result has repeatedly been nasty surprises when moving code between machines. A Haskell example of this problem might look something like this:
* Oh, Network.HTTP is non-standard!? That's weird.
* Uh oh. Google reveals at least three different implementations...
* Now, how do I determine which one is installed here?
* Is there a local copy of the package somewhere?
* No? Ok, is it available for download somewhere?
* Uhm, is the version I am using still available for download?
* Oh, we made local changes too!? What were they?
* Ok, we added Network.Socket.SSL. That's standard right?
[rinse/repeat]
[A lot of detective work later...]
* Ok, now we've figured out all the packages, how do we ship them?
[shipping strategy devised...]
* But what happens if I don't have root/admin on the target machine?
[...]
* Oh, this package conflicts with something already installed?
[etc.]
What does this example have to do with untracked dependencies? Sounds like more of a problem with a lack of a central database ala Hackage.
It is true that Cabal's Build-Depends doesn't do everything you want it to, but your proposal also duplicates some of the functionality of Build-Depends, and makes it so that there are two places where dependencies are stored. We should only have to specify dependencies once.
I agree you should only have to specify dependencies once. But, deferring dependency tracking to the point you are ready to ship (with Cabal or otherwise) is a recipe for disaster.
Actually, it gets checked at build time if you use cabal to build during development. This can fail for (ghc-pkg) exposed packages, though, see the message I just posted.
Strawman proposal 2:
Modify the import syntax to allow package identifiers:
    import qualified HaXML HaXML.XML.Parse
    import HAppS HAppS.ACID
    import Personal MyLib
Can you relate this proposal to the previous grafting work? It sounds like you're basically just adding a new root to the module hierarchy. My $0.02: I'm still more comfortable with solving this outside the core language, with the package system, cabal, and hackage. (snip)
I don't think it's necessary to specify dependencies with module granularity. This just increases the number of possible error cases, without adding functionality.
If I use multiple packages that export the same module identifier, I need a way to specify which one I want to use. Haskell's existing packaging model doesn't let me do that easily.
Does the grafting proposal solve this problem?
- Package dependencies in Build-Depends could be specified using URLs.
But then dependencies aren't checked at compile time and you can't specify which modules come from which packages.
- Cabal could download, build, and install dependencies without any user intervention.
Download and build? Great! Install? No thank you!
FWIW, right now (thanks to Lemmih), Hackage has a client-server interface so that the client can query Hackage for the dependencies of a given package and their URLs. So you'll be able to "cabal-get foo" and it'll download and install foo's dependencies along with foo. Since it'll be source packages only (at least at first), there's no "cabal-get build-dependencies-of foo", since to use foo you'll need both the build and run-time dependencies.

peace,
isaac

Isaac,

I want to allow the user to know exactly which implementations they are using when they import a module-id while at the same time giving them maximal choice of implementations to use. Everything else here is just details.

On Thu, 31 Mar 2005, Isaac Jones wrote:
What does this example have to do with untracked dependencies? Sounds like more of a problem with a lack of a central database ala Hackage.
* Oh, Network.HTTP is non-standard!? That's weird.
Someone installed Network.HTTP as an "exposed package." Exposed packages are an obvious source of untracked dependencies, as you agreed in the other thread. They should be prohibited.
* Uh oh. Google reveals at least three different implementations...
* Now, how do I determine which one is installed here?
Silence on which implementation is actually used creates a second untracked dependency. The implicit point here is that Cabal does not guarantee a mapping from a module identifier to a particular implementation. And even if you specify a build-depends, Cabal still does not provide any guarantees about the integrity of the package namespace. Requiring that packages be listed in some centralized database may help, but it seems like the worst form of bureaucracy. How does code get into the database? Who controls package namespace? What if I don't want to share all my code with the operators of the hackage server? Who is allowed to modify packages on the hackage server? And, if there is more than one hackage database/server, then we don't have a centralized database anymore.
* Is there a local copy of the package somewhere?
Does Cabal guarantee that you can easily reproduce a portable package from the stuff that is installed on the local machine? Or does it make it easy for the original package to be destroyed? As soon as you've destroyed the original package, you've created another untracked dependency.
* No? Ok, is it available for download somewhere?
* Uhm, is the version I am using still available for download?
And if you don't have the original URL from which you retrieved it, you have another untracked dependency. Also, does Cabal/Hackage provide a mechanism to give you notice if the official version of the package has changed and allow you to decide whether to update or not? If you use HTTP URLs rather than a new package namespace, then you can use HTTP's caching semantics to get this update model for free.
* Oh, we made local changes too!? What were they?
Does Cabal/Hackage notify the user of differences between official packages and local versions? If not, this is another untracked dependency.
* Ok, we added Network.Socket.SSL. That's standard right?
[rinse/repeat]
[A lot of detective work later...]
* Ok, now we've figured out all the packages, how do we ship them?
[shipping strategy devised...]
Does Cabal presume that all packages are guaranteed to be available based on the name in build-depends, or does it provide an option to ship them from a machine with a local copy? If you don't have the latter, then you have another risk in your code.
* But what happens if I don't have root/admin on the target machine?
[...]
* Oh, this package conflicts with something already installed?
[etc.]
Are some packages allowed to modify the Haskell implementation itself? More risk of untracked dependencies.
I agree you should only have to specify dependencies once. But, deferring dependency tracking to the point you are ready to ship (with Cabal or otherwise) is a recipe for disaster.
Actually, it gets checked at build time if you use cabal to build during development. This can fail for (ghc-pkg) exposed packages, though, see the message I just posted.
I don't know what "build time" means here. I want them checked when I am playing with my code interactively using ghci. And I need a way to resolve potential conflicts in the dependencies of packages I am using. See the remap and redirect tags from the packages file format in my proposal.
Strawman proposal 2:
Modify the import syntax to allow package identifiers:
    import qualified HaXML HaXML.XML.Parse
    import HAppS HAppS.ACID
    import Personal MyLib
Can you relate this proposal to the previous grafting work?
The previous grafting work was about using the same module id from different implementations. Consistent with the community's rejection of grafting, this proposal explicitly requires that the compiler produce an error if the user attempts to use the same module id from two different implementations.
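The rule described here (each module id may come from only one implementation, otherwise it is an error) can be sketched as a small check over the proposed imports. This is only an illustration, not part of any compiler or of Cabal: the `Import` type and `resolveImports` function are hypothetical names, and package and module identifiers are modelled as plain strings.

```haskell
import Control.Monad (foldM)
import qualified Data.Map.Strict as Map

-- Hypothetical model: identifiers are plain strings.
type PackageId = String
type ModuleId  = String

-- One import in the proposed syntax: a package identifier plus a module id.
data Import = Import PackageId ModuleId
  deriving (Eq, Show)

-- Build a table from module id to the package that provides it.
-- Fails if the same module id is requested from two different packages.
resolveImports :: [Import] -> Either String (Map.Map ModuleId PackageId)
resolveImports = foldM step Map.empty
  where
    step table (Import pkg m) =
      case Map.lookup m table of
        Just other | other /= pkg ->
          Left ("module " ++ m ++ " requested from both "
                ++ other ++ " and " ++ pkg)
        _ -> Right (Map.insert m pkg table)
```

With the three imports from the strawman, `resolveImports` returns a three-entry table; asking for `Network.HTTP` from two different packages yields a `Left` with an error message instead.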
It sounds like you're basically just adding a new root to the module hierarchy.
No, I am just providing a way for the user to specify exactly which implementation should be used to provide the functionality identified by a given module id.
I don't think it's necessary to specify dependencies with module granularity. This just increases the number of possible error cases, without adding functionality.
If I use multiple packages that export the same module identifier, I need a way to specify which one I want to use. Haskell's existing packaging model doesn't let me do that easily.
Does the grafting proposal solve this problem?
It might, but grafting was rejected and this problem needs to be solved. Implementation of my proposal would solve the problem without requiring grafting.

-Alex-
______________________________________________________________
S. Alexander Jacobson tel:917-770-6565 http://alexjacobson.com

Alex writes:
I want to allow the user to know exactly which implementations they are using when they import a module-id while at the same time giving them maximal choice of implementations to use. Everything else here is just details.
Great! I believe this is exactly one of the problems that Cabal + Hackage set out to solve. We more-or-less agree on the problem, so now the details of a solution do need to be debated, and hopefully we will end up with a better design, with better coverage of the main use cases. Your email had plenty of useful observations and questions, and I don't have answers to all of them. So here are my thoughts on just a few of your points.
* Uh oh. Google reveals at least three different implementations...
* Now, how do I determine which one is installed here?
Silence on which implementation is actually used creates a second untracked dependency.
I recall that Cabal had (at some early design stage) the ability to specify a range of version numbers on each package dependency, e.g. Foo requires Bar, any version between 1.2 and 1.8. The ranges are open, so you could say "anything > 6.2" for instance.
The implicit point here is that Cabal does not guarantee a mapping from a module identifier to a particular implementation.
If you specify a singular package version, then you get a particular individual implementation, but it could be useful to be more flexible as well, where any one of several implementations would suffice.
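To make the idea of version ranges concrete, here is a rough sketch of how a dependency on a range, an open-ended bound, or a single version could be checked. The `VersionRange` type and `satisfies` function are invented for illustration and are not Cabal's actual API; treating "between 1.2 and 1.8" as inclusive at both ends is also an assumption.

```haskell
-- Hypothetical types for illustration only; not Cabal's actual API.
-- A version such as 1.2 is written [1,2]; lists compare lexicographically.
type Version = [Int]

data VersionRange
  = AnyVersion                -- no constraint at all
  | Between Version Version   -- both bounds inclusive (an assumption)
  | LaterThan Version         -- open-ended: strictly greater than the bound
  | Exactly Version           -- a single, specific version
  deriving (Eq, Show)

-- Does an installed version meet the stated requirement?
satisfies :: Version -> VersionRange -> Bool
satisfies _ AnyVersion      = True
satisfies v (Between lo hi) = lo <= v && v <= hi
satisfies v (LaterThan lo)  = v > lo
satisfies v (Exactly w)     = v == w
```

For example, `satisfies [1,5] (Between [1,2] [1,8])` holds, while version 1.10, written `[1,10]`, falls outside that range because 10 is compared as a number rather than as text.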
And even if you specify a build-depends, Cabal still does not provide any guarantees about the integrity of package namespace.
The community oversees the namespace (currently). It is a social process, which means there will be occasional mistakes, misunderstandings, and changes. If you really want /guarantees/ of non-overlaps, then an automated central authority is inevitable. That would be OK, but then you said the following...
Requiring that packages be listed in some centralized database may help, but it seems like the worst form of bureaucracy.
On your specific questions:
How does code get into the database?
The author submits it. (Think wiki.)
Who controls package namespace?
If there is a centralised authority, then it can be automated in a way similar to the DNS. The author of a package suggests a name, and the authority accepts it, provided only that it does not already exist.
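The acceptance rule just described (a suggested name is accepted only if it does not already exist) is simple enough to sketch. This is a toy model, not how Hackage works: the `Registry` type and `register` function are hypothetical, and names and authors are plain strings.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical registry: package name mapped to the author who claimed it.
type PackageName = String
type Author      = String
type Registry    = Map.Map PackageName Author

-- Accept a suggested name only if nobody has registered it yet.
register :: PackageName -> Author -> Registry -> Either String Registry
register name author reg
  | Map.member name reg = Left ("package name already taken: " ++ name)
  | otherwise           = Right (Map.insert name author reg)
```

A first `register "HaXML" "alice" Map.empty` succeeds, and a second attempt to claim `"HaXML"` against the resulting registry is rejected (the author names here are made up).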
What if I don't want to share all my code with the operators of the hackage server?
Then you get into the idea of additional, local, private, servers, over which you hold authority. With this distributed authority however, you now need to come up with a policy on overlaps between the central public namespace, and the local private one. I would guess probably "private overrides public" is a reasonable policy.
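One way to picture the "private overrides public" policy is as a left-biased merge of two package indexes. The sketch below assumes each server exposes a simple name-to-location table; `Index`, `effectiveIndex`, and `overlaps` are hypothetical names, and the locations are just strings.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical index: package name mapped to where the package lives.
type PackageName = String
type Location    = String
type Index       = Map.Map PackageName Location

-- "Private overrides public": Map.union is left-biased, so an entry in
-- the private index wins whenever both indexes define the same name.
effectiveIndex :: Index -> Index -> Index
effectiveIndex private public = Map.union private public

-- Names defined in both indexes, so that overlaps can at least be reported.
overlaps :: Index -> Index -> [PackageName]
overlaps private public = Map.keys (Map.intersection private public)
```

Reporting the overlaps separately keeps the override visible to the user, which matters here because a silent override would itself be a kind of untracked dependency.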
Who is allowed to modify packages on the hackage server?
The author. If the author decides not to continue with a package, then they could delegate to a maintainer. I would expect some packages to become orphaned, with no maintainer. On the one hand, that means the package is pretty stable for users. :-) But on the other, it means no-one has authority to fix bugs either. So I would expect that orphaned packages could be "held in trust for the community" or something - i.e. an administrator could give temporary permission to someone to update it when necessary.
And, if there is more than one hackage database/server, then we don't have a centralized database anymore.
The DNS system has thousands of servers, and something like seven "root" servers, yet it seems to manage pretty well...
[...]
Regards, Malcolm
participants (3)
- Isaac Jones
- Malcolm Wallace
- S. Alexander Jacobson